Oracle® Practitioner Guide Building Infrastructure and Platform Cloud Services Release 3.0 E39816-01

April 2013

Copyright © 2013, Oracle and/or its affiliates. All rights reserved.

Primary Author: Anbu Krishnaswamy Anbarasu

Contributing Authors: Dr. James Baty, Stephen Bennett, Scott Mattoon, Mark Wilkins

Contributor: Cliff Booth, Dave Chappalle, Bob Hensle, Rob Reakes, Graham Mcmillan

This and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.

If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, the following notice is applicable:

U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, duplication, disclosure, modification, and adaptation shall be subject to the restrictions and license terms set forth in the applicable Government contract, and, to the extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License (December 2007). Oracle America, Inc., 500 Oracle Parkway, Redwood City, CA 94065.

This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.

This software or hardware and documentation may provide access to or information on content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services.

Contents

Send Us Your Comments

Preface
    Document Purpose
    Audience
    Document Structure
    How to Use This Document
    Related Documents
    Document Map
    Other References
    Conventions

1 Introduction
    1.1 Program level and Project Level Scopes
    1.2 Cloud Service Development Phases for Platform and Infrastructure Services
    1.3 Relevant Topics

2 Inception
    2.1 Inception Phase Activities
    2.2 Requirement Analysis
        2.2.1 Classification of Cloud service requirements
    2.3 Cloud Service Identification
        2.3.1 Basic steps in Cloud service identification
        2.3.2 Detailed activities in the Cloud service identification process
            2.3.2.1 Platforms
            2.3.2.2 Database
            2.3.2.3 Infrastructure
            2.3.2.4 Extension services
            2.3.2.5 Capacity Planning
            2.3.2.6 Development Cloud
            2.3.2.7 Cloud candidate services stack
            2.3.2.8 Defining the service boundaries
            2.3.2.9 Determining the deployment model
            2.3.2.10 Service justification
            2.3.2.11 Workload validation
    2.4 Cloud Service Portfolio Management and Release Planning
    2.5 Putting it together
    2.6 An Example
        2.6.1 Problem
        2.6.2 Solution

3 Elaboration
    3.1 Cloud Service Definition
        3.1.1 Defining Cloud Service Contracts
        3.1.2 Defining Service APIs
            3.1.2.1 Characteristics of good Cloud APIs
            3.1.2.2 IaaS API
            3.1.2.3 PaaS API
        3.1.3 Defining service specifications
            3.1.3.1 Template for Cloud service definition
            3.1.3.2 Defining Service metrics
    3.2 Designing Cloud services
        3.2.1 Design Choices
        3.2.2 Service Design Template
        3.2.3 Service Assembly Template
    3.3 Putting it together

4 Construction
    4.1 Cloud Service Implementation
    4.2 Packaging and Assembly
        4.2.1 Defining Deployable Entities
    4.3 Cloud Service Testing
    4.4 Putting it together

5 Transition
    5.1 User Acceptance Testing
    5.2 Cloud Service Deployment
    5.3 Putting it together

6 Operate
    6.1 Operations Best Practices

7 Summary

List of Figures

1–1 Cloud Service Development - Program and Project Scopes
1–2 Cloud Service Development Process
2–1 Inception Phase Activities
2–2 Requirements Analysis
2–3 Cloud Service Development Influencing Factors
2–4 Cloud Service Identification Steps
2–5 Cloud Service Identification Process
2–6 Cloud Candidate Services Stack Model
2–7 Cloud Candidate Services Stack Example
2–8 Cloud Service Portfolio Management and Release Planning
2–9 Inception Phase - putting it together
3–1 Elaboration Phase Activities
3–2 Cloud Service Definition
3–3 PaaS API
3–4 Cloud Service Design
3–5 Elaboration Phase
4–1 Construction Phase Activities
4–2 Cloud Service Implementation
4–3 Packaging and Assembly
4–4 Cloud Service Testing
4–5 Construction Phase - Putting it together
5–1 Transition Phase Activities
5–2 User Acceptance Testing
5–3 Cloud Service Deployment
5–4 Transition Phase - Putting it together
6–1 Operate - OA&M Phase Activities

Send Us Your Comments

Oracle welcomes your comments and suggestions on the quality and usefulness of this publication. Your input is an important part of the information used for revision.

■ Did you find any errors?

■ Is the information clearly presented?

■ Do you need more information? If so, where?

■ Are the examples correct? Do you need more examples?

■ What features did you like most about this document?

If you find any errors or have any other suggestions for improvement, please indicate the title and part number of the documentation and the chapter, section, and page number (if available). You can send comments to us at: [email protected].

Preface

Oracle Reference Architecture (ORA) is a product-agnostic reference architecture based on architecture principles and best practices that are widely applicable and that can be implemented using a wide variety of products and technologies. ORA does not include any implementation artifacts for the prescribed architecture. Rather, ORA addresses the building of a modern, consistent IT architecture while minimizing the risk of product incompatibilities and obsolescence. ORA is an extensible reference architecture that describes many facets of IT. It is comprised of several documents that cover core concepts of technology, along with other documents that build upon these core concepts to describe more complex technology strategies. The ORA Cloud documents present the ORA concepts from the perspective of Cloud, highlighting the specific details of Cloud as an elaboration of the ORA core concepts with respect to this technological approach. This document is part of a series of documents that describe IT Strategies from Oracle (ITSO) Cloud strategy. Please consult the ITSO web site for documents pertaining to Cloud and other technologies.

Document Purpose

This document describes a methodology to build PaaS and IaaS Cloud services. Building PaaS and IaaS services differs from building SaaS services and hence is addressed in a separate document.

The figure above illustrates where this document is placed in the ETS/Topic Area grid.

Audience

This document is intended for enterprise architects, application architects, project managers, and developers. The material is designed for a technical audience that is interested in learning about developing IaaS and PaaS Cloud services and understanding how it differs from traditional application development methodologies.

Document Structure

This document is organized into chapters based on the lifecycle phases of the Cloud service development methodology. The chapters are organized as follows:

■ Chapter 1, Introduction, provides an overview of the Cloud service development methodology.

■ Chapter 2 describes the Inception phase of Cloud service development.

■ Chapter 3 describes the Elaboration phase of Cloud service development.

■ Chapter 4 describes the Construction phase of Cloud service development.

■ Chapter 5 describes the Transition phase of Cloud service development.

■ Chapter 6 provides an introduction to the operational activities involved in Cloud development.

■ Chapter 7, Summary, provides the conclusion for this document.

How to Use This Document

This document is designed to be read from beginning to end. After the initial read, each section can be read independently for quick reference.

Related Documents

IT Strategies from Oracle (ITSO) is a series of documentation and supporting material designed to enable organizations to develop an architecture-centric approach to enterprise-class IT initiatives. ITSO presents successful technology strategies and solution designs by defining universally adopted architecture concepts, principles, guidelines, standards, and patterns.

ITSO is made up of three primary elements:

Oracle Reference Architecture (ORA) defines a detailed and consistent architecture for developing and integrating solutions based on Oracle technologies. The reference architecture offers architecture principles and guidance based on recommendations from technical experts across Oracle. It covers a broad spectrum of concerns pertaining to technology architecture, including middleware, database, hardware, processes, and services.

Enterprise Technology Strategies (ETS) offer valuable guidance on the adoption of horizontal technologies for the enterprise. They explain how to successfully execute a strategy by addressing concerns pertaining to architecture, technology, engineering, strategy, and governance. An organization can use this material to measure its maturity, develop its strategy, and achieve greater levels of adoption and success. In addition, each ETS extends the Oracle Reference Architecture by adding the unique capabilities and components provided by that particular technology. It offers a horizontal technology-based perspective of ORA.

Enterprise Solution Designs (ESD) are industry specific solution perspectives based on ORA. They define the high level business processes and functions, and the software capabilities in an underlying technology infrastructure that are required to build enterprise-wide industry solutions. ESDs also map the relevant application and technology products against solutions to illustrate how capabilities in Oracle's complete integrated stack can best meet the business, technical, and quality of service requirements within a particular industry.

Document Map

The picture below shows the document map of documents related to this document.

■ Creating a Roadmap for Cloud

■ Building Cloud Infrastructure - Implementation of Physical and Management Infrastructure

■ Building Application Services

■ Building Infrastructure and Platform Cloud Services (this document)

This document is one of the series of documents that comprise Oracle Reference Architecture. It is a practitioner's guide that focuses on a methodology to build Infrastructure and Platform Cloud services. Please consult the ITSO web site for a complete listing of ORA documents as well as other materials in the ITSO series.

Other References

■ "Database Consolidation onto Private Clouds", Oracle Whitepaper. By Vengurlekar et al.

■ "ITIL Best Practices with Oracle Enterprise Manager 10g and Oracle PeopleSoft Help Desk", Oracle Whitepaper, By Sharma, et al.

■ "Billing and Revenue Management for Cloud Computing", Oracle BRM datasheet.

■ http://www.opengroup.org/projects/soa-soi/, Service Oriented Cloud Computing Infrastructure, The Open Group

■ "Oracle Cloud Computing", June 2011 Oracle whitepaper, Rex Wang.

■ Oracle Cloud Resource Model API - http://www.oracle.com/technetwork/topics/cloud/oracle-cloud-resource-model-api-154279.pdf

■ "Oracle ExaLogic Elastic Cloud - A brief introduction", Oracle Whitepaper, By Piech, Palmeter, Lehman

Conventions

The following text conventions are used in this document:

Convention    Meaning
boldface      Boldface type indicates graphical user interface elements associated with an action, or terms defined in text or the glossary.
italic        Italic type indicates book titles, emphasis, or placeholder variables for which you supply particular values.
monospace     Monospace type indicates commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter.

1 Introduction

Most organizations have recognized Cloud Computing as a key strategy for enabling business agility and organizational efficiency. Successful adoption of Cloud Computing requires a) a clearly defined roadmap and b) well-defined development and operational processes for building and operating Cloud infrastructure and Cloud services.

Deploying business applications using a Cloud delivery model provides several benefits to the organization. Cloud introduces new ways of developing, deploying, and managing applications. A paradigm shift of this kind is hard to achieve without changes to the traditional organization structure and development processes. Organizations that already use Service Oriented Architecture (SOA) typically find it easier to adopt Cloud, since SOA and Cloud require several very similar organizational changes. Other organizations may need to look at the way they develop IT capabilities and make the necessary changes to take advantage of Cloud. This does not mean that existing methodologies need to be replaced with brand new ones, but some adjustments may be needed to accommodate this shift.

The processes for developing Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) differ somewhat. This guide focuses on the methodology for building Infrastructure and Platform Cloud services.

The methodology described in this document is intended to be customized for the needs of your organization. A "one size fits all" approach may not be suitable for Cloud because of the variety of forms and magnitudes that it can take. This methodology provides general guidance on what needs to happen, but it is flexible enough to be customized if needed. For example, the order in which Cloud services are identified and built may depend on the specific strategy of the organization. What should be determined first: the service model or the deployment model? Each approach has its own merits and pitfalls, so organizations should choose whichever approach works better for them.

The ITSO document "Creating a Cloud Roadmap" defines the process of creating a pragmatic roadmap for Cloud. The roadmap activity typically spins off several projects, including service development and infrastructure build-out projects. This document is not intended to provide a comprehensive end-to-end process that replaces the software development methodology already used in an enterprise. Instead, it highlights the variations required to successfully build Cloud services so that the existing development process can be modified accordingly. However, if you do not currently use a formal process for software development, this process can be adopted as the primary development process.

1.1 Program level and Project Level Scopes

Some of the Cloud service development activities are program level activities and some are project level activities. Program level activities are typically strategic initiatives that benefit multiple business units and projects. These include roadmap creation, reference architecture development, Cloud governance, Cloud method development, and Cloud service portfolio management. Projects represent discrete units of work that come together to create coordinated Cloud services. Cloud development is supported by multiple projects. Figure 1–1 illustrates three types of projects:

■ Cloud services project,

■ Business delivery project, and

■ Cloud infrastructure project

Cloud infrastructure projects may be required to support the other types if the infrastructure is not already established; however, infrastructure is out of scope for this document and is covered separately in the Building Cloud Infrastructure document.

Figure 1–1 Cloud Service Development - Program and Project Scopes

Figure 1–1 shows multiple entry points into Cloud service development. Enterprises are typically both consumers (public Cloud services directly consumed by business or brokered by IT) and providers (private Cloud services built and operated by IT). This differs slightly from a Commercial Cloud Service Provider (CCSP), which predetermines its commercial Cloud services based on market drivers and its own business strategy. In contrast, an enterprise business delivery project identifies Cloud services based on the Infrastructure and Platform requirements of a project. Non-functional (non-business functional, to be precise) and technical requirements have a major impact on the design of Infrastructure and Platform Cloud services.

IT initiatives such as modernization, consolidation, and rationalization may result in the identification of IT capabilities that could be implemented with new or existing Cloud services. For example, existing servers may be consolidated and migrated to a private Cloud for agility and cost reduction reasons.

Another entry point shown in the diagram is the road mapping activity, where Cloud services are strategically identified based on the business drivers, strategy, and roadmap. These requirements are further refined and implemented by the Cloud services project.

It is important to note that Cloud services should be future-proof: they need to be designed to handle future load requirements as well as spikes in current demand. The service development and portfolio management activities therefore need to be aligned to ensure that Cloud services can support the needs of current and near-future projects. Cloud services must be designed to scale elastically on demand; however, a Cloud Provider must ensure that sufficient resources are available to support the stringent scalability requirements of the Cloud. The "Building Cloud Infrastructure" document covers this topic in more detail.

1.2 Cloud Service Development Phases for Platform and Infrastructure Services

The activities for building Cloud services are categorized under three major focus areas: Envision, Implement, and Operate. These focus areas are described in the "Creating a Cloud Roadmap" document. This document focuses mostly on the Implement focus area, in which the services are developed. Some touch points to the Envision and Operate focus area activities are important to point out, and they are briefly described in this document.

Figure 1–2 Cloud Service Development Process

The Implement focus area has four phases: Inception, Elaboration, Construction, and Transition. These phases closely align with Oracle Unified Method (OUM) and Unified Process (UP). Figure 1–2 shows the high level activities involved in the Cloud service development process, grouped by phase.

The key activities in the Inception phase are listed below.

■ Project Requirement Analysis: Business requirements are analyzed as part of the business delivery project and they are refined into enterprise requirements.

■ Cloud Service Requirement Analysis: The requirements specific to Cloud services are handed over to the enterprise Cloud services delivery project.

■ Cloud Service Identification: Cloud services are identified based on the requirements and justified for development by the Cloud service creation team.

Identified Cloud services need to be aligned with the migration of applications to the Cloud. The Cloud Portfolio Management and Release Planning activity shown in the Envision focus area covers these details and updates to the Cloud service catalog.

The purpose of the Elaboration phase is to define the service interfaces and design the service implementation.

The Construction phase deals with the implementation of the Cloud service. One of the main concerns addressed in this phase is the packaging and assembly of Cloud services into deployable entities that allow rapid provisioning and decommissioning. Cloud service testing is also performed in this phase.

The last phase of the Implement focus area is Transition, in which User Acceptance Testing (UAT) and production deployment activities happen. UAT in the Cloud context can differ somewhat from traditional UAT. This topic is discussed in more detail in Section 5.1, "User Acceptance Testing".

"Operations" is an important part of Cloud management, which is why Figure 1–2 shows it as a separate focus area called "Operate".

1.3 Relevant Topics

The following topics are out of scope for the purposes of this document. However, it is important to consider these in the context of the service development method.

■ IT demand management - how IT demand is managed and channeled into the Cloud service development process.

■ Funding model - how Cloud service development and operations will be funded. Is funding coming from the central budget or is it based on a cost allocation model?

■ Architecture development methodology - how architecture is developed at the enterprise and how it influences Cloud service development. More broadly, how an Enterprise Architecture framework would be used in concert with the Cloud service development method.

■ Cloud application migration - migration of existing applications to the Cloud services being built.

■ Cloud service portfolio management - management of the portfolio of Cloud services in the broader context of IT portfolio management.

■ Cloud Governance - Gates and check points in the process, definition of roles and responsibilities (typically by means of a RACI chart or something similar), policy exceptions and escalations, Cloud migration policies and guidelines.

2 Inception

The goals of the Inception phase include the following:

■ analyze project requirements

■ identify Cloud specific requirements

■ define the scope and boundary conditions

■ identify Cloud service candidates

■ manage Cloud service portfolio

As with traditional software development, Cloud service development begins with requirements. However, Infrastructure and Platform service development differs from traditional engineering in a couple of ways. In most cases, Cloud services are enterprise scoped services shared by multiple projects within the enterprise, which means that the method should support identifying common requirements and isolating or refining them into enterprise requirements that can then be built as Cloud services. In addition, Infrastructure and Platform services are more appropriately developed by the operations team or "DevOps team" than by the development teams, as in the traditional case.

2.1 Inception Phase Activities

Figure 2–1 shows the high level activities of the Inception phase. The benefits of Cloud come from scale: it is an investment that provides increasing returns as more applications are deployed to it, just as SOA did. Some of the service requirements are identified based on common requirements across multiple in-flight projects. In some cases, the first application that deploys in the Cloud provides the requirements for the Cloud service; however, the service should be built with reuse in mind so that future projects can use the Cloud service with little or no change.

Figure 2–1 Inception Phase Activities

2.2 Requirement Analysis

This phase begins with activities to gather the project requirements. The enterprise requirements that are common to multiple projects are then refined and classified at the enterprise level. Figure 2–2 shows the high level process steps involved in this part.

Figure 2–2 Requirements Analysis
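The step of isolating requirements that are common to multiple projects could be sketched as follows. This is a minimal illustration only: the function name `shared_requirements`, the `min_projects` threshold, and the project and requirement names are all hypothetical and do not come from the methodology itself.

```python
from collections import defaultdict


def shared_requirements(project_requirements, min_projects=2):
    """Return the requirements raised by at least `min_projects` projects.

    `project_requirements` maps a project name to the set of requirement
    labels gathered for that project (an assumed, simplified representation).
    """
    projects_by_requirement = defaultdict(set)
    for project, requirements in project_requirements.items():
        for requirement in requirements:
            projects_by_requirement[requirement].add(project)
    # Keep only requirements shared widely enough to be enterprise candidates.
    return {
        requirement: sorted(projects)
        for requirement, projects in projects_by_requirement.items()
        if len(projects) >= min_projects
    }


# Hypothetical in-flight projects and their requirements.
example = {
    "billing-portal": {"relational database", "single sign-on"},
    "order-tracking": {"relational database", "message queue"},
    "partner-api": {"single sign-on", "relational database"},
}
candidates = shared_requirements(example)
```

In this sketch, a requirement raised by only one project (here, the message queue) stays with that project, while the shared ones become candidates for refinement into enterprise requirements.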

In the case of an enterprise, business delivery projects receive requirements from the line of business for implementing specific business functionality. Commercial Cloud Service Providers define their requirements based on market demand and their business strategy.

One of the first steps in the Cloud service development process is to identify the common or shared requirements and refine them into enterprise requirements. These requirements become enterprise assets and should be maintained at the enterprise scope. Typically an asset management or metadata management repository is used for this purpose.

The requirements that drive Infrastructure and Platform services are identified and developed into Cloud service requirements. Infrastructure and Platform services are primarily influenced by non-functional or technical requirements and architecture standards. In this context, "non-functional" means that the requirements are not business-functional requirements.

Enterprise IT requirements may also drive the need for Infrastructure and Platform Cloud services. These requirements typically stem from IT cost reduction efforts that result in data center modernization, consolidation, and application rationalization initiatives. For example, IT may decide to modernize the infrastructure to the latest hardware and storage technologies. To minimize the impact on the business, IT would perform this upgrade using a phased approach. Some organizations choose to roll out virtualization first and then replace the underlying hardware. By moving to Cloud, these organizations can build the virtualization layer and Cloud capabilities first, and then migrate all the applications to the Cloud running on modernized hardware.

The following types of requirements, most of which are related to the Cloud characteristics, are important when identifying the Infrastructure and Platform services.

Cloud service scalability requirements - current and future demands, anticipated spikes in load.

Cloud service availability requirements - up time, business criticality, business impact, business continuity, and disaster recovery.

Service invocation requirements (API) - how the users (mostly the application developers) are going to use and manage the service.

Elasticity requirements - Does the service require dynamic scaling up and scaling down? How fast do the service instances need to be added or removed?

Security requirements - includes a definition of the security entities, how data and applications will be secured, and authentication, authorization, and audit requirements.

Integration requirements - internal and external integration requirements.

Self service provisioning needs - this is typically not a directly stated business requirement but rather derived from agility requirements such as time-to-deploy.

Self service management needs - If the development team or "DevOps" team can manage certain aspects of the application, what would they be? Typically application management is performed by the business owners (or their IT delivery teams), while the operations of the platform and infrastructure are performed by the operations team. "DevOps" typically falls between these opposite ends of the process, encompassing, for example, packaging and promotion of releases to testing environments.

Metrics - What metrics need to be collected?

■ Service business metrics - usage, reuse, ROI, number of business units supported, etc.

■ Service operational metrics - up time, health, utilization metrics, etc.
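The requirement types listed above can be used as a checklist while gathering requirements for a candidate service. The sketch below is one possible way to record them, under stated assumptions: the class name `CloudServiceRequirements`, its methods, and the example service and statements are hypothetical; only the category names are taken from the list above.

```python
from dataclasses import dataclass, field

# Category names taken from the requirement types listed in this section.
REQUIREMENT_CATEGORIES = (
    "scalability",
    "availability",
    "service invocation (API)",
    "elasticity",
    "security",
    "integration",
    "self service provisioning",
    "self service management",
    "metrics",
)


@dataclass
class CloudServiceRequirements:
    """Requirement statements for one candidate service, grouped by category."""

    service_name: str
    entries: dict = field(default_factory=dict)

    def record(self, category, statement):
        """Store a requirement statement under a known category."""
        if category not in REQUIREMENT_CATEGORIES:
            raise ValueError(f"unknown requirement category: {category}")
        self.entries.setdefault(category, []).append(statement)

    def uncovered(self):
        """List the categories that have no requirement recorded yet."""
        return [c for c in REQUIREMENT_CATEGORIES if c not in self.entries]


# Hypothetical usage for an illustrative database platform service.
reqs = CloudServiceRequirements("database-platform")
reqs.record("availability", "Business-critical; disaster recovery required")
reqs.record("elasticity", "Instances must be added or removed on demand")
remaining = reqs.uncovered()
```

Calling `uncovered()` during requirement analysis gives a quick view of which categories still need to be discussed with the projects that will consume the service.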

2.2.1 Classification of Cloud service requirements

Business requirements are categorized into one or more of the following:

■ Green-field requirements that require brand new "business" capabilities to be built. To deliver these requirements, new business services and IT capabilities need to be developed. Green-field requirements stem from business initiatives such as introduction of new products, entry into new markets, and enhancing competitive advantage through time-to-market or agility improvements.

    – This is a top-down scenario where new Cloud services are identified to enable the business capabilities.

    – Application or Service components are not known initially in this case. So, initial analysis must be done to learn enough about the components to determine Cloud fit and to decide which kind of Cloud they should be deployed to.

■ Requirements that aim to reduce cost or improve customer experience of existing business interactions (not by introducing new features or capabilities).

    – These requirements do not require new "business" capabilities to be delivered but are geared towards reducing overall IT cost and improving the performance of IT and business applications.

    – While these projects might aim to reduce IT costs, they still need engagement with business users because, at the very least, those users may need to be involved in a regression testing effort after a migration.

    – Please note that this case may require new "IT" capabilities to be built to deliver cost reduction or to improve performance.

    – This is a bottom-up scenario, where existing IT capabilities are re-architected and migrated to Cloud architecture through IT consolidation, rationalization, or modernization initiatives.

    – Migration efforts may identify new Cloud services to be created using existing assets or existing Cloud services to be discovered for the migration of applications.

    – Application or Service components are already known in this case, and the key task is to identify whether they are suitable for Cloud deployment, and for which type of Cloud, based on the business requirements and architecture constraints.

■ The third classification is where pre-built new services are added to the existing Cloud service portfolio. This is a common scenario for Commercial Cloud Service Providers, who often add new Cloud services through M&A.

– Some level of standardization is required to rebrand the acquired services and integrate them with the common Cloud management infrastructure.

– The services may need to be migrated to run in the base Cloud.

– The acquired Cloud services need to be integrated into the Cloud service catalog.

2.3 Cloud Service Identification

The three key dimensions that influence Cloud service development are the service model, the deployment model, and the role, as shown in Figure 2–3.

Figure 2–3 Cloud Service Development Influencing Factors

The service model determines which layer or layers of the architecture will support the requirements. An enterprise may decide to build a SaaS Cloud on a custom platform or deploy business applications on a PaaS Cloud. The choice of architecture depends on the enterprise guiding principles, perceived benefits, and architecture factors.

The deployment model determines where the Cloud service is going to reside and how it will be managed. The requirements may be fulfilled by custom in-house development or off-the-shelf Cloud offerings from external providers. This choice has a major impact on Cloud service development.

Another aspect that has a major impact on how Cloud services are built is the role of the organization building the Cloud services. A commercial Cloud provider may gather requirements and build and manage services differently than an enterprise that is building the Cloud for its own use. Similarly, an Intermediation Cloud broker plays the role of both consumer and provider, which makes its Cloud service development lifecycle unique. What happens in the development of Cloud services depends on roles such as builder, consumer, and operator.

2.3.1 Basic steps in Cloud service identification

Once Cloud service requirements are documented, Cloud service candidates can be identified. Figure 2–4 illustrates the high-level steps in the Cloud service identification process.

Figure 2–4 Cloud Service Identification Steps

The deployment model is generally determined before identifying the service candidates, although it is not a requirement. The roadmap planning process would dictate the deployment model in most cases. Enterprises establish principles that guide and enforce the choice of their deployment models. For example, some corporations favor public Clouds, while others discourage their use. Given the requirements, public Cloud suitability is assessed based on architecture standards and principles. The deployment model decision made during the roadmap process is further validated and refined in this phase.

Once the deployment model is determined, service models are identified. Business requirements may drive the need for an Infrastructure Cloud service, a Platform Cloud service, or both. Building PaaS does not require IaaS, but either a PaaS or a traditional platform can be deployed on IaaS. For example, a Java application server platform may be identified as a potential PaaS. This Cloud service may be designed to run on a "Compute" node that is offered as IaaS. Alternatively, you may decide to run the Java application server on dedicated hardware that is not exposed as IaaS.

Services identified may fall into one of the following categories:


a) Existing Cloud Service - the service identified already exists in the enterprise or public domain

b) New Cloud Service - the service identified does not exist and needs to be built

c) Modified Cloud Service - an existing Cloud service has been discovered but it requires modifications before it can be used for this project

The final step in the process is identifying the workloads and their characteristics. Workload requirements validate the Cloud service definitions and ensure that the Cloud services built are suitable for deploying the workload. Some of the questions to ask are:

■ Is the workload permanent or transient?

■ Is it a batch program or OLTP application?

■ Is the workload going to have sudden spikes?

■ What business processes are run on the workload? This is important for identifying the business unit, criticality, and similar attributes of the workload.

■ Who has organizational ownership of the workload? This is important because Cloud adoption within organizations is often driven by a 'champion', so ownership can directly affect the Cloud service identification process.

■ Are there any constraints or restrictions on the workload? The service may not support some of the features of the underlying service platform. For example, a Java service provider may exclude the use of RMI and thread management.
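The questions above can be captured as a simple record so that each workload is assessed in the same way. The following is a minimal sketch; the class, field, and feature names are illustrative assumptions and are not defined by this guide.

```python
from dataclasses import dataclass, field


@dataclass
class WorkloadProfile:
    """Answers to the workload questions (all field names are hypothetical)."""
    name: str
    permanent: bool          # permanent (True) or transient (False)
    kind: str                # "batch" or "oltp"
    has_spikes: bool         # are sudden spikes expected?
    business_process: str    # business process run on the workload
    owner: str               # organizational owner of the workload
    required_features: set = field(default_factory=set)


def unsupported_features(profile, excluded_by_service):
    """Return the features the workload needs but the service excludes."""
    return sorted(profile.required_features & set(excluded_by_service))


quotes = WorkloadProfile(
    name="quotes-engine",
    permanent=True,
    kind="oltp",
    has_spikes=True,
    business_process="options trading",
    owner="trading desk",
    required_features={"jms", "rmi"},
)

# A hypothetical Java service that excludes RMI and thread management.
conflicts = unsupported_features(quotes, {"rmi", "thread-management"})
```

A non-empty result signals that either the workload or the service definition needs to change before the two can be matched.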

2.3.2 Detailed activities in the Cloud service identification process

As shown in Figure 2–5, Cloud service identification deals with the procedures and guidelines that an enterprise adopts to identify new Cloud service candidates.

Figure 2–5 Cloud Service Identification Process

As described in the Requirements Analysis section, Infrastructure and Platform service requirements are identified based on the Infrastructure and Platform requirements of the project and are refined to the enterprise scope. These requirements are further analyzed by the service creation project team for validity before identifying services for implementation.

In most cases, non-functional requirements and architecture standards drive the platform, technology, and information decisions. These decisions drive the choice of the platforms (PaaS), databases, and Infrastructure (IaaS).

Functional requirements primarily drive SaaS decisions. Functional requirements are implemented as SOA services or application components. SOA services can be deployed as Cloud SaaS offerings. Application components may be custom developed or acquired as a COTS product, if available commercially. SaaS service development is covered in the "Building application Services for Cloud" document.

Identifying these services is part science and part art. There is no clear-cut, prescriptive method to identify Cloud services. This section provides some general ideas and guidelines to help you identify the Cloud services. The applicability of each of the following services depends on the requirements.

2.3.2.1 Platforms

The platform on which the application will run is determined based on the technical requirements and architecture standards. For example, latency and HA requirements may lead to the selection of a Java-based application server, and the architecture standards may narrow it down to the Oracle WebLogic Server platform. If the project has reliability or store-and-forward requirements, there is most likely a need for a queuing platform.

2.3.2.2 Database

The database presents another opportunity to leverage Cloud services. Most applications require a database of some kind, and architectural standards typically dictate the use of a standardized database version across the enterprise. Making the database available as a Cloud service is one way of enforcing these standards. In addition, it provides the automated provisioning and self-service benefits that are inherent to a Database as a Service (DBaaS) Cloud. A number of key issues need to be considered while identifying database services. These include data availability, data redundancy, backup and restore, and performance.

2.3.2.3 Infrastructure

The next step is to identify the infrastructure needs of the project. Infrastructure includes compute capacity, storage capacity, and network components. A ballpark estimate of the resource requirements, with an understanding of the low and high usage marks, is helpful in deciding the infrastructure services. If an organization chooses a larger number of smaller compute nodes, it must make sure that the workload can scale horizontally.

The requirements also drive the type and size of the storage service. To determine the most suitable storage service, it is essential to understand the nature of the workload. The design of the storage service is influenced by factors such as:

■ Is the data mostly read-only or read-write?

■ Is data access "chatty" (small chunks of data accessed frequently) or is it large amounts of data accessed infrequently?


■ Performance requirements for data access that may further drive the physical characteristics of the storage technology

■ Monitoring and management needs

Network components such as load balancers and routers also need to be considered. Security requirements drive the need for one or more firewalls in the architecture. Network components are typically shared across multiple Cloud services, and multi-tenancy is a key consideration when identifying such services.

2.3.2.4 Extension services

Consumers of Cloud services often find a need to customize the functionality offered by the service. Since Cloud services are typically shared across multiple tenants, providers are restricted from allowing consumers to modify them; instead, they allow consumers to extend them with additional services in the layers below. For example, a SaaS service provider may offer a platform Cloud service so that the consumer can extend the functionality of the SaaS offering.

The ITSO Cloud Foundation document describes the layering of Cloud services. It also explains that a Cloud service does not have to be built on top of a lower layer Cloud service. For example, it is not required to build a Platform service on an Infrastructure service.

2.3.2.5 Capacity Planning

Once the Cloud service candidates are identified, they need to be sized to ensure that the needs of all projects are met. Capacity planning should take into account the peak load and future growth requirements of all projects using the service. Capacity planning for Cloud is a very detailed topic in itself and is not covered in this document. What is important is that capacity planning is performed as part of the Cloud method to ensure that the resources are sized appropriately. For infrastructure services, there must be sufficient compute resources available to handle current requirements and future growth. For platform services, in addition to ensuring that sufficient compute capacity is available, the design must allow for horizontal scale out and scale back.
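As a rough illustration of sizing against peak load and future growth, the following sketch adds up the projected peak demand of every project that uses a service and converts it into a node count. The function and field names, the assumption that all peaks coincide, and the growth and headroom figures are illustrative assumptions, not values from this guide.

```python
import math
from dataclasses import dataclass


@dataclass
class ProjectDemand:
    """Peak demand of one project on a shared service (hypothetical units)."""
    name: str
    peak_cpus: float
    annual_growth: float  # 0.5 means 50% growth per year (assumed figure)


def required_nodes(projects, years, cpus_per_node, headroom=0.25):
    """Return the number of compute nodes needed for the projected peak load.

    Assumes the worst case in which the peaks of all projects coincide.
    """
    projected = sum(
        p.peak_cpus * (1 + p.annual_growth) ** years for p in projects
    )
    with_headroom = projected * (1 + headroom)
    return math.ceil(with_headroom / cpus_per_node)


demands = [
    ProjectDemand("quotes", peak_cpus=8, annual_growth=0.5),
    ProjectDemand("reporting", peak_cpus=4, annual_growth=0.0),
]
nodes = required_nodes(demands, years=1, cpus_per_node=2)
```

If the regions using a platform peak at different times, summing the peaks overstates the requirement; a real capacity plan would model the combined load profile instead.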

2.3.2.6 Development Cloud

The development platform is another area where there is a great opportunity to utilize Cloud services. Many organizations use a type of hybrid Cloud known as "Lifecycle distribution", in which development happens in a Cloud. Since most organizations standardize their development technologies, it is a good idea to make the development platform available as a Cloud service for quick deployment of the development environment. If development will be done on a platform Cloud service, the needs are identified in this step.

2.3.2.7 Cloud candidate services stack

A useful exercise is to build a Cloud candidate services stack model by identifying the Infrastructure and Platform services and the dependencies between them. Figure 2–6 shows a conceptual view of the Cloud candidate services stack model.


Figure 2–6 Cloud Candidate Services Stack Model

An example of the Cloud candidate services stack is shown in Figure 2–7. It illustrates that the JMS and Java platform services are running on a 2CPU/16GB RAM compute service while the database service is running on a dedicated Exadata node. The red arrows illustrate the request flow across these services.

Figure 2–7 Cloud Candidate Services Stack Example

This model serves multiple purposes. It shows how the services (or service candidates) are stacked up and the dependencies between them. It also shows which services are built on other services and which ones are extension services. Finally, it can be used to identify new services, existing services, and modified services.
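One way to make the stack model concrete is to record it as a dependency graph. The sketch below loosely follows the example in Figure 2–7; the service identifiers and the new/existing/modified classification are assumptions made for illustration only.

```python
from graphlib import TopologicalSorter

# Each candidate maps to the layer it runs on (identifiers are hypothetical).
RUNS_ON = {
    "jms-platform": {"compute-2cpu-16gb"},
    "java-platform": {"compute-2cpu-16gb"},
    "database": {"exadata-node"},  # dedicated hardware, not exposed as IaaS
    "compute-2cpu-16gb": set(),
    "exadata-node": set(),
}

# Assumed classification of each candidate for this example.
STATUS = {
    "jms-platform": "new",
    "java-platform": "modified",
    "database": "existing",
    "compute-2cpu-16gb": "existing",
    "exadata-node": "existing",
}


def build_order(runs_on):
    """Order the candidates so that lower layers come before what runs on them."""
    return list(TopologicalSorter(runs_on).static_order())


def work_items(status):
    """Candidates that must be built or changed before the project can use them."""
    return sorted(name for name, s in status.items() if s in ("new", "modified"))


order = build_order(RUNS_ON)
todo = work_items(STATUS)
```

Keeping the model in this form makes it easy to see which candidates depend on a lower layer and which ones still require work from the service creation team.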


2.3.2.8 Defining the service boundaries

The service candidates identified can now be reviewed further to define or refine the service boundaries. The following list captures some of the considerations for defining service boundaries.

■ Does the service candidate need to be redefined into two or more services based on architecture constraints or performance benefits?

■ Should the service be deployed on dedicated hardware or a compute service? For example, an Oracle WebLogic service may be deployed on an Exalogic engineered system or a generic compute node. If you decide to deploy it on a generic compute node, then the service will be split into a WebLogic platform service and a compute infrastructure service (IaaS).

■ Is it necessary to combine two or more service candidates into one for performance, security, or deployment reasons?

■ Should we impose any restrictions on the service candidate for security or scalability reasons?

■ Is the service candidate based on the principles defined by the reference architecture and governance framework?

■ What are the principles governing the Cloud strategy? Does the enterprise favor public SaaS services over building the services in house? Does the enterprise build SaaS over PaaS or over dedicated platforms?

■ Is there a difference in approach between core strategic functions and commodity support functions?

2.3.2.9 Determining the deployment model

After identifying application components, the right deployment model should be determined. The deployment model is typically determined during the roadmap process. In this step, it is validated with a suitability analysis of finer grained components.

The Oracle Cloud Candidate Selection Tool (CCST) assists with the process of identifying the deployment model for specific components. It evaluates several factors, including architectural characteristics and affinity between services, and highlights the best fit deployment model for application components. If the business service or application identified is available as a third-party commodity service offered by a public Cloud provider, it could be acquired as a subscription-based service. Otherwise, the business service or application could be built as a private PaaS or a traditional application, depending on the architecture guidelines.

One of the critical evaluation metrics in deciding the deployment model is data sensitivity. This applies to external deployment models, and also to Enterprise-Internal Clouds, which often use a multi-zone security model. For example, critically sensitive data may have a requirement to be hosted in the 'Restricted' security zone, which affects infrastructure and platform.

2.3.2.10 Service justification

The Cloud service repository plays a key role in discovering existing Cloud services based on the need of the project in question. If the chosen deployment model is public, then public Cloud services from multiple vendors are compared to determine the best fit for the project requirements.

New Cloud services or modifications to the existing services need to be justified before being taken up for delivery by the Cloud service creation team. Resource constraints, architecture constraints, and economic rationale are some of the factors influencing this justification. If the Cloud service creation team decides not to build the Cloud service, it is passed back to the project team for localized implementation.

If the discovered service requires modifications, an impact analysis should be performed to assess how the change will affect the existing consumers. A well-defined versioning approach is required to ensure that the new version is fully functional and the old versions are phased out. Service versioning is also essential to isolating and tracking modifications, and to facilitating rollback to older versions.

2.3.2.11 Workload validation

The final step in the identification process is to identify the various types of workloads to be supported and ensure that the service can support the workload requirements. The following are example workload characteristics that need to be considered in service validation.

■ Batch processing workload that is going to be run at specific times of the day.

■ Transaction processing workload

■ Business continuity workload that is active only when the primary site goes down

■ Burst workload that is created as a result of peak load distribution

■ Development or test workload that is non-critical

■ Production workload that is mission-critical or business-critical

2.4 Cloud Service Portfolio Management and Release Planning

Figure 2–8 shows the high-level steps in Cloud Service Portfolio Management and Release Planning. Cloud services and their dependencies should be maintained in a catalog so that they can be discovered. The catalog also assists in synchronizing project release schedules and Cloud service delivery schedules.


Figure 2–8 Cloud Service Portfolio Management and Release Planning

Cloud projects often face the dilemma of whether to create the services first and wait for the business units to start using them, or to build Cloud services as the need arises in the first project. Cloud release planning ensures that services are planned and developed in support of evolving business requirements.

Cloud service metadata, with its associated dependencies, should be managed in an enterprise asset management tool such as an Enterprise Metadata Repository. This goes a long way in ensuring that services are planned in line with the demand and that the project teams utilize the services for effective cost control and maximal agility. A taxonomy to describe Cloud services may include metadata elements such as:

■ Projects using the Cloud service

■ Lifecycle environments the Cloud service is deployed on (e.g. Dev, Test, Prod)

■ Business units using the Cloud service

■ Cloud service template associated with the service

■ Assembly or topology

Another aspect of Service Release Planning is to prioritize the service development based on available resources. Not all services identified by the Service Release Planning process are built in-house. IT may decide to broker some of the services from a public Cloud provider based on cost and time-to-deploy factors.

Service creation and project delivery need to be synchronized such that Cloud services are ready and available for consumption when the business projects need them. Sometimes a private Cloud would have the necessary infrastructure built already; hence the deployment of Cloud services would be fast. However, there may be instances where a service that does not currently exist has only just been identified for the needs of a particular project.
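The catalog metadata and the synchronization of release schedules can be sketched as follows. The record layout mirrors the taxonomy elements listed earlier in this section, but the class, the field names, the availability date, and the sample entries are illustrative assumptions rather than the schema of any particular repository product.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CatalogEntry:
    """One Cloud service in the catalog (hypothetical record layout)."""
    name: str
    available_from: date                      # planned delivery date (assumed field)
    projects: list = field(default_factory=list)
    environments: list = field(default_factory=list)   # e.g. Dev, Test, Prod
    business_units: list = field(default_factory=list)
    template: str = ""
    topology: str = ""
    depends_on: list = field(default_factory=list)


def not_ready_for(catalog, project, needed_on):
    """Names of services the project uses that will not be available in time."""
    return [
        entry.name
        for entry in catalog
        if project in entry.projects and entry.available_from > needed_on
    ]


catalog = [
    CatalogEntry("oep-platform", date(2013, 5, 1), projects=["options-trading"]),
    CatalogEntry("nosql-dbaas", date(2013, 9, 1), projects=["options-trading"],
                 depends_on=["compute"]),
]
late = not_ready_for(catalog, "options-trading", needed_on=date(2013, 6, 1))
```

A check of this kind, run against every planned project, highlights where service delivery and project delivery are out of step.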

2.5 Putting it together

Figure 2–9 shows the complete flow of activities in the Inception phase.

Figure 2–9 Inception Phase - putting it together

2.6 An Example

2.6.1 Problem

ABC Bank is expanding into new markets and is introducing an options trading product. ABC Bank already offers customers equity and Forex trading. Quotes must be provided with a delay of less than 10 ms. This low latency requirement is consistent across all three business lines (Equity, Forex, and Options/Derivatives). The business has also dictated that the application must be up 99.999% of the time on trading days and that the system should accommodate additional load seamlessly, as the quote volume varies widely. ABC Bank has a presence in over 20 countries, and different regions use the platform at different times. The traders of the bank use remote trading desks. ABC Bank is investigating whether Cloud is the right deployment option and, if so, how the Cloud services can be identified.
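To put the availability requirement in perspective, the short sketch below converts an availability percentage into an allowed-downtime budget. The number of trading days per year and the assumption that the platform is needed around the clock on those days are illustrative assumptions; the problem statement does not specify them.

```python
def downtime_budget_seconds(availability_pct, days, hours_per_day=24.0):
    """Seconds of allowed downtime for a given availability over the period."""
    total_seconds = days * hours_per_day * 3600
    return total_seconds * (100.0 - availability_pct) / 100.0


# Assumed: 252 trading days per year, platform in use 24 hours on each of them.
budget = downtime_budget_seconds(99.999, days=252)
```

Under these assumptions the budget works out to roughly 218 seconds, a little over three and a half minutes, of downtime per year, which is why high availability features prominently in the platform decisions that follow.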

2.6.2 Solution

As part of a multi-year initiative, ABC Bank started building a private Cloud two years ago. Over that period, IT has built the necessary Cloud management infrastructure and a few Cloud services. The private Cloud embodies all essential characteristics of Cloud, including self-service, elasticity, broad network access, measurement, and resource pooling. Elasticity and resource pooling are important because the different regions use the platform at different times, broad network access is important for remote trading desks, and measurement and monitoring are important for checking the QoS.

ABC Bank's IT has determined that one of the key components of the architecture is the Quotes Engine that provides option quotes. This component is very similar to the Quotes Engines used in the equity and Forex trading applications. The architecture team determines that these requirements can be fulfilled by a Complex Event Processing (CEP) product deployed on hardware that supports the low latency and high availability requirements. The architecture team has established middleware standards. Oracle Event Processing (OEP) has been standardized as the preferred platform for Event Driven Architecture.

Based on the initial requirements analysis, the project team identifies two PaaS candidates and passes the requirements on to the Cloud service creation team. Based on the non-functional requirements of the project, the architecture team identifies OEP as a service candidate. The availability and low latency requirements also drive the need for a high performance database as a service. The Cloud service creation team further analyzes the requirements and identifies an OEP service and an Oracle NoSQL database as a service. Further boundary analysis suggests that OEP deployed on Oracle Exalogic can better satisfy the business requirements. The Cloud service creation team also determines that the Oracle NoSQL database can be deployed on an existing Infrastructure Cloud service without any modifications.

Since there is significant reuse of the OEP platform service and Oracle NoSQL database service, these services are easily justified and accepted by the service creation team. The service creation team also evaluates the workload requirements and ensures that the service can handle the load patterns.

3 Elaboration

The Elaboration phase of Cloud service development includes definition and design activities, as shown in Figure 3–1. A key input to this phase is the Cloud reference architecture. The output from the Inception phase forms the input for the Elaboration phase.

Figure 3–1 Elaboration Phase Activities

3.1 Cloud Service Definition

The next step in the process is to define the Cloud service identified in the Inception phase. Figure 3–2 outlines the activities in the Cloud definition step of the process.


Figure 3–2 Cloud Service Definition

3.1.1 Defining Cloud Service Contracts

Contracts define what the service offers and the SLA for the service from the consumer's viewpoint. A contract is the business definition of the Cloud service that is likely to appear in the consumer-facing service catalog. Contracts are the agreements between the consumer and provider. Service providers typically provide a master contract that covers the terms between the provider and all the consumers of the service. Even for private Clouds, contracts should be defined between the IT department and users (business or IT) of Cloud services. Commercial Cloud service providers must define the internal Quality of Service (QoS) requirements to meet the SLAs published to the consumers. A Cloud service contract must also indicate how the consumer's data and assets will be protected and what happens to the data when the consumer terminates the subscription.

3.1.2 Defining Service APIs

APIs are an important component of Cloud services. For Infrastructure and Platform Cloud services, APIs specify how the service will be provisioned, managed, and accessed. As IT deployments become more complex, an abstraction of the infrastructure resources becomes more relevant to address concerns of compliance and configuration. Furthermore, such abstractions enable consumers both to self-serve and to operationally control these services without any significant administrator involvement.

API specification is a key part of both IaaS and PaaS services. APIs are made available for the consumers to interact with the Cloud provider. Although there are no dominant standards at this point, providers must make their best effort to create and support standards-based APIs for the management of infrastructure and platform.


3.1.2.1 Characteristics of good Cloud APIs

The following list captures the characteristics of good Cloud APIs:

■ Minimalistic design

■ Simple but complete

■ Standards support

■ Good documentation

■ Abstract

■ Encapsulate multiple Cloud resource management tasks into one

3.1.2.2 IaaS API

An IaaS API enables an infrastructure provider to serve its customers by allowing them to:

■ Browse templates that contain definitions and metadata of a logical unit of service

■ Deploy a template into the cloud and form an IT topology on demand

■ Perform operations on the resources

■ Take backups of the resources

The specification of an IaaS Cloud API should include:

■ Common behaviors that apply across all requests and responses, error messages, common resource attributes

■ Resource models, which describe the data structures used in requests and responses

■ The requests that may be sent to cloud resources, and the responses expected.

■ Which communication protocols to support, e.g., REST, SOAP, WS-*
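The four consumer capabilities listed in this section (browse templates, deploy a template, operate on resources, and take backups) can be pictured with a small in-memory model. This is a sketch only: the class and method names are hypothetical, it does not represent any provider's actual API, and a real specification would also define the transport protocol, resource models, and error messages.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Template:
    """Definition and metadata of a logical unit of service (hypothetical)."""
    name: str
    cpus: int
    ram_gb: int


@dataclass
class Resource:
    """A deployed instance of a template."""
    id: str
    template: Template
    state: str = "running"
    backups: list = field(default_factory=list)


class InMemoryIaaS:
    """Toy stand-in for an IaaS endpoint; all operations are local."""

    def __init__(self, templates):
        self._templates = {t.name: t for t in templates}
        self._resources = {}
        self._ids = itertools.count(1)

    def list_templates(self):
        """Browse the available templates."""
        return sorted(self._templates)

    def deploy(self, template_name):
        """Deploy a template; raises KeyError for an unknown template."""
        template = self._templates[template_name]
        resource = Resource(id=f"res-{next(self._ids)}", template=template)
        self._resources[resource.id] = resource
        return resource

    def operate(self, resource_id, action):
        """Perform a start or stop operation on a deployed resource."""
        transitions = {"start": "running", "stop": "stopped"}
        if action not in transitions:
            raise ValueError(f"unsupported action: {action}")
        resource = self._resources[resource_id]
        resource.state = transitions[action]
        return resource

    def backup(self, resource_id):
        """Record a backup of the resource and return its label."""
        resource = self._resources[resource_id]
        label = f"{resource_id}-backup-{len(resource.backups) + 1}"
        resource.backups.append(label)
        return label


cloud = InMemoryIaaS([Template("small", cpus=2, ram_gb=16)])
vm = cloud.deploy("small")
cloud.operate(vm.id, "stop")
label = cloud.backup(vm.id)
```

In a real service each of these methods would correspond to a request and response pair defined in the API specification, with common behaviors such as error handling applied uniformly across them.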

3.1.2.3 PaaS API

PaaS APIs are required to manage the building, running, administration, monitoring, and patching of applications in the Cloud. Figure 3–3 shows PaaS consumers managing their PaaS instances using the self-service PaaS APIs. The platform implementation is responsible for translating the API request and orchestrating the underlying resources.


Figure 3–3 PaaS API

The following is a non-exhaustive list of common PaaS use cases that PaaS providers may choose to support. These are application-oriented use cases that assume an entire application is deployed to the platform. This assumption may not hold when, for example, the platform is just a queuing service or a data warehouse service.

■ Building and packaging an application in a local development environment

■ Building an application in a development environment running in the cloud

■ Importing a platform deployable entity into the cloud

■ Uploading application artifacts into the cloud

■ Running, stopping, suspending, snapshotting, and patching an application instance

A standardized PaaS management API has the following benefits from the consumer point of view.

■ Portability - By standardizing the management API for the use cases around deploying, stopping, starting, and updating applications, the standardized API increases consumers' ability to port their applications between PaaS offerings.

■ Popular development environments could use the APIs to create plug-ins. Over time, such generic implementations are likely to be of higher quality than the implementations written for solitary, proprietary application management interfaces.

For PaaS providers, a standardized management API would bring the following benefits:

■ Because the strength and features of a PaaS offering's application management API are unlikely to be perceived as key differentiators from other PaaS offerings, the existence of a standardized management API allows providers to leverage the experience and insight of the specification's contributors and invest their design resources in other, more valuable areas.


■ By increasing the portability of applications between PaaS offerings, the management API helps "grow the pie" of the PaaS marketplace by addressing one of the key pain points for PaaS consumers.
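The portability argument can be illustrated with a common management interface that more than one platform implements. The interface below is a hypothetical sketch and is not a published standard; the two "providers" are in-memory stand-ins used only to show that the porting code touches nothing beyond the shared operations.

```python
from abc import ABC, abstractmethod


class PaaSManagementAPI(ABC):
    """Hypothetical common management interface for application lifecycle."""

    @abstractmethod
    def deploy(self, app, artifact): ...

    @abstractmethod
    def start(self, app): ...

    @abstractmethod
    def stop(self, app): ...

    @abstractmethod
    def status(self, app): ...


class InMemoryPlatform(PaaSManagementAPI):
    """Stand-in for one provider's implementation of the interface."""

    def __init__(self, label):
        self.label = label
        self._apps = {}

    def deploy(self, app, artifact):
        self._apps[app] = {"artifact": artifact, "state": "stopped"}

    def start(self, app):
        self._apps[app]["state"] = "running"

    def stop(self, app):
        self._apps[app]["state"] = "stopped"

    def status(self, app):
        return self._apps[app]["state"]


def port_application(app, artifact, source, target):
    """Move an application between platforms using only the shared interface."""
    target.deploy(app, artifact)
    target.start(app)
    source.stop(app)
    return target.status(app)


provider_a = InMemoryPlatform("provider-a")
provider_b = InMemoryPlatform("provider-b")
provider_a.deploy("quotes", "quotes.ear")
provider_a.start("quotes")
result = port_application("quotes", "quotes.ear", provider_a, provider_b)
```

Because `port_application` depends only on the abstract interface, the same tooling or plug-in would work against any platform that implements it.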

3.1.3 Defining service specifications

The service specification referred to in this step is a technical definition of the Cloud service, which typically includes the technology attributes. The following are the key service definition activities.

■ Boundary analysis - identify the Cloud service boundaries by analyzing various influencing conditions against the Cloud service candidate. Factors such as service scope, security policies, and QoS requirements may affect service boundaries.

■ Identify IaaS, PaaS, and SaaS services - break down services if necessary

■ Define the SLA for the service

■ Define the security aspects

■ Size the service

– E.g. Compute - CPU size/RAM size, # of CPUs

– E.g. Storage - storage capacity, mirroring, etc.

– E.g. Platform - # of platform instances or cluster size, memory/heap size

■ Define HA and elasticity requirements

■ Define any self service interfaces

3.1.3.1 Template for Cloud service definition

A sample template for capturing the Cloud service definition is provided below.

Cloud Service Name: Name of the Cloud service
Type of Service: e.g. IaaS/PaaS/SaaS
Sub-Type of Service: e.g. Compute/Storage/Java/DB/Queue
Description:
Deployment Model: Public/private/hybrid
Dependencies:
Elasticity: How is the service capacity managed based on demand variations?
Security: Security provisions and compliance statements
Workload characteristics: Define what workload is supported by this service
Metrics: Define the metrics used to measure this service (e.g. CPU utilization, bandwidth, space used)
Sizing: Service sizing using the service-specific parameters
Access Method: How will the service be accessed? Routing information and load balancing
Isolation: Define the isolation strategy (data level, container level, application level, process level, etc.)
Multi-tenancy: How would this service handle multiple consumers? What level of multi-tenancy will be used?
Resource Pool: Describe the underlying resource pool (e.g. virtualized infrastructure hosting VMs, a large VM hosting multiple WebLogic JVMs, a database hosting multiple schemas)
Service Class/Tiers: These are typically the operational characteristics (e.g. backup frequency, retention period) or service quality metrics (e.g. overprovisioning ratio) that form SLAs and are wrapped up into business language (e.g. Gold, Silver, Bronze)
Deployment Zones: This is a logical concept but can represent business units, data centers, infrastructure pods, security zones, etc. (configurable to the enterprise within the management tooling)
Unit of provision: What is the consumer getting when this service is turned on? (e.g. a VM with OS pre-installed, a WebLogic JVM, a database schema)
Provisioning: How is this service provisioned? What level of automation will be implemented?
Subscription: What is the best way to monetize this service? What subscription model is best suited? (The business may choose to use a different model, but this is the builder perspective of the subscription model.)
Monitoring and diagnostics: How is this service going to be monitored? What kind of instrumentation and diagnostics will be provided?
Scaling: How is this service going to be scaled? Horizontal or vertical? Does the architecture support automation to provide elastic scaling capabilities?
Language support: What localized languages will be supported?
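A service definition captured with this template can also be held as structured data so that incomplete or inconsistent entries are caught early. The sketch below covers only a handful of the template attributes; the class, the field names, and the sets of allowed values are illustrative assumptions.

```python
from dataclasses import dataclass, fields


@dataclass
class CloudServiceDefinition:
    """A subset of the template attributes (field names are hypothetical)."""
    name: str = ""
    service_type: str = ""       # IaaS / PaaS / SaaS
    sub_type: str = ""           # e.g. Compute, Storage, Java, DB, Queue
    deployment_model: str = ""   # public / private / hybrid
    unit_of_provision: str = ""
    scaling: str = ""            # horizontal / vertical


# Assumed value sets, taken from the examples given in the template.
ALLOWED = {
    "service_type": {"IaaS", "PaaS", "SaaS"},
    "deployment_model": {"public", "private", "hybrid"},
    "scaling": {"horizontal", "vertical"},
}


def problems(definition):
    """List missing attributes and values outside the assumed value sets."""
    issues = []
    for f in fields(definition):
        value = getattr(definition, f.name)
        if not value:
            issues.append(f"{f.name}: missing")
        elif f.name in ALLOWED and value not in ALLOWED[f.name]:
            issues.append(f"{f.name}: unexpected value {value!r}")
    return issues


oep = CloudServiceDefinition(
    name="OEP platform service",
    service_type="PaaS",
    sub_type="Event processing",
    deployment_model="private",
    unit_of_provision="",
    scaling="diagonal",
)
issues = problems(oep)
```

Here the check reports the empty unit of provision and the unrecognized scaling value, prompting the service creation team to complete the definition before design begins.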

3.1.3.2 Defining Service metrics

One of the essential characteristics of Cloud services is that they are measurable. Service definitions should state which metrics will be used to measure the usage of the service. Metrics may be simple or composite; most services use composite metrics. This section presents some sample metrics for IaaS and PaaS.

3.1.3.2.1 IaaS Metrics

IaaS services are specified broadly in terms of fundamental resources such as compute capacity and storage capacity.

■ CPU: CPU utilization (%)

■ CPU (config): CPU count (#)

■ Memory: memory usage (GB)

■ Memory (config): memory (GB)

■ Storage: disk space (GB)

■ Bandwidth: bandwidth (Mbps)

■ Other costs: system count

■ Facility: base facility charge ($$)

■ Facility: base utility charge ($$)

■ HA: HA multiplier (times X)
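To illustrate how simple metrics such as those listed above might be combined into a composite metric, the sketch below computes a monthly charge from configured resources. The unit rates, the base facility charge, and the rule that the HA multiplier applies only to resource costs are all assumptions made for illustration; a real cost model would come from the provider.

```python
from dataclasses import dataclass


@dataclass
class IaasUsage:
    cpu_count: int               # configured CPU count (#)
    memory_gb: float             # configured memory (GB)
    storage_gb: float            # disk space (GB)
    bandwidth_mbps: float        # bandwidth (Mbps)
    ha_multiplier: float = 1.0   # "times X" for highly available deployments


# Hypothetical monthly unit rates; not taken from any real price list.
RATES = {"cpu": 20.0, "memory_gb": 5.0, "storage_gb": 0.10, "bandwidth_mbps": 1.0}
BASE_FACILITY_CHARGE = 50.0


def monthly_charge(usage: IaasUsage) -> float:
    """Composite metric: resource costs scaled by HA, plus a flat facility charge."""
    resource_cost = (
        usage.cpu_count * RATES["cpu"]
        + usage.memory_gb * RATES["memory_gb"]
        + usage.storage_gb * RATES["storage_gb"]
        + usage.bandwidth_mbps * RATES["bandwidth_mbps"]
    )
    # Assumption: the HA multiplier scales resource costs but not the facility charge.
    return round(resource_cost * usage.ha_multiplier + BASE_FACILITY_CHARGE, 2)


vm = IaasUsage(cpu_count=4, memory_gb=16, storage_gb=200, bandwidth_mbps=10, ha_multiplier=2.0)
print(monthly_charge(vm))  # (80 + 80 + 20 + 10) * 2 + 50 = 430.0
```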

3.1.3.2.2 PaaS Metrics

■ DB usage characteristics: DML operations (DML statements), DB connections (average/max DB pool size), data transfer (GB), DDL requirements

■ Deployed entities: number of deployed apps (# of .ear), exposed services (# of services)

■ Service consumption: service invocations (# of invocations)

■ Usage cost: transaction cost (# of transactions)

3.2 Designing Cloud services

Cloud service design should include detailed static and dynamic behavior models that show how the services are provisioned, managed, and self-serviced. Figure 3–4 shows the key activities in Cloud service design.


Figure 3–4 Cloud Service Design

For Infrastructure and Platform services, service templates or assemblies are created from reference configurations. Service templates are instantiated to create deployable entities. APIs and service integration components are designed next. Some Cloud services need workflows that are specific to those services; these service-specific workflows must be designed as well. In a Test Driven Development (TDD) environment, test cases and test scripts are also created during the Elaboration phase.

3.2.1 Design Choices

Cloud service design involves several design choices. Some of them are listed below.

■ If IT is going to build the service, what will be procured and what will be custom developed? Guiding principles around Buy vs. Build vs. Lease need to be developed.

■ Service model choices may change during the design process. For example, detailed design may identify the need for additional cloud services.

■ Multi-tenancy is another key consideration. How does the design support multiple consumers? For example, in the case of a DBaaS, how is multi-tenancy handled? Is data isolation handled at the database level, schema level, table level, or row level?

■ Security considerations - Is security infrastructure shared across multiple consumers and multiple service types (e.g. SaaS and PaaS)? How will the security identity domains be designed? Will the internal operators and administrators get their own identity domain?

■ How is the service going to be packaged and deployed? Can the packaging approach support the scale, velocity, and elasticity requirements of the Cloud?

■ Scalability - Scale and velocity are two of the key design considerations for Cloud. How is the service going to be scaled over the long term? What are the capacity requirements? What is the strategy for long-term scaling?


■ High Availability - How do we ensure that the service is highly available? How is redundancy handled? How are load distribution and failover accomplished?

■ Elasticity - How is the service going to scale up and scale down as the workload requirements change? Does the infrastructure support automatic scale up and down? Does the service design support the elasticity requirements?

■ Self Service - Does the service design satisfy the self service requirements? How does it interact with the management infrastructure?

■ Metering and monitoring - How will the service metrics be collected and pushed to the Cloud management and monitoring framework?
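The elasticity question in the list above (does the design support automatic scale up and scale down?) can be made concrete with a small decision rule. The sketch below is one possible threshold-based policy; the function name, the thresholds, and the instance limits are hypothetical and would be replaced by values from the service's own workload analysis.

```python
def desired_instances(
    current: int,
    cpu_utilization: float,      # fraction between 0.0 and 1.0
    scale_up_at: float = 0.75,   # hypothetical threshold
    scale_down_at: float = 0.30, # hypothetical threshold
    minimum: int = 1,
    maximum: int = 10,
) -> int:
    """Return the instance count a simple threshold policy would ask for."""
    if cpu_utilization > scale_up_at:
        target = current + 1     # workload is high: add one instance
    elif cpu_utilization < scale_down_at:
        target = current - 1     # workload is low: release one instance
    else:
        target = current         # within the comfortable band: no change
    # Never go below the minimum or above the pool's maximum.
    return max(minimum, min(maximum, target))


print(desired_instances(current=2, cpu_utilization=0.90))  # 3
print(desired_instances(current=2, cpu_utilization=0.10))  # 1
```

Stepping by one instance at a time keeps the policy easy to reason about; a design with bursty workloads might instead scale proportionally to the load.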

3.2.2 Service Design Template

This template captures the key elements of the Cloud service design.

Cloud Service Name:
Design Overview:
Static behavior:
Dynamic Behavior:
Elasticity Design: Design that supports scale up and scale down of resources
Security Design: Security design aspects
Metrics Collection: Design of how metrics are collected
Access Design: Design details on how the service will be accessed
Isolation design: Design details on isolation strategy (data level, container level, application level, process level, etc.)
Multi-tenancy: Supporting multiple tenants/consumers at various design levels of the architecture. This should cover design issues such as how tenant data will be organized, how security infrastructure is shared, how the requests from different tenants are routed, and how the critical components of the architecture are isolated.
Provisioning design: How the services will be provisioned and managed
Integration design: Service integration design details, including ecosystem integration points like DNS, DHCP, monitoring, etc.
Scaling design: How the services will be scaled
HA design: High availability and redundancy design

3.2.3 Service Assembly Template

A Service Assembly Template (SAT) is a collection of interrelated software components that are automatically configured to work together upon deployment. The components are deployed onto a pool of hardware resources with minimal user input. From the user's perspective, the SAT represents the definition of a deployable entity. Users create cloud resources by specifying a SAT in a deployment request. The cloud provider instantiates the resources and their configurations as specified in the SAT. The SAT lists the components of the deployable entity and how they are packaged.
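As a rough illustration of the idea, a SAT can be thought of as a list of components with default configuration, which a deployment request instantiates with minimal input. The Python sketch below is an assumption-laden model, not an actual template format: `Component`, `ServiceAssemblyTemplate`, `instantiate`, and all example values are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Component:
    name: str
    package: str               # how the component is packaged (hypothetical label)
    defaults: Dict[str, str]   # configuration applied automatically on deployment


@dataclass(frozen=True)
class ServiceAssemblyTemplate:
    name: str
    components: List[Component]


def instantiate(
    template: ServiceAssemblyTemplate,
    overrides: Dict[str, Dict[str, str]],
) -> List[dict]:
    """Resolve one configuration per component, in template order.

    The deployment request supplies only the overrides; everything else
    comes from the template defaults.
    """
    resources = []
    for component in template.components:
        config = {**component.defaults, **overrides.get(component.name, {})}
        resources.append(
            {"component": component.name, "package": component.package, "config": config}
        )
    return resources


java_platform = ServiceAssemblyTemplate(
    name="java-platform",
    components=[
        Component("app_server", "vm-template", {"heap": "2g", "port": "7001"}),
        Component("database", "vm-template", {"schema": "default"}),
    ],
)

# A deployment request that changes a single setting.
resources = instantiate(java_platform, {"app_server": {"heap": "4g"}})
for resource in resources:
    print(resource)
```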


3.3 Putting it together

Figure 3–5 shows all the activities in the Elaboration phase.

Figure 3–5 Elaboration Phase

4 Construction

Figure 4–1 shows the high-level activities in the Construction phase of the Cloud service development method. These activities are Cloud service implementation, packaging and assembly, and Cloud service testing.

Figure 4–1 Construction Phase Activities

4.1 Cloud Service Implementation

Figure 4–2 shows the key activities in Cloud service implementation.


Figure 4–2 Cloud Service Implementation

■ Hardware and software installations are usually covered as part of the infrastructure setup. This step verifies that the necessary hardware and software resources are installed and configured. If the hardware and software are already in place, the necessary resources may be instantiated from existing resource pools. If not, they are installed and the necessary resources and resource pools are created.

■ Provisioning infrastructure is installed and configured. Provisioning infrastructure is necessary for deploying the service when consumers subscribe to the service.

■ Verify the network components and configure them if necessary. For example, the load balancers may need to be configured to route the consumer traffic to the respective service instances.

■ Configure security infrastructure and create security identity domains. Create security entities.

■ Build integration components. Most services require integration with databases or other services.

■ Build provisioning workflow components that are specific to the service.

4.2 Packaging and Assembly

Figure 4–3 shows the activities in the Packaging and Assembly step.


Figure 4–3 Packaging and Assembly

■ Assembly templates are created from a reference configuration. The assembly template is a collection of interrelated software components that are automatically configured to work together upon deployment. Assemblies (logically called deployable entities) are deployed onto a pool of hardware resources with minimal user input.

■ The Cloud service catalog is updated with the information about the new Cloud service.

■ The Cloud service is deployed in the test environment for testing.

4.2.1 Defining Deployable Entities

The primary goal of the deployment infrastructure is to completely automate the actions required to deploy the functional components needed for a new service instance. To achieve this automation, a virtualization solution is typically used, in which a subscriber's service instance is created by deploying a set of deployable entities that embodies the required topology. Each service in the Cloud requires a set of deployable entities that are used to create each type of instance needed to provide the service.

A deployable entity is typically a set of virtual machine templates along with metadata describing the interrelationships between these templates and surrounding IT artifacts such as volumes, Virtual IP addresses (VIPs), Load Balancers (LBRs), and firewalls. Each deployable entity describes the complete topology for a service so that a new instance of the service can be created by assembling the deployable entity for that service. The deployment infrastructure relies on pooled IT resources, such as a pool of hardware incorporated into a virtual machine pool and Network Attached Storage (NAS) for shared storage.

Deployable entities must provide a set of capabilities in order to be useful in a production environment, including:

■ Allow for the composition of components as well as external systems

■ Externalize configuration in the form of metadata that can easily be customized


■ Optionally define the start order of components to reflect interdependencies

■ Provide a management domain which integrates into existing management infrastructure allowing for metadata definition, deployment, oversight and diagnostics

The ability to create pre-built templates for deployment is extremely powerful and has a number of advantages that drive down operational costs and complexity. These include:

■ Ability to easily replicate deployable entities in production, even allowing for variations of them without adding complexity

■ Reduced risk of configuration errors as deployable entities are moved between development, test and production environments

■ Replicated environments facilitate high-level standardization and consistency across application infrastructures, allowing for simple implementation of best practices.

■ Accelerated deployment of new infrastructures and applications

To realize these benefits, a simple means of composing deployable entities from components is required. Specifically, what is needed is tooling that allows for the composition of components as well as endpoint mapping of externalized systems and other larger non-virtual systems such as databases and identity management servers. Tools that enable introspection of a running system in order to capture a metadata description of a known good configuration are especially valuable in making the process of defining deployable entities simple and reliable.
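One of the capabilities listed earlier in this section is defining the start order of components to reflect their interdependencies. As a minimal sketch, assuming the deployable entity's metadata records which components each component depends on, the start order can be derived with a topological sort. The component names are hypothetical; `graphlib` is part of the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Hypothetical metadata: each component maps to the components it depends on.
depends_on: Dict[str, List[str]] = {
    "database": [],
    "app_server": ["database"],
    "load_balancer": ["app_server"],
}


def start_order(dependencies: Dict[str, List[str]]) -> List[str]:
    """Return components ordered so that dependencies start first.

    TopologicalSorter raises graphlib.CycleError if the metadata
    contains a circular dependency.
    """
    return list(TopologicalSorter(dependencies).static_order())


print(start_order(depends_on))  # ['database', 'app_server', 'load_balancer']
```

Keeping the dependencies in externalized metadata, rather than in deployment scripts, means the same ordering logic can serve every deployable entity.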

4.3 Cloud Service Testing

The Cloud service testing process is illustrated in Figure 4–4. The goal of this step is to test the platform and infrastructure Cloud services. This is not to be confused with Cloud Testing, which refers to the use of Cloud services for testing. The focus of Cloud service testing is to test the concerns specific to Cloud enablement.


Figure 4–4 Cloud Service Testing

The following list captures some of the key tasks involved in Cloud service testing.

■ Test the provisioning of Cloud services, from discovering the service in the service catalog through ordering and deployment of the service. The provisioning process orchestrates several resources behind the scenes, and the test cases should validate each of the resources provisioned.

■ Test the service usage with test workloads that are similar to the anticipated consumer workloads.

■ Test service scalability, elasticity, and fault tolerance to ensure that the service level agreements can be met.

■ Test multi-tenancy and security of services. This is a key concern for most consumers and testing these capabilities and publishing the results will provide the necessary assurance to the consumers.

■ Test monitoring and management of the Cloud service. This includes both operational monitoring and self-service monitoring. Test all the self-service management capabilities made available to the consumers.

■ Test service termination and cleanup with particular focus on what happens to the consumer data after service termination.

■ Regression test the pieces as new services or capabilities are introduced to the Cloud. The Cloud will keep evolving, especially because it may not offer all Cloud capabilities initially; some take time to set up.
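The first task in the list above calls for test cases that validate each resource created by the provisioning process. The sketch below shows one way such a check might look. The resource names, the `fake_provision` stand-in, and the service name are all hypothetical; a real test would call the provisioning infrastructure instead.

```python
from typing import Dict, List

# Hypothetical resources the provisioning workflow is expected to create.
EXPECTED_RESOURCES = ["vm", "storage_volume", "virtual_ip", "load_balancer_rule"]


def fake_provision(service_name: str) -> Dict[str, str]:
    """Stand-in for a real provisioning call; maps each resource to its status."""
    return {resource: "ready" for resource in EXPECTED_RESOURCES}


def unprovisioned(result: Dict[str, str]) -> List[str]:
    """Return the expected resources that are missing or not ready."""
    return [r for r in EXPECTED_RESOURCES if result.get(r) != "ready"]


def test_provisioning_creates_every_resource() -> None:
    result = fake_provision("dbaas-silver")
    assert unprovisioned(result) == []


test_provisioning_creates_every_resource()

# A partial result shows how the check reports what is still missing.
print(unprovisioned({"vm": "ready", "storage_volume": "pending"}))
```

Reporting the specific resources that failed, rather than a single pass/fail flag, makes it easier to trace a provisioning failure back to the orchestration step responsible.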


4.4 Putting it together

Figure 4–5 shows all the activities in the Construction phase.

Figure 4–5 Construction Phase - Putting it together

5 Transition

The transition activities are a) User Acceptance Testing and b) Cloud Service Deployment. These high level activities are shown in Figure 5–1.

Figure 5–1 Transition Phase Activities

5.1 User Acceptance Testing

UAT is another concept transformed by Cloud, although this is not a "key" transformation. UAT typically suggests a closed cycle with control over access, and usually implies structured testing designed to exercise all features in a service, using "throw-away" test data. UAT is still suitable for a private Cloud service, but public Cloud services frequently rely on an open-beta testing phase. This testing phase usually comes after functional and regression testing, and before revenue release. It is also a means to determine viability and works best for applications where the consumer can choose not to use the application (this is the "ROI Runway" criterion in CCST). Figure 5–2 shows the User Acceptance Test (UAT) activities.


Figure 5–2 User Acceptance Testing

UAT, in the traditional sense, is more applicable to the enterprise than to a Commercial Cloud Service Provider (CCSP). A CCSP may allow a trial period during which the consumer may try the services and provide feedback. The following issues must be considered with respect to this kind of trial or open-beta testing.

■ What part of the lifecycle precedes and follows this testing?

■ What happens to the data? Is it retained and rolled forward to the next phase in the lifecycle, or thrown away? Or, more generally, are there any service level objectives, and if so, what are they?

■ What is the feedback mechanism? Is it active and formal, or passive and informal?

Cloud application builders may test the service to ensure that the applications they build will run on the Cloud service. Enterprise UAT steps are similar to the testing steps in the Construction phase but are performed by the users of the service.

■ Test the provisioning of services beginning from discovering the service in the service catalog and ordering the service.

■ Test service consumption by provisioning the service and testing its functionality.

■ Test service scalability, elasticity, and fault tolerance.

■ Test service multi-tenancy and security.

■ Test monitoring and management of the Cloud service.


■ Test service termination and cleanup from the user's perspective. The user might want to test data recovery after termination.

5.2 Cloud Service Deployment

The Cloud service is deployed to production next. The activities involved in this deployment are shown in Figure 5–3.

Figure 5–3 Cloud Service Deployment

The deployment activities are listed below.

■ Deploying the Cloud service is different from provisioning the Cloud service. Deployment prepares the Cloud service for provisioning; provisioning instantiates the Cloud service for use by consumers. One of the first steps is to deploy the deployable entities to the production environment and to ensure that all the infrastructure and process components of the service are in place. If the platform services require software infrastructure to build and manage the workloads, that infrastructure needs to be deployed as well.

■ Configure the service catalog and publish the service. This requires defining an appropriate taxonomy for the services. The service catalog is integrated with the order management system to ensure that the latest service information is displayed to the subscribers.

■ Perform final testing of the Cloud service in the production environment.

■ During the Transition phase, minor revisions or changes to the software system may cause updates to any or all of the documentation work products.


■ Throughout the project, conduct change and communication events targeted at specific audiences to mitigate identified risks and challenges. In addition, during Transition, conduct an IT Alignment and implement the transition plan.

■ Continue to conduct user learning events to ensure that the operations and support staff are trained to perform their duties.

■ Hold the production go-live event to make the Cloud service available to the consumers.

5.3 Putting it together

Figure 5–4 shows all the activities in the Transition phase.

Figure 5–4 Transition Phase - Putting it together

6 Operate

Operation is an important aspect of Cloud Computing. For that reason, Operate is a separate focus area in our method. The Operate focus area has a phase called "Operations, Administration, and Management (OA&M)". Figure 6–1 shows the key activities in this phase.

Figure 6–1 Operate - OA&M Phase Activities

Production Performance Management is an extension of the Performance Management techniques and approaches identified and implemented prior to production implementation. Performance metrics should be regularly collected and reviewed for all components. Although the basic strategy may be in place, variations in both requirements and performance are likely to be encountered. Proactive evaluation of variations from the baseline helps to identify potential performance issues before the user community notices the impact.

Change and communication events, targeted at specific audiences to mitigate identified risks and challenges, continue throughout the project. In addition, during Production, an effectiveness assessment is conducted to capture the efficiency of the work done during the project and to highlight the change management work that should continue after go-live to enable a shorter transition. The IT Transition Plan prepared during Transition is also implemented.


Service management activities such as upgrades and patching are done by updating the deployable entities and redeploying them, as opposed to patching the running instances. Since the services are shared across multiple consumers, the providers must develop policies around when the services can be upgraded and how the changes will be communicated to the consumers.

Services must be continually monitored to ensure that the SLAs are met. Any violations of the SLA must be automatically detected and escalated.

Metrics are constantly collected and passed on to the respective systems for analytics or billing purposes. The underlying Cloud infrastructure must provide ways of collecting and conveying the service-specific metrics. The principle of charge-back, or at least show-back, is a powerful transformational lever in the deployment of a Cloud when the aim is cost reduction. The Cloud constrains consumers because they must share resources with other stakeholders, and cost is an important driver for moving them from dedicated kit and applications to a shared platform.

The consumers monitor the services they deploy using the self-service management capabilities. The provider is responsible for monitoring the platform components on which the service is running. Diagnostics and troubleshooting also happen at multiple levels. Consumers have access to the self-service logs and can therefore diagnose any issues related to the payload. If the issue is in the underlying infrastructure, it is diagnosed by the Cloud provider operations team or support analyst groups.

Backup and recovery capabilities are essential for any Cloud. Data and other assets must be backed up periodically and recovered when necessary.

Cloud services may need to be retired at the end of their useful life. Cloud services may be retired for a variety of reasons such as technology obsolescence, market shift, changes in business priorities, and migrations. Older versions of Cloud services are typically retired to make way for new versions. In a multi-tenant subscriber environment, Cloud service retirement should be well planned, and the subscribers must be given sufficient notice to migrate to the newer versions if applicable.

Cloud Operations is covered in detail in the ITSO document, "Operating a Cloud".
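The requirement that SLA violations be automatically detected and escalated can be sketched as a comparison of observed metrics against SLA targets. Everything in the example below is an assumption made for illustration: the metric names, the thresholds, and the `escalate` placeholder, which in a real deployment would notify an operations team or open a ticket.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SlaTarget:
    metric: str
    threshold: float
    higher_is_better: bool   # True for availability, False for response time


def find_violations(targets: List[SlaTarget], observed: Dict[str, float]) -> List[str]:
    """Return a message for each target that is missed or has no data."""
    violations = []
    for target in targets:
        value = observed.get(target.metric)
        if value is None:
            violations.append(f"{target.metric}: no data collected")
        elif target.higher_is_better and value < target.threshold:
            violations.append(f"{target.metric}: {value} below {target.threshold}")
        elif not target.higher_is_better and value > target.threshold:
            violations.append(f"{target.metric}: {value} above {target.threshold}")
    return violations


def escalate(violations: List[str]) -> List[str]:
    """Placeholder escalation; a real system would page staff or open a ticket."""
    return [f"ESCALATE {violation}" for violation in violations]


# Hypothetical Gold-tier targets and one monitoring sample.
gold_sla = [
    SlaTarget("availability_pct", 99.9, higher_is_better=True),
    SlaTarget("response_ms", 500.0, higher_is_better=False),
]
observed = {"availability_pct": 99.5, "response_ms": 320.0}

alerts = escalate(find_violations(gold_sla, observed))
print(alerts)  # ['ESCALATE availability_pct: 99.5 below 99.9']
```

Treating a missing metric as a violation is a deliberate choice in this sketch: a gap in metrics collection hides SLA breaches, so it is surfaced rather than ignored.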

6.1 Operations Best Practices

The following list captures some of the Cloud operations best practices.

■ Automated Provisioning - Provisioning must be automated through self-service capabilities.

■ Patch Management - Patches are not applied the traditional way in the Cloud. Any upgrades or patches are applied to the service template (deployable unit) and the service is redeployed.

■ Self Service Administration - Consumers must be provided with a self-service administration interface to manage their services.

■ Self healing - Common issues must be automatically detected and systematically fixed using knowledge management techniques.

■ Capacity management - Capacity must be proactively managed by taking into account the current and future demand for services. Additional capacity may be required to support spikes in load.
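The capacity management practice, planning for current demand, future demand, and load spikes, lends itself to a short worked calculation. The formula and the numbers below are a simplified assumption for illustration, not a sizing method prescribed by this guide.

```python
import math


def required_capacity(current_demand: float, growth_rate: float, spike_factor: float) -> float:
    """Projected demand after growth, with headroom for spikes in load."""
    return current_demand * (1 + growth_rate) * spike_factor


def additional_units(
    installed: float,
    current_demand: float,
    growth_rate: float,
    spike_factor: float,
    unit_size: float,
) -> int:
    """Whole capacity units to add so that installed capacity covers the projection."""
    shortfall = required_capacity(current_demand, growth_rate, spike_factor) - installed
    return max(0, math.ceil(shortfall / unit_size))


# Hypothetical figures: 100 vCPUs installed, 80 in use, 25% expected growth,
# 1.5x headroom for spikes, capacity added in blocks of 16 vCPUs.
print(required_capacity(80, 0.25, 1.5))          # 150.0
print(additional_units(100, 80, 0.25, 1.5, 16))  # 4
```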

7 Summary

Cloud is quickly becoming a key strategy for business and IT alignment and is starting to dominate architecture roadmap discussions. Most enterprises have either adopted or plan to adopt Cloud as a strategic choice in support of their business and technology goals. Most Cloud implementations are going to involve some kind of hybrid approach in which enterprise private Clouds are integrated with other private Clouds or with public Clouds. Understanding both the provider and consumer perspectives of the Cloud is necessary to successfully implement complex and highly scalable Cloud infrastructures that support internal and external needs.

Cloud services are differentiated from traditional IT applications by the scale, velocity, and level of automation required. Building successful Cloud services requires a well-defined method, extensive planning, and precise execution to ensure that the services meet and support the business goals.

The Cloud service development process for Infrastructure and Platform Cloud services defined in this document is intended to augment existing methodologies or to serve as a starting point where no methodologies are currently being used. This process can be used with a variety of development methodologies such as Waterfall, Iterative, or Agile. It makes no assumption about whether the methodology is iterative. The key is to identify the Cloud service requirements and build them at enterprise scope using dedicated specialist teams.
