HIGH AVAILABILITY AS A SERVICE (HAAAS)

Mohamed Sohail
Data Protection and Availability Specialist
Dell EMC
[email protected]

Emanuela Caramagna
System Engineer
Pure Storage
[email protected]

Sameh Gad
Senior Consultant
Dell EMC
[email protected]

Table of Contents

Abstract
Introduction
Components of the Design
Solution Roadmap
List Failure Scenarios
  Example 1: Failure scenario for an engineering system
Evaluate Failure Scenarios
Map Scenarios to Requirements
Design Solution
The core architecture
Why High Availability
The journey towards HAaaS
Importance of the HAaaS business model
Risks of the Cloud – Fear of flying
HAaaS Use Cases
  1st use case: High Availability as a Service in the Cloud
  2nd use case: High Availability as a Service to the Cloud
Conclusion
References
List of figures

Disclaimer: The views, processes or methodologies published in this article are those of the authors. They do not necessarily reflect Dell EMC’s views, processes or methodologies.

Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries.

2016 EMC Proven Professional Knowledge Sharing 2

Abstract

As society increasingly depends on computer-based systems, the need to ensure that services are provided to end-users continuously has become critical. To build a computer system upon which people can depend, a system designer must first have a clear idea of all the potential causes that may bring a system down. Back in 1965, Digital Equipment Corporation (DEC) changed the computer world by introducing the PDP-8, an open alternative to IBM mainframe technology.

Figure 1: Mainframe system

This was the first commercial success in the minicomputer area, and opened the door for future scenarios and today’s open systems technology.

After almost 20 years of computer evolution, customers were ready for more robust solutions with high availability. In 1984, DEC introduced the first cluster system, for the VAX/VMS operating system.


Figure 2: Digital cluster system

Over time, cluster architectures have been adopted by most of the major vendors: HP, IBM, Sun, Microsoft, and Linux distributions such as SUSE and Red Hat. Most cluster solutions worked in an active/passive configuration: the service is hosted on one node and, in case of node failure, is switched to another node.

This approach had limitations, such as:

- Resources are dedicated to a specific cluster and cannot be shared.
- Most cluster platforms follow an active/passive approach, meaning some resources remain in a stand-by state.
- In most cases, the cluster protects against hardware failure, but restart on other nodes is not guaranteed; manual intervention is required in the event of an unattended failure.
- The cluster approach requires a complex architecture and a very high level of knowledge to implement and manage.

This limitation created the need for a new architecture that can allow dynamic sharing of resources between nodes and provide more robust options in case of hardware failure.


Introduction

High availability is about redundancy and about accurately identifying failures at every level of the IT infrastructure, with automated corrective action.

Services must be classified based on each organization's business requirements. Critical services require 24/7 availability with minimal or no downtime and optimum performance. This usually requires significant investment in IT infrastructure: redundancy at every level of the datacenter facilities, including network, compute, application clusters, and so on, to achieve the target availability. With today's modern solutions it has become much easier and faster to achieve these targets. High availability is vital to keep services alive; a single server cannot fulfill such a target. There are also other failure sources, such as hardware failures, data corruption, network outages, operating system crashes, or software bugs (in database servers, application servers, or web servers). So the target is a solution that prevents downtime in such cases and recovers the situation immediately.

A cluster is a key component of any high availability solution and is used for redundancy. It distributes the load between the different cluster nodes to achieve the required scalability, fast response times, load balancing, and performance for mission-critical applications and systems.
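The value of redundant cluster nodes can be quantified with the standard parallel-availability formula: if each node is available with probability a and any surviving node can carry the service, a cluster of n nodes has availability 1 - (1 - a)^n. A minimal sketch (the per-node availability figure is a hypothetical example, not a measured value):

```python
def parallel_availability(node_availability: float, nodes: int) -> float:
    """Availability of n redundant nodes, assuming independent failures
    and that any single surviving node can carry the service."""
    return 1 - (1 - node_availability) ** nodes

# A single 99%-available node vs. two- and three-node clusters of the same hardware.
for n in (1, 2, 3):
    print(n, f"{parallel_availability(0.99, n):.6f}")
```

Two such nodes already reach 99.99%, which is why clustering is the backbone of high availability designs.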

Components of the Design

A design built around the high availability goal is critical to minimizing downtime. New automation technologies are based on virtualization of the infrastructure, which makes it possible to define requirements and deploy them with a single click (request). The main objective is for high availability to become a service that is as easy to consume as selecting an option, for example, "this website needs five-nines availability." That choice is then reflected on the subsystems, which build all the prerequisites needed to achieve a 99.999% available website.


The parameters below need to be fulfilled:

- Expected number of users/traffic
- Web servers
- Application servers
- Database servers
- Storage IOPS
- Network traffic
- Backup & recovery
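An availability target such as five nines translates directly into a downtime budget, which is useful when sizing the parameters above. A quick sketch of the arithmetic:

```python
def downtime_budget_minutes(availability_pct: float,
                            period_hours: float = 365 * 24) -> float:
    """Minutes of allowed downtime per period (default: one year)
    for a given availability percentage."""
    return period_hours * 60 * (1 - availability_pct / 100)

for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_budget_minutes(target):.2f} min/year")
```

Five nines leaves roughly 5.26 minutes of downtime per year, which is why every subsystem in the list above must be redundant.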

Solution Roadmap

Up to now, we have learned about the basic principles of good system design for high availability: categorization in the system stack, redundancy, robustness and simplicity, and virtualization. But what does this mean for you if you are responsible for producing a solution and want to create the technical system design? Let us assume that you have written down the business objectives and the business processes that are relevant for your system architecture. After that you need to consider the following steps:

- List failure scenarios
- Evaluate scenarios, and determine their probability
- Map scenarios to requirements
- Design the solution, using the dependency chart methodology
- Review the solution, and check its behavior against failure scenarios

These steps are not just executed in sequence. Most importantly, solutions, requirements, and failure scenarios are not independent. If one has a different solution, there might well be different failures to consider. In addition, different solutions come with very different price tags attached. Business owners sometimes want to reconsider their requirements when they recognize that protection against some failure scenarios costs more than the damage those scenarios might cause. Therefore, during each step, evaluate whether the results make it necessary to reconsider the previous steps' results. These feedback loops prevent the consistent-but-wrong design syndrome.

With this iterative approach in mind, let’s examine each of those steps in more detail.


List Failure Scenarios

It is not realistic to list each and every incident that can render such complex systems unusable. For example, one can have an outage owing to resource overload, which may have many causes: too many users, a software error (either in the application or the operating system), a denial of service attack, etc. It is not possible to list all the causes, but it is possible to list all components that can fail and what happens when they fail, alone or in combination.

We can start by writing up the specific system stack without any redundancy information. Then we list for each component how that component can fail. The system stack already gives us a good component categorization that will help us categorize the failure scenarios as well. First, we will write up high-level failure scenarios, and then iterate over them and make them more precise by providing more detailed (and more technical) descriptions of what can go wrong.
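One way to keep this component-by-component listing systematic is to record the system stack as data and iterate over it, refining the failure modes on each pass. The components and failure modes below are illustrative placeholders, not a complete inventory:

```python
# Hypothetical excerpt of a system stack, with per-component failure modes.
# Each iteration of the analysis refines these entries with more detail.
system_stack = {
    "user/usage": ["deletion of data", "flood of requests"],
    "application": ["abort", "data corruption", "memory leak"],
    "database": ["file corrupted", "deadlocks", "failed automatic recovery"],
    "operating system": ["disk full", "runaway processes"],
    "storage": ["disk media failure", "controller failure"],
    "hardware": ["CPU failure", "NIC failure"],
}

# Emit the flat scenario list used in the next analysis steps.
for component, failures in system_stack.items():
    for failure in failures:
        print(f"{component}: {failure}")
```

Keeping the stack as data makes the later steps (probability/damage scoring, requirement mapping) mechanical rather than ad hoc.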

Sometimes the owner of a business process has their own failure scenarios, e.g. from past incidents, that they want to see covered. Usually, it is easy to add them to the list of generic failure scenarios. That is worth doing even if they are already there in a generalized form: it will bring better buy-in from that important stakeholder.

Example 1: Failure scenario for an engineering system

The following list is an excerpt from the failure scenarios for an engineering system that also utilizes a database with part detail information. This is the second iteration, in which high-level failure scenarios (marked with bullets) are dissected into more specific scenarios (marked with dashes). The iteration process is not finished yet; the failure scenario list is therefore not complete and covers only exemplary failures.

Compared with a list produced by a brainstorming session, this one is more structured and oriented along the system stack; it is the result of a structured analysis:


• User- or usage-caused failure
  - Deletion of a small amount of data (up to a few megabytes)
  - Deletion of a large amount of data (some gigabytes, up to terabytes)
  - Utilization of too many resources in a thread-based application
  - Flood of requests/jobs/transactions for a system

• Administrator-caused failure
  - Deletion of application data
  - Deletion of user or group information
  - Change to configuration or program makes service nonfunctional
  - Incomplete change to configuration or program that makes failure protection nonfunctional (e.g. configuration change on a single cluster node)

• Engineering application failures
  - Aborting of application
  - Corruption of data by application error
  - Loss of data by application error
  - Hung Java virtual machines
  - Memory leak consuming available main memory
  - File access denied owing to erroneous security setup

• Database failures
  - Database file corrupted
  - Database content corrupted
  - Index corrupted
  - Database log corrupted
  - Deadlocks
  - Automatic recovery not successful, manual intervention needed

• Operating system failures
  - Log files out of space
  - Disk full
  - Dead, frozen, or runaway processes
  - Operating system queues full (CPU load queue, disk, network, …)
  - Error in hardware driver leads to I/O corruption


• File system corruption
  - Recovery by journal possible
  - Automatic file system check time within the service level agreement (SLA)
  - Automatic file system check time beyond the SLA
  - Manual file system repair needed

• Storage subsystem failure
  - Disk media failure
  - Microcode controller failure
  - Volume manager failure
  - Backplane failure
  - Storage switch interface failure

• Hardware failure
  - CPU failure
  - Memory failure
  - Network interface card failure
  - Backplane failure
  - Uninterruptible power supply (UPS) failure

• Physical environment destroyed
  - Power outage
  - Room destroyed (e.g. by fire)
  - Building destroyed (e.g. by flood)
  - Site destroyed (e.g. by airplane crash)
  - Town destroyed (e.g. by hurricane, large earthquake, war)

• Infrastructure service unavailable
  - Active Directory/Lightweight Directory Access Protocol (LDAP) outage, not reachable, or corrupted
  - DNS not reachable
  - Loss of shared network infrastructure
  - Network latency extended beyond functionality
  - Virus attack
  - Switch or router failure


  - Email not available
  - Backup server not reachable
  - License server outage or not reachable

• Security incidents
  - Sabotage
  - Virus attacks
  - Denial of service attacks
  - Break-ins with suspected change of data

You might have noticed that some failure descriptions are quite coarse and do not go into much detail. Failure scenario selection is guided by experience, and in particular by experience with potential solutions. When one knows that all faults related to processes will have to be handled the same way (namely, the system must be restarted), it does not make much sense to distinguish whether the CPU load queue or the memory queue is full.

Evaluate Failure Scenarios

For each failure scenario, you have to estimate two properties:

1. The probability of the failure
2. The damage that is caused by that failure

In practice, we cannot determine exact numbers for either the probability or the damage. If we have a similar system running and have had incidents there, we can use that data for better approximations.

What we can do is determine the relative probability and the relative damage of the scenarios and map them on a two-dimensional graph. Figure 3 shows such a mapping for selected scenarios.
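Such a mapping can be kept as simple relative scores, e.g. on a 1–5 scale, which is enough to place each scenario on the two-dimensional chart and to rank the scenarios by urgency. The scores below are invented for illustration:

```python
# (relative probability, relative damage) on a 1-5 scale -- hypothetical scores.
scenarios = {
    "disk media failure":     (5, 1),
    "admin deletes app data": (3, 3),
    "site destroyed":         (1, 5),
}

# Rank scenarios so those needing the most attention (probability x damage)
# come first; ties keep their original order (sorted() is stable).
ranked = sorted(scenarios.items(),
                key=lambda s: s[1][0] * s[1][1],
                reverse=True)

for name, (prob, damage) in ranked:
    print(f"{name}: probability={prob}, damage={damage}")
```

The product is only a crude urgency heuristic; the real decision comes from the requirement areas discussed in the next section.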


Figure 3: Scenario mapping on probability and damage estimation

Map Scenarios to Requirements

Scenarios with high probability must be covered within the SLA requirements. All these failures must lead only to minor outages, i.e. to outages after which work can continue within a short timeframe. Protection against this class of failures falls in the realm of high availability.

Usually, some failure scenarios are expected to lead to no outage at all, and not even to aborted user sessions. In particular, this is true for defects in disk storage media, which happen quite often. When disks fail, backup disks must take over their function without any interruption and without any state changes beyond the operating system or the storage subsystem.

Our knowledge of business objectives and processes, i.e., of the requirements, gives an initial assumption about maximum outage times per event and maximum outage times per month or per year for this class of failure scenarios. For example, business objectives might demand at most 1 minute per incident and 2 minutes per month, during 14×5 business hours. (Such minute-based measurements are more illustrative than a bare 99.99%.) Later, when we have seen the costs of such a solution, the business owners might want to lower their requirements; then we have to iterate the process described.
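The 14×5 outage budget in this example can be converted into a percentage, which shows why the minutes-based formulation is the more illustrative one. A sketch of the arithmetic, using an average month of 52/12 ≈ 4.33 weeks:

```python
WEEKS_PER_MONTH = 52 / 12  # average month length, about 4.33 weeks

def availability_from_budget(outage_min_per_month: float,
                             hours_per_day: float,
                             days_per_week: float) -> float:
    """Availability (%) implied by an outage budget over scheduled service time."""
    service_min = hours_per_day * days_per_week * WEEKS_PER_MONTH * 60
    return 100 * (1 - outage_min_per_month / service_min)

# 2 minutes of outage per month, during 14x5 business hours.
print(f"{availability_from_budget(2, 14, 5):.4f}%")
```

The same "2 minutes per month" reads as roughly 99.989% availability once converted, a figure far harder for a business owner to interpret than the outage minutes themselves.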

There are failure scenarios with low probability and high potential damage that should be considered as major outages and will not be covered by SLAs. If we choose to protect against these failures as well, we need to introduce disaster-recovery solutions.

Again, requirements for disaster recovery come from business objectives and processes. The requirements are expressed in terms of recovery time objectives and recovery point objectives. For example, requirements might be to achieve functionality again within 72 hours of declaring the disaster, and to lose at most 4 hours of data.
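The recovery point objective directly constrains how often data must be replicated or backed up: to lose at most 4 hours of data, copies must be taken at least every 4 hours. A minimal sketch of that check (the schedule values are hypothetical):

```python
def meets_rpo(copy_interval_hours: float, rpo_hours: float) -> bool:
    """Worst-case data loss equals the interval between copies,
    so the interval must not exceed the RPO."""
    return copy_interval_hours <= rpo_hours

rpo = 4  # hours, from the requirements above
for interval in (1, 4, 24):
    print(f"copy every {interval}h -> RPO met: {meets_rpo(interval, rpo)}")
```

The RTO, by contrast, constrains the recovery procedure itself (failover, restore, and restart time), not the copy schedule.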

At the very end, there are failure scenarios that we choose not to defend against. Most often, these failure scenarios are associated with damage to non-IT processes or systems that is even larger and makes the repair of IT systems pointless. It might also be that we judge their probability to be so low that we will live with the risk and do not want to spend money on protection. For example, while coastal regions or cities near rivers will often find it necessary to protect themselves against floods, businesses in inland areas will often forgo protection against large-scale natural catastrophes like hurricanes or tsunamis.

In the end, such scenario-to-requirements mapping amounts to categorizing our scenario map: we color different areas and state which kind of protection we want for the failure scenarios in each area.
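This categorization can be expressed as simple threshold rules over the same relative probability/damage scores used in the scenario map. The thresholds below are illustrative, not prescriptive:

```python
def protection_class(probability: int, damage: int) -> str:
    """Map a (probability, damage) score pair on a 1-5 scale to a protection
    category -- hypothetical thresholds for illustration only."""
    if probability >= 3:
        return "high availability (minor outage, covered by SLA)"
    if damage >= 4:
        return "disaster recovery (major outage)"
    return "accepted risk (no protection)"

print(protection_class(5, 1))   # e.g. disk media failure
print(protection_class(1, 5))   # e.g. site destroyed
print(protection_class(2, 2))   # e.g. rare, low-damage fault
```

In a real design the thresholds come out of the business requirements and cost discussion, not out of the code; the rules merely make the agreed categorization repeatable.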

Figure 4 takes up Figure 3 and adds those areas. We can also have two other, similar, figures where we exchange the meaning of the x-axis. In the first one, we use outage times. Then we can have two markers, one for the maximum minor outage time and one for recovery time objective. The locations of some failure scenarios in this graph will change, but the idea is the same: We can show which failure scenario must be handled by which fault protection method. The second additional figure would use recovery point objectives on the x-axis and would show requirements on maximum data loss.


Figure 4: Requirement areas added to scenario mapping

It is important to point out that the chart has a large area where no scenario is placed and which is not touched by any of the requirement areas. We call this area the forbidden zone, as failure scenarios that appear subsequently must not be located there. If they are, we have to remap the scenarios and redesign our solution.

The possibility exists that there is a failure scenario with high probability and high damage, where the protection cost would be very high as well. For example, if an application allowed a user to erase several hundred gigabytes of data without being able to cancel the process, and without any undo facility, this might very well lead to a major outage. In such cases, the only possibility might be to change the application's code, or to select another application that provides similar functionality.

Design Solution

Up to this point we have covered the basic principles of good system design for high availability: categorization in the system stack, redundancy, robustness and simplicity, and virtualization. To produce the what and how cells of the system architecture, proceed through the following steps:

1. List failure scenarios
2. Evaluate scenarios, and determine their probability
3. Map scenarios to requirements
4. Design the solution, using the dependency chart methodology
5. Review the solution, and check its behavior against the failure scenarios

As noted in the Solution Roadmap, these steps are iterative rather than strictly sequential: a different solution may introduce different failure scenarios and very different costs, and business owners may reconsider their requirements once they see that protecting against some scenarios costs more than the damage those scenarios would cause. Evaluating each step's results against the previous steps prevents the consistent-but-wrong design syndrome.

The core architecture

"VIRTUALIZATION" is the key word for today's architectures and the years to come. VMware created the first-in-class hypervisor solution for open systems, then changed the game again with a complete hypervisor infrastructure that allows installing and managing a fully virtualized infrastructure.

Figure 5 shows some key points in VMware development history.


Figure 5: VMware development history overview

The new strategy includes a software layer that enables high availability and disaster recovery, such as RecoverPoint and VPLEX deployed as a service in ViPR, with advanced reporting features like chargeback, capacity planning, service status reports and history, and self-service deployment.

Why High Availability?

The answer lies in the consequences when the desired services are not available. Imagine you were one of the one million mobile phone users in Finland affected by a widespread disturbance of a mobile telephone service [1] and had problems receiving your incoming calls and text messages. The interruption of service, reportedly caused by a data overload in the network, lasted for about seven hours during the day. You could also picture yourself as one of the four million mobile phone subscribers in Sweden when an unspecified fault caused the network to fail, leaving it unable to provide you with mobile phone services [2]. The disruption lasted for about twelve hours, beginning in the afternoon and continuing until around midnight.

Another high-profile, high-impact computer system failure was at Amazon Web Services [4], which provides web hosting services by means of its cloud infrastructure to many web sites. The failure was reportedly caused by an upgrade of network capacity and lasted almost four days before the last affected consumer data were recovered [5], although 0.07% of the affected data could not be restored. The consequence of this failure was the unavailability of services to the end customers of the web sites using the hosting services. Amazon also paid 10-day service credits to the affected customers.

The journey towards HAaaS

EMC has taken many steps toward achieving the vision of HAaaS with its federation portfolio.

Figure 6 illustrates the new strategy of a software layer that enables high availability based on a Software-Defined Architecture.

Figure 6: SDDC diagram

The following pages present our vision for achieving five-nines (99.999%) availability.

Importance of the HAaaS business model

High availability (HA) is paramount to modern business and mission critical applications because of their critical position and open design. A key strength of the modern business is the ability to interact with multiple cross-format applications; however, this strength also creates multiple touch-points that can affect availability. The conclusion? Mission critical applications require a well-designed HA solution that can protect and maintain uptime not only for infrastructure services but also for application-level services.


Figure 7: Infrastructure as a Service and Platform as a Service leverage the high availability active-active solution

One successful model is EMC's. The EMC High Availability Stack is a solid solution to minimize the impact and downtime of critical applications and automate recovery at the service level to ensure continuous, maximum service availability. This is done by integrating VMware HA with third-party cluster software to ensure 360-degree service availability.

With the new approach, recovery time, or estimated time of repair (ETR), can be minimized if the failure or fault is detected through hypervisor-level platforms.

High availability includes:

- HA Infrastructure/Storage
- HA Network/Security
- HA Servers/OS

The primary components of HAaaS are:

- Cluster software
- Services integration with the cluster software


- Cluster file system
- Application

Figure 8: Extended Cluster Service between the sites on the application level availability

Figure 9: Protecting the database services against ESXi host failures

The VMware capability can include the cluster services and manage third-party products, as shown in Figures 10, 11, and 12:


- File systems
- Web servers
- Application servers
- Database servers

Figure 10: High Availability with the application components included on top of the virtualization layer

Figure 11: The HAaaS targets control each tier's availability, making sure there is end-to-end redundancy and consistency at the virtualization level


Figure 12: High availability is important on the application level as well, not only for the VMs or the hypervisors

Cluster software for virtualized applications and databases is available from third-party products that can fulfill the requirements of an end-to-end HAaaS solution:

- HA Infrastructure/Storage
- HA Network/Security
- HA Servers/OS
- HA File Systems
- HA Applications (existing 3rd-party cluster products/services at the logical layer)
- HA Backup & Recovery


Figure 13: Active/passive cluster example; failover requires minimal downtime for the DB service

Risks of the Cloud – Fear of flying

Enterprises know that the cloud will change IT, but security and performance are a concern. Each cloud model has potential risks: reliability, adaptability, application compatibility, efficiency, scaling, lock-in, security, and compliance.

Companies must select an enterprise cloud solution to suit a complex mix of applications, and these decisions require great care. The solution should be enterprise-class, designed for mission-critical applications, with performance and security built for the enterprise. To achieve this, enterprises and service providers should combine existing IT by building private clouds, using virtual private clouds, and accessing public clouds.

This may go beyond virtualization. It requires a mechanism that manages compute, memory, storage, and networking in small cloud units, μVMs. Unlike fixed-size VMs, μVMs are dynamically allocated and optimized per application, with simple automatic monitoring and control.


Figure 14: Layers of data protection services

In general, protection services are divided into three levels, and EMC can cover all these requirements, adding Virtustream for mission-critical application requirements. In addition, it is possible to create an architecture with multiple levels of protection to enable different service levels and multiple copies managed or retained inside the infrastructure.

The next two sections describe two real use cases in more detail.

HAaaS Use Cases

1st use case: High Availability as a Service in the Cloud

In this real-world example, the entire infrastructure is inside the cloud service provider's sites. The service provider has three sites with an HA + DR approach, and the HA sites can host active vApps.

Five years ago, a regional telecommunication company decided to change its strategy by adding cloud services to its service catalog. It was a genuinely challenging situation: it was a new deployment from scratch, they did not know the market horizons or the technologies that could enable them, and they were guided by customer requests.

Their internal infrastructure was based on VMware, Cisco, and EMC, and they decided to keep this infrastructure for the services provided to their customers as well.

Their first cloud service was compute, in a two-site configuration with VMware vCloud Director and EMC VPLEX.


The company is well positioned within its region because it owns the entire connectivity infrastructure, but in the initial phase the problem was learning how to sell cloud services.

EMC supported them in creating new services and offering them to customers in an innovative way.

Every HA architecture requires a platform that supports HA, such as VMware, Oracle, or Hyper-V. Enabling customers to create a vApp or a virtual machine is the easiest and most flexible approach to implementing HAaaS.

This cloud provider can now offer different, combinable levels of protection with different RTO/RPO and related costs. Compute services are delivered with VMware, orchestrated by vCloud Director, with a virtual data center configured for every customer.

The cloud service provider's configuration allows implementing different protection levels:

- Local protection, with VPLEX HA on a specific site
- Remote protection in HA, with a VPLEX Metro configuration between two sites
- Disaster recovery to a remote site with RecoverPoint
- Recovery from backup with an Avamar and Data Domain configuration

Figure 15: Cloud provider disaster recovery


The cloud service provider's service catalog allows combining different levels of protection to match customer requirements.

One real example of a customer environment is shown below:

Level             Number of VMs   Total size   Configured services
Mission Critical  10              4 TB         HA Local + Remote + DR + Backup
Critical          30              6 TB         HA Remote + DR + Backup
Standard          40              8 TB         DR + Backup
Test&Dev          30              5 TB         Backup

This approach creates additional value on the cloud provider's services and enables customers to model cost against the real business value of each virtual machine.

2nd use case: High Availability as a Service to the Cloud

One year ago, an Italian service provider was looking for a solution to offer hybrid cloud HAaaS/DRaaS to its customers. This service provider focuses on virtual environments and offers cloud services and managed services. Its offering addresses several market segments, but its focus is on the public sector, where it has a considerable presence.

When RecoverPoint for Virtual Machines was presented, they immediately agreed that it was the solution for the new HAaaS/DRaaS services they wanted to launch.

They were not a large EMC customer (most of their infrastructure was based on IBM and Dell technologies), but with RecoverPoint for VMs they could offer an independent service platform with different levels of RTO/RPO depending on connectivity.

The most important point for this service provider was the ability to offer a real hybrid cloud solution, with the possibility of migrating end customers' infrastructure to the cloud without any additional effort.

This scenario could be deployed as a real HA implementation, and the schema can be the same as for the cloud. In this scenario, the customer's virtual machines are distributed dynamically between the available redundant sites.


The customer can decide to adopt a near-HA strategy with RecoverPoint for Virtual Machines. With this technology, a customer gets synchronous replication with a minimal RTO. This approach is positioned near high availability, as represented in the figure below.

Figure 16: Near high availability - Near HA

With this technology, the cloud service provider can define different levels of protection:

- Near HA, with synchronous replication and local/remote protection
- Disaster recovery, with asynchronous replication and local/remote protection

This architecture allows sharing cloud resources, reducing service costs for both provider and customer.

Figure 17: Architecture overview (provider site with tenants A, B, and C)

With this approach, it is possible to configure multi-site protection; the cloud provider can manage, for example, two remote sites with two different copies and journaling of the VMs.


This is an example of a topology with a shared vCenter inside the cloud provider's site.

Figure 18: vCenter schema

RecoverPoint for Virtual Machines has several advantages inside a VMware environment:

- Protection granularity is at the VM level, not based on LUNs.
- The customer can define a restart sequence for disaster recovery, helping to create a flexible disaster recovery plan.
- The cloud provider can manage VMs directly from a central vCenter.
- The customer has a self-service portal with guided procedures to protect a VM and to test or manage a start on the remote site.
- The product allows managing DR tests inside an isolated network and with different network addresses.
- The entire infrastructure is virtual, without any hardware appliance at the customer site.

Conclusion

High Availability as a Service has become an essential part of modern virtualization solutions to ensure service availability. Figure 19 shows the importance of integration between the application solutions to reach the targeted availability. Each product can play a vital role during system design and the automation process.


Considerations while building HAaaS:

- Understand the exact business requirements, including RTO, RPO, SLO, SLA, and SLM.
- High availability requires significant investment, so it is crucial to evaluate the environment and choose the right products and components; this is a key success factor.
- Skills are very important in the design, build, and operation phases in order to maximize the efficiency and benefits of HAaaS solutions and build a modern, stable cloud solution.

Figure 19: High availability scenario for Oracle Database Virtual Machine

References

- Service Availability: Principles and Practice, Maria Toeroe and Francis Tam (eds.)
- https://en.wikipedia.org/wiki/High_availability
- https://en.wikipedia.org/wiki/High-availability_cluster
- https://virtuallylg.wordpress.com/2013/10/10/comparing-vmware-vsphere-app-ha-with-symantec-applicationha/
- http://www.storagereview.com/vmware_vmmark_virtualization_benchmark
- http://mandarshinde.com/elasticsearch-basics/
- http://virtcloud.blogspot.com.eg/2011/07/designing-your-private-cloud-with.html
- High Availability and Disaster Recovery: Concepts, Design, Implementation, Klaus Schmidt
- Virtustream.com


List of figures

Figure 1: Mainframe system
Figure 2: Digital cluster system
Figure 3: Scenario mapping on probability and damage estimation
Figure 4: Requirement areas added to scenario mapping
Figure 5: VMware development history overview
Figure 6: SDDC diagram
Figure 7: Infrastructure as a Service and Platform as a Service leverage the high availability active-active solution
Figure 8: Extended cluster service between the sites on the application level availability
Figure 9: Protecting the database services against ESXi host failures
Figure 10: High availability with the application components included on top of the virtualization layer
Figure 11: The HAaaS targets control each tier's availability, making sure there is end-to-end redundancy and consistency at the virtualization level
Figure 12: High availability is important on the application level as well, not only for the VMs or the hypervisors
Figure 13: Active/passive cluster example; failover requires minimal downtime for the DB service
Figure 14: Layers of data protection services
Figure 15: Cloud provider disaster recovery
Figure 16: Near high availability - Near HA
Figure 17: Architecture overview
Figure 18: vCenter schema
Figure 19: High availability scenario for Oracle Database Virtual Machine

Dell EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." DELL EMC MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying and distribution of any Dell EMC software described in this publication requires an applicable software license.

Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries.
