
RECOMMENDED SOLUTIONS FOR EMC VPLEX METRO WITH VBLOCK™ INFRASTRUCTURE PLATFORMS

April 2012


Contents

Introduction
    Business case
    Solution
    Benefits
    Scope
    Audience
    Feedback
    Terminology

Technology overview
    VCE Vblock™ Infrastructure Platforms
    VMware vCenter Server
    VMware vCenter Server Heartbeat
    VMware vSphere High Availability
    VMware vMotion
    EMC VPLEX
        VPLEX Local
        VPLEX Metro
        VPLEX Witness
    Use Cases
        Business continuity
        Workload and data mobility

Deployment guidelines and best practices
    Planning the deployment
    VPLEX to Vblock platform mappings
    VPLEX port requirements per connected Vblock platform
    Virtual networking high availability
    Using host affinity groups
    Deploying VPLEX Witness
    Monitoring VPLEX
    Guidelines for VPLEX coexistence with UIM/P
    Oracle Real Application Clusters
    Unsupported configurations


Recommended configurations
    Non Cross-Cluster Connect configurations
        Non Cross-Cluster Connect without VPLEX Witness
        Non Cross-Cluster Connect with VPLEX Witness
    Cross-Cluster Connect configuration
    Data flow

Failure scenarios
    Application and management failover
    Application and management failback
    Failure scenarios
        Non Cross-Cluster Connect failure scenarios
        Cross-Cluster Connect failure scenarios

Conclusion
    Next steps

Additional references
    VCE
    VMware
    EMC


Introduction

Today, more and more enterprises are virtualizing their business-critical applications to deliver the most value back to their businesses. In a virtualized environment, where physical servers host many virtual servers, the volume of data and the speed of change require new techniques and methods for:

• Protecting essential data to ensure business continuity
• Moving and relocating applications and data to accommodate dynamic workloads

This paper contains information to facilitate a conversation among customers, VCE vArchitects, and EMC vArchitects about the options for deploying a business continuity and workload mobility solution for Vblock™ Infrastructure Platforms with EMC VPLEX Metro.

Business case

Data protection for business continuity and the ability to migrate applications and their data are key IT and business objectives.

Downtime of important applications is a costly proposition and extended downtime can be disastrous to the business. However, challenges such as high complexity, high costs, and unreliable solutions have limited the ability of organizations to implement effective business continuity plans.

Equally challenging is finding an agile, non-disruptive method to move applications and their data within and between data centers to balance workloads, do system maintenance, and consolidate resources. Traditionally, organizations had to perform a series of manual tasks and activities to transfer applications and data to an alternate location. IT staff would either make physical backups or use data replication services. Applications had to be stopped and could not be re-started until testing and verification were complete.

To ensure the integrity, availability, and currency of the data and applications running on the Vblock platforms in their data centers, VCE customers need a business continuity and workload mobility solution that:

• Provides non-disruptive, transparent data mobility between Vblock platforms over distance
• Simplifies heterogeneous and multi-array storage management and operations, including faster provisioning, improved utilization, and ongoing refreshes
• Allows data volumes to be configured for simultaneous access by applications in two locations, enabling relocation, sharing, and balancing of infrastructure resources
• Transparently moves and relocates active virtual machines for more dynamic IT operations
• Provides high availability in the case of planned and unplanned events


Solution

To meet the business challenges presented by today’s on-demand 24x7 world, virtual workloads must be highly available and mobile—in the right place, at the right time, and at the right cost to the enterprise. EMC VPLEX Metro, working in conjunction with VMware vMotion, is a hardware and software solution for Vblock platforms that provides enhanced availability for business continuity and dynamic workload mobility.

VPLEX Metro allows data to be distributed and shared across sites. Multiple users can access a single copy of data from two locations, allowing instant access to information in real time. This eliminates the operational overhead and time used to copy and to distribute data across locations. It also increases availability and resiliency by allowing volumes to be mirrored within and across locations. This provides nonstop application availability in the event of a component failure.

VPLEX Metro solves many of the business continuity and workload mobility challenges facing enterprises today. With VPLEX Metro, organizations can:

• Transparently move and relocate VMware ESXi virtual machines with their corresponding applications and data over synchronous distances between Vblock platforms
• Increase workload resiliency through automatic component failover with the VPLEX N+1 clustering architecture
• Manage and move heterogeneous block storage data non-disruptively over synchronous distances from a single interface
• Start small with a single VPLEX engine and grow the cluster by adding more engines

VMware vMotion leverages the virtualized converged infrastructure of the Vblock platform to move an entire running virtual machine from one server to another with no downtime. VMware Distributed Resource Scheduler (DRS) uses vMotion to continuously monitor utilization across resource pools and intelligently align resources with business needs.

Benefits

Deploying VPLEX Metro with the Vblock platform provides many benefits including:

• Distributed storage federation—Achieve transparent mobility and access within, between, and across data centers
• EMC AccessAnywhere—Share, access, and relocate a single copy of data over distance
• Scale-out cluster architecture—Start small and grow larger with predictable service levels
• Advanced data caching—Improve I/O performance and reduce storage array contention
• Distributed cache coherence—Automate sharing, balancing, and failover of I/O across clusters
• Mobility—Migrate and relocate virtual machines, applications, and data
• Resilience—Reduce unplanned application outages between sites


Scope

To help customers choose the VPLEX Metro configuration most suitable for their specific business objectives and environment, this paper presents:

• Descriptions of the key use cases for deploying VPLEX Metro with the Vblock platform
• VPLEX Metro deployment options for business continuity and data mobility
• Guidelines and best practices for deploying VPLEX Metro with the Vblock platform

Detailed instructions for installing and configuring VPLEX Metro with the Vblock platform are not included. Refer to the Additional References section for a list of the appropriate installation and administration guides.

Audience

This paper will be of particular interest to system, application, database, and storage architects; VCE and EMC vArchitects; and anyone interested in deploying a VPLEX Metro solution for Vblock platforms.

Feedback

To suggest documentation changes and provide feedback on this paper, send email to [email protected]. Include the title of this paper and the name of the topic to which your feedback applies.

Terminology

Block storage: Data structured as blocks. A block is a sequence of bytes or bits having a nominal length (block size). The process of putting data into blocks is called blocking. Blocking is used to facilitate the handling of the data stream by the computer program receiving the data. Blocked data are normally read a whole block at a time. Virtual volumes in VPLEX are presented to users as a contiguous list of blocks.

Business continuity: The continuity of essential business functions during and after a disaster. Business continuity planning develops processes and procedures to prevent interruption of mission-critical services, and to reestablish full functioning as swiftly and smoothly as possible.

Distributed virtual volume: A VPLEX virtual volume with complete, synchronized copies of data (mirrors), exposed through two geographically separated VPLEX clusters. Servers at distant data centers can simultaneously access distributed virtual volumes, thus allowing vMotion over distance.

EMC AccessAnywhere: The enabling technology that underlies the ability of VPLEX to provide access to information between clusters separated by distance.


GeoSynchrony: The operating system running on VPLEX directors. GeoSynchrony is an intelligent, multitasking, locality-aware operating environment that controls the data flow for virtual storage.

High availability: A system-design approach and associated service implementation that ensures a pre-arranged level of operational performance will be met during a contractual measurement period.

Latency: An amount of elapsed time. In this document, latency may refer to the time required to fulfill an I/O request or to the round-trip time (RTT) required to send a message over a network and back.

Recovery Point Objective (RPO): The maximum amount of data that can be lost in a given failure event.

Recovery Time Objective (RTO): The duration of time within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity.

Virtual volume: The topmost device within the VPLEX I/O stack that can be presented to a host or multiple hosts.

VPLEX cluster: Two or more VPLEX directors forming a single fault-tolerant cluster.

VPLEX director: A CPU module that runs GeoSynchrony, the core VPLEX operating environment. There are two directors in each engine, and each has dedicated resources and is capable of functioning independently.

VPLEX engine: A VPLEX enclosure that contains two directors, management modules, and redundant power to ensure high availability (HA) and no single point of failure.

Workload: In the context of this document, virtual machines and their corresponding applications and storage.

Workload federation: Dynamically distributing or balancing workloads as effectively as possible, regardless of physical location, while optimizing business and operational goals.


Technology overview

The VPLEX solution for Vblock platforms uses the following key hardware and software components and technologies:

• VCE Vblock Infrastructure Platforms
• VMware vCenter Server
• VMware vCenter Server Heartbeat
• VMware vSphere High Availability
• VMware vMotion
• EMC VPLEX

VCE Vblock™ Infrastructure Platforms

Vblock platforms combine industry-leading compute, network, storage, virtualization, and management technologies into prepackaged units of infrastructure. Through standardization of building blocks, the Vblock platform dramatically simplifies IT operations—accelerating deployment while reducing costs and improving service levels for all workloads, including the most demanding and mission-critical enterprise applications.

Vblock platforms scale to deliver the right performance and capacity to match the needs of business applications. The following Vblock platforms are available:

• Vblock Series 300 is designed to address a wide spectrum of virtual machines, users, and applications and is ideally suited to achieve the scale required in both private and public cloud environments. Vblock 300 scales from smaller- to mid-sized enterprise Customer Relationship Management (CRM), Supply Chain Management (SCM), e-mail, file and print, and collaboration deployments.
• Vblock Series 700 is designed for deployments involving very large numbers of virtual machines and users and is ideally suited to meet the higher performance and availability requirements of critical business applications. Vblock 700 scales to the largest deployments of enterprise CRM and SCM, data center operations, and service provider cloud computing offerings.

For more information on Vblock platforms, refer to the Vblock™ Infrastructure Platforms Technical Overview.

VMware vCenter Server

VMware vCenter Server provides a scalable and extensible platform that forms the foundation for virtualization management. It centrally manages VMware vSphere environments, giving IT administrators control over the virtual environment. One vCenter Server controls both the primary and secondary Vblock platforms in the VPLEX Metro deployment. Each Vblock platform contains a copy of vCenter Server in case one fails. These vCenter Server instances must be identical.

For more information on VMware vCenter Server, go to the VMware vCenter Server web page.


VMware vCenter Server Heartbeat

VMware vCenter Server Heartbeat delivers high availability for vCenter Server, protecting the virtual and cloud infrastructure from application, configuration, operating system, network, and hardware-related problems.

Heartbeat is a clustering solution with a primary node and a secondary node operating in active-passive mode. Heartbeat keeps the vCenter Server instances at each site synchronized. Changes to the vCenter Server configuration at one site are reflected to the other site.

For more information on vCenter Heartbeat, refer to the VMware vCenter Server Heartbeat Administrator Guide.

Note: In a VPLEX Metro deployment, a VMware vCenter Heartbeat license is required for each instance of vCenter Server being protected with Heartbeat.

VMware vSphere High Availability

VMware vSphere High Availability (HA) is an easy-to-use, cost-effective feature for ensuring continuous operation of applications running on virtual machines. HA continuously monitors all virtualized servers in a resource pool and detects failures in the physical server and operating system. In the event of physical server failure, affected virtual machines are automatically restarted on other production servers with spare capacity. In the case of operating system failure, HA restarts the affected virtual machine on the same physical server. When combined with VPLEX distributed storage, HA also provides fully automatic recovery from a complete site disaster.

VMware vMotion

Included with VMware vSphere, VMware vMotion enables live migration of a running virtual machine from one physical server to another while it remains powered on. This process takes place with no noticeable effect from the point of view of the end user. An administrator can take a physical server offline for maintenance or upgrading without subjecting the system's users to downtime. Migration of a virtual machine with VMware vMotion preserves its precise execution state, network identity, and active network connections. As a result, there is zero downtime and no disruption to the user.

Combined with Vblock platforms and VPLEX, vMotion enables effective distribution of applications and their data across multiple virtual hosts within synchronous distances. With virtual storage and virtual servers working together over distance, the infrastructure can provide load balancing, realtime remote data access, and improved application protection.

VMware vMotion is the key technology that underpins VMware Distributed Resource Scheduler (DRS). DRS continuously monitors the pooled resources of many servers and intelligently allocates available resources among virtual machines based on pre-defined rules that reflect business needs and priorities. The result is a self-managing, highly optimized, and efficient IT environment with built-in, automated load balancing.

For more information on VMware vMotion, refer to Workload Mobility with VMware vMotion and EMC VPLEX on Vblock Platforms.


EMC VPLEX

VPLEX is an enterprise-class storage federation technology that aggregates and manages pools of Fibre Channel (FC) attached storage within and among data centers. VPLEX resides between the servers and the FC-attached storage, and presents local and distributed volumes to hosts.

VPLEX enables dynamic workload mobility and continuous availability within and between Vblock platforms over distance. It provides simultaneous access to storage devices at two sites through creation of VPLEX distributed virtual volumes, supported on each side by a VPLEX cluster.

For more information on VPLEX, refer to the EMC VPLEX 5.0 Architecture Guide.

VPLEX Local

VPLEX Local provides seamless, non-disruptive data mobility and the ability to manage multiple heterogeneous arrays from a single interface within a data center. VPLEX Local allows increased availability, simplified management, and improved utilization across multiple arrays.

VPLEX Metro

VPLEX Metro with EMC AccessAnywhere delivers distributed federation and enables active/active block-level access to data between two sites within synchronous distances of up to a 5-millisecond round-trip time. VPLEX Metro, in combination with VMware vMotion, allows transparent movement and relocation of virtual machines and their corresponding applications and data over distance. AccessAnywhere enables a single copy of data to be shared, accessed, and relocated over distance.

VPLEX Witness

VPLEX Witness is an optional component designed for deployment in customer environments where the regular bias rule sets are insufficient to provide seamless zero or near-zero Recovery Time Objective (RTO) failover in the event of site disasters and VPLEX cluster failures.

By reconciling its own observations with the information reported periodically by the clusters, VPLEX Witness enables the cluster(s) to distinguish between inter-cluster network partition failures and cluster failures and to resume I/O automatically in these situations.

For more information, refer to the EMC VPLEX Metro Witness Technology and High Availability TechBook.


Use Cases

Organizations rely on the continuity of their data centers as an essential part of their business. In addition, as data centers become more geographically dispersed, IT organizations need to be able to dynamically and non-disruptively move workloads from one physical location to another.

Business continuity

Business continuity describes the processes and procedures an organization puts in place to ensure that essential functions can continue during and after a disaster. Business continuity planning seeks to prevent interruption of mission-critical services, and to reestablish full functioning as swiftly and smoothly as possible using an automated process with zero data loss and near-zero recovery time.

VPLEX Metro, in combination with Vblock platforms, facilitates business continuity through creation of distributed virtual volumes, which are storage volumes located in two separate Vblock platforms. These virtual volumes are 100% in sync at all times. The VPLEX Metro solution provides the dynamic storage infrastructure required to migrate applications, virtual machines, and data within and between remote locations with no disruption of service.

VPLEX Metro provides data access from both the primary and target sites in an active-active mode, eliminating the need to move the underlying storage and making migration dramatically faster. AccessAnywhere, the distributed cache coherent technology in VPLEX Metro, enables simultaneous read/write data access.

Properly configured, VPLEX Metro delivers a zero Recovery Point Objective (RPO) and a near-zero Recovery Time Objective (RTO).

Workload and data mobility

The combination of the converged infrastructure of the Vblock platform, VMware vMotion, and VPLEX Metro allows administrators to relocate virtual machines and their corresponding applications and storage. Data centers can now pool capacity to improve infrastructure utilization, refresh technology, or load balance within or across data centers as well as enhance Service Level Agreements (SLAs) by providing high availability, increased resiliency, and business continuity for critical applications and data. Workload and data mobility with VPLEX Metro can be automatic using DRS, or manual using vMotion.

For more information about workload and data mobility, refer to Workload Mobility with VMware vMotion and EMC VPLEX Metro on Vblock™ Infrastructure Platforms.


Deployment guidelines and best practices

The amount of data and the data change rate of the storage volumes that require business continuity protection determine the specific VPLEX Metro configuration required. The following sections provide deployment best practices and guidelines to help choose the best configuration for the environment and business needs.

Planning the deployment

Deployment planning is a critical step in the successful implementation of the VPLEX Metro with Vblock platforms solution. Each enterprise has its own set of goals, requirements, and priorities to consider.

Table 1 lists deployment prerequisites and guidelines to consider before beginning deployment.

Table 1. Deployment prerequisites and guidelines

Applications: Determine the following:
• Which applications, and therefore, which storage volumes VPLEX Metro will manage
• Data change rates
• The business’s application priorities. In the event of a site disaster, all failed applications need to be restarted on the other site. Define the priority of applications, that is, which ones are the most critical. Use VMware Startup Priority to prioritize and get the most critical applications back online first.

Recovery plan: Develop and test a recovery plan.

Bandwidth: Ensure the following (a sizing sketch follows this table):
• The Layer-2 network extends across the sites.
• There is adequate bandwidth between sites for the data change rate.
• If using vMotion, allow for 622 Mb/sec of bandwidth (with a 5-millisecond maximum round-trip time) above the VPLEX bandwidth requirement.

Disk space: For a business continuity deployment with near-zero RTO, protected data and applications reside on both sites through distributed virtual volumes. Allow for adequate disk space on both sites for this duplication.

VMware vCenter instance consistency: VMware vCenter Heartbeat synchronizes changes made on either site to the other. When using Heartbeat, ensure that the vCenter Server instances on the Vblock platforms at both sites are identical and not installed on a distributed device. Note: In a VPLEX Metro deployment, a VMware vCenter Heartbeat license is required for each instance of vCenter Server being protected with Heartbeat.


Site hardware/software resources: Protected applications can be distributed across the two sites. The workload split need not be 50/50. However, for business continuity to work, each site must have the ability to run 100 percent of the combined workload from both sites:
• Site A must have enough resources to run 100 percent of the VPLEX protected applications.
• Site B must have enough resources to run 100 percent of the VPLEX protected applications.
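
The bandwidth guidance in Table 1 can be turned into a quick back-of-the-envelope check. The following Python sketch is illustrative only: the 622 Mb/s vMotion figure and the 5-millisecond round-trip limit come from the guidelines above, while the peak write rate, protocol-overhead factor, and link size are hypothetical placeholders to be replaced with measured values from the VPLEX sizing effort.

    # Rough inter-site link sizing check for a VPLEX Metro deployment.
    # The 622 Mb/s vMotion headroom and the 5 ms round-trip limit come from the
    # guidelines above; the workload numbers and overhead factor are placeholders.

    VMOTION_HEADROOM_MBPS = 622      # vMotion allowance above the VPLEX requirement
    MAX_RTT_MS = 5                   # VPLEX Metro synchronous round-trip limit

    def required_link_mbps(peak_write_mbytes_per_sec, protocol_overhead=1.2):
        """Estimate inter-site bandwidth: mirrored writes plus vMotion headroom."""
        mirrored_writes_mbps = peak_write_mbytes_per_sec * 8 * protocol_overhead
        return mirrored_writes_mbps + VMOTION_HEADROOM_MBPS

    if __name__ == "__main__":
        # Hypothetical inputs: 60 MB/s of peak writes to VPLEX protected volumes,
        # a 2,000 Mb/s inter-site link, and a measured 3 ms round-trip time.
        need_mbps = required_link_mbps(60)
        have_mbps, measured_rtt_ms = 2000, 3
        print(f"Estimated requirement: {need_mbps:.0f} Mb/s; link provides {have_mbps} Mb/s")
        print("Bandwidth OK" if have_mbps >= need_mbps else "Link undersized")
        print("Latency OK" if measured_rtt_ms <= MAX_RTT_MS else "RTT exceeds the 5 ms limit")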

VPLEX to Vblock platform mappings

Table 2 maps Vblock platform models to VPLEX cluster configurations. In this table, it is assumed that only one Vblock platform is connected to one VPLEX cluster. For example, a single Vblock 700LX can be connected to a single-engine or a dual-engine VPLEX cluster, but not to a quad-engine VPLEX cluster.

Table 2. VPLEX to Vblock Platform Mappings

Vblock Platform     Single-Engine   Dual-Engine   Quad-Engine
300 series          Yes             Yes           No (see note 1)
700LX               Yes             Yes           No (see note 1)
700MX               Yes             Yes           Yes

Note 1: The Vblock Series 300 and Vblock 700LX have only 16 ports available for VPLEX front-end and back-end connections.

VPLEX port requirements per connected Vblock platform

The following table shows the range of FC connections for each connected Vblock platform. The specific number of connections required is determined during the VPLEX sizing effort.

Two separate physical WAN links should be used to connect the two VPLEX clusters.

Note: One or two Vblock platforms can be connected to each VPLEX cluster. If the Vblock platforms are Cross-Cluster Connected, then only one Vblock platform can be connected to the VPLEX cluster.


Table 3. VPLEX Port Requirements per Connected Vblock Platform

VPLEX front-end (UCS-facing) ports:
• One engine (2 directors): 4 (Non Cross-Cluster Connect) or 8 (Cross-Cluster Connect)
• Two engines (4 directors): 8 (Non Cross-Cluster Connect) or 16 (Cross-Cluster Connect)
• Four engines (8 directors): 16 (Non Cross-Cluster Connect) or 32 (Cross-Cluster Connect)
• Comments: Read/write I/O from VPLEX protected applications

VPLEX back-end (storage-facing) ports:
• One engine (2 directors): 4
• Two engines (4 directors): 8
• Four engines (8 directors): 16
• Comments: Application reads/writes and changes mirrored from site A to site B

4 Gb or 8 Gb WAN ports for VPLEX communications traffic (10 GbE can also be used):
• One engine (2 directors): 4
• Two engines (4 directors): 8
• Four engines (8 directors): 16
• Comments: Sizing based on performance requirements
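
For quick reference during sizing discussions, the port counts in Table 3 can be captured in a small lookup, as in the Python sketch below. The numbers simply restate Table 3; the helper itself is an illustration, not a sizing tool, and the final connection count still comes from the VPLEX sizing effort.

    # FC port counts per connected Vblock platform, restated from Table 3.
    # Front-end counts double in a Cross-Cluster Connect configuration;
    # back-end and WAN counts do not.

    PORTS_BY_ENGINE_COUNT = {
        # engines: (front-end ports in Non Cross-Cluster Connect, back-end ports, WAN ports)
        1: (4, 4, 4),
        2: (8, 8, 8),
        4: (16, 16, 16),
    }

    def vplex_port_requirements(engines, cross_cluster_connect=False):
        """Return (front_end, back_end, wan) FC port counts for one connected Vblock platform."""
        if engines not in PORTS_BY_ENGINE_COUNT:
            raise ValueError("VPLEX clusters have one, two, or four engines")
        front_end, back_end, wan = PORTS_BY_ENGINE_COUNT[engines]
        if cross_cluster_connect:
            front_end *= 2           # per Table 3: 8, 16, or 32 front-end ports
        return front_end, back_end, wan

    if __name__ == "__main__":
        fe, be, wan = vplex_port_requirements(engines=2, cross_cluster_connect=True)
        print(f"Dual-engine, Cross-Cluster Connect: front-end={fe}, back-end={be}, WAN={wan}")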

Virtual networking high availability

A Cisco Nexus 1000V distributed virtual switch manages virtual networking for the Vblock platform. It provides a common management model for both physical and virtual network infrastructures that includes policy-based virtual machine connectivity, mobility of virtual machine security and network properties, and a non-disruptive operational model.

For VPLEX Metro cluster configurations, VCE recommends that the Virtual Supervisor Modules (VSMs) of the Nexus 1000V are moved from the Advanced Management Pod (AMP) to the UCS blades in the Vblock platform. As a result, VPLEX virtual volumes protect both active and standby VSMs and ESXi cluster HA restarts the VSMs automatically in the event of a disaster.

Using host affinity groups

Each distributed virtual volume has a preferred VPLEX cluster based on the detach rule configured for it. Under normal operation conditions, virtual volumes are available at both VPLEX clusters. However, in many failure cases, they are available only at the preferred VPLEX cluster.

Therefore, it is recommended that applications run at the VPLEX cluster preferred by the virtual volumes they use, so that they retain access in any scenario that invokes the VPLEX preference rule (such as a WAN partition). At the same time, the flexibility to have virtual machines move to the non-preferred site in case of failures or load spikes is desirable.

VMware DRS host affinity rules can be used to ensure that virtual machines are always running in their preferred location—the location that the storage they rely on is biased toward.


For example, hosts and virtual machines might be organized into groups A and B. VM group A is configured to run on host group A whenever possible. Host group A contains the UCS blades in one Vblock platform (Vblock1) and Host group B contains the UCS blades in the other Vblock platform (Vblock2).

Any virtual machine relying on datastores for which the underlying virtual volume is preferred in Vblock1 is put in VM group A. Any virtual machine relying on datastores that have Vblock2 as preferred is put in VM group B. The host affinity rule can then specify that, whenever possible, VM group A should run on host group A and VM group B should run on host group B.

In this way, the virtual machines stay in the location where they have the highest possible availability, but maintain the ability to move to the other location if the preferred location is unable to host them.
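
A "should run on" host affinity rule of the kind described above can also be created programmatically. The following sketch uses pyVmomi (the VMware vSphere API Python bindings) and makes several assumptions: the vCenter address, credentials, cluster name, host names, virtual machine names, and group names are all placeholders, and the inventory lookup is deliberately simplified. Treat it as a starting point to validate against the vSphere API documentation for the vCenter release in use, not as a finished tool.

    # Sketch: create DRS VM/host groups and a "should run on" host affinity rule with
    # pyVmomi. All names below (vCenter address, credentials, cluster, hosts, VMs,
    # group names) are placeholders for illustration.
    import ssl

    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    def find_by_name(si, vimtype, name):
        """Simplified lookup: first inventory object of the given type with this name."""
        view = si.content.viewManager.CreateContainerView(
            si.content.rootFolder, [vimtype], True)
        matches = [obj for obj in view.view if obj.name == name]
        if not matches:
            raise LookupError(f"No object named {name} found")
        return matches[0]

    def main():
        ctx = ssl._create_unverified_context()   # lab convenience only; validate certs in production
        si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                          pwd="password", sslContext=ctx)
        try:
            cluster = find_by_name(si, vim.ClusterComputeResource, "Metro-Cluster")
            hosts_a = [find_by_name(si, vim.HostSystem, n) for n in ("esxi-a1", "esxi-a2")]
            vms_a = [find_by_name(si, vim.VirtualMachine, n) for n in ("app-vm1", "app-vm2")]

            host_group = vim.cluster.HostGroup(name="HostGroupA", host=hosts_a)
            vm_group = vim.cluster.VmGroup(name="VMGroupA", vm=vms_a)
            rule = vim.cluster.VmHostRuleInfo(
                name="VMGroupA-should-run-on-HostGroupA",
                enabled=True,
                mandatory=False,                # "should run", not "must run": VMs can still
                vmGroupName="VMGroupA",         # restart at the other site after a failure
                affineHostGroupName="HostGroupA")

            spec = vim.cluster.ConfigSpecEx(
                groupSpec=[vim.cluster.GroupSpec(info=host_group, operation="add"),
                           vim.cluster.GroupSpec(info=vm_group, operation="add")],
                rulesSpec=[vim.cluster.RuleSpec(info=rule, operation="add")])
            task = cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
            print("Submitted DRS group/rule reconfiguration:", task)
        finally:
            Disconnect(si)

    if __name__ == "__main__":
        main()

Setting mandatory to False makes this a "should" rule, which preserves the ability of HA to restart the virtual machines at the non-preferred site when the preferred site cannot host them.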

For more information, refer to the EMC VPLEX Metro Witness Technology and High Availability TechBook.

Deploying VPLEX Witness

An external VPLEX Witness server is installed as a virtual machine running on a customer-supplied VMware ESXi host deployed in a failure domain separate from either of the VPLEX clusters. VPLEX Witness connects to both VPLEX clusters over the IP management network.

If two Vblock platforms are connected to each VPLEX cluster in a VPLEX Metro deployment, a separate VPLEX Witness is required for each Vblock platform-VPLEX cluster pairing.

For more information, refer to the EMC VPLEX Metro Witness Technology and High Availability TechBook.

Monitoring VPLEX

GeoSynchrony supports multiple methods of monitoring the VPLEX cluster including:

• Health and environmental monitoring of the VPLEX cluster hardware with a GeoSynchrony service. It monitors various power and environmental conditions at regular intervals and logs any condition changes into the VPLEX messaging system. Any condition that indicates a hardware or power fault generates a call home event to notify the administrator.
• Daily health monitoring with the cluster-status and health-check commands.
• System configuration and general health monitoring with summary and reporting commands.

Events are logged to the firmware.log files on the VPLEX management server. Any critical errors generate a call home event to EMC.
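
As a lightweight complement to the built-in monitoring, an administrator might scan a copy of a firmware.log file for error-level entries, as in the Python sketch below. The file path and the severity keywords it matches are assumptions, since the exact log format depends on the GeoSynchrony release; the sketch is not a substitute for the call home and health-check mechanisms described above.

    # Sketch: scan a copied firmware.log for lines that look like critical or error events.
    # The default path and the severity keywords are assumptions; adjust them to the
    # actual log format of the GeoSynchrony release in use.
    import re
    import sys
    from pathlib import Path

    SEVERITY_PATTERN = re.compile(r"\b(critical|error|failed)\b", re.IGNORECASE)

    def scan_firmware_log(path):
        """Yield (line_number, line) for entries matching the severity keywords."""
        with Path(path).open(errors="replace") as log:
            for lineno, line in enumerate(log, start=1):
                if SEVERITY_PATTERN.search(line):
                    yield lineno, line.rstrip()

    if __name__ == "__main__":
        log_path = sys.argv[1] if len(sys.argv) > 1 else "firmware.log"
        hits = list(scan_firmware_log(log_path))
        for lineno, line in hits:
            print(f"{lineno}: {line}")
        print(f"{len(hits)} suspicious entries found in {log_path}")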

Refer to the EMC VPLEX CLI Guide (P/N 300-012-311-A02) for more information about monitoring VPLEX.


Guidelines for VPLEX coexistence with UIM/P

EMC Ionix Unified Infrastructure Manager/Provisioning (UIM/P) provides simplified management for Vblock platforms, including provisioning, configuration, change, and compliance management. UIM/P offers a consolidated dashboard view, policy-based management, automated deployment, and deep visibility across the environment.

At this time, UIM/P and VPLEX are not integrated, and so VPLEX cannot be used on a storage volume (or storage pool) allocated by UIM/P. The following guidelines apply to environments in which UIM/P is being used:

• In deployments that will include VPLEX, do not allocate 100% of the storage for the Vblock platform with UIM/P. Reserve an adequate amount of storage for VPLEX virtual volumes. The volumes can be re-allocated with UIM/P later if they are no longer needed for VPLEX.
• If 100% of the storage for the Vblock platform is already allocated with UIM/P, the Vblock platform must have space for additional storage, and the added storage cannot be allocated with UIM/P.
• UIM/P displays alarms when it sees storage that it has not allocated and zone members that it has not defined. These alarms should be ignored. Trying to fix them may cause an operational error. This only happens in Cross-Cluster Connect deployments.
• VPLEX storage should come from a separate, ungraded storage pool or disk group. If UIM/P has not graded the storage, it does not try to manage the storage and does not set alarms when storage is allocated from it.
• If using VPLEX with multiple Vblock platforms, each Vblock platform must have its own UIM/P to be managed separately, even if there is only one vCenter Server.
• There is no support for array migration under VPLEX (since service boot disks would come directly from the array, not flow through VPLEX).

Oracle Real Application Clusters

EMC AccessAnywhere clustering technology allows read/write access to distributed volumes across distance where the volumes have the exact same SCSI LUN identity. This technology allows hypervisors to migrate virtual machines across distance and application clusters such as Oracle Real Application Clusters (RAC) to provide high availability across distance.

For more information about VPLEX Metro working in conjunction with Oracle RAC on the Vblock platform, refer to Oracle Extended RAC With EMC VPLEX Metro Best Practices Planning.


Unsupported configurations

VCE does not support VPLEX configurations with the following characteristics:

• Servers or storage allocated by UIM/P
• Merging of a Vblock platform SAN fabric with any other SAN fabric
• More than one Vblock platform connected to a single VPLEX instance


Recommended configurations

The following configuration examples all assume:

• The term site refers to a location. Sites can be rooms or floors in a building, buildings in a campus environment, or data centers separated by distance.
• Both sites have access to 100% of the applications and data protected by VPLEX Metro through distributed virtual volumes.
• Site A is running 50% of the VPLEX Metro protected applications, but must have enough VMware resources to run 100% of the applications.
• Site B is running 50% of the VPLEX Metro protected applications, but must have enough VMware resources to run 100% of the applications (a simple capacity check is sketched after this list).
• In the case of a site failure, there will be zero data loss, but there will be a delay of approximately one minute while the virtual guests start up on the other site.
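
The sizing assumption in the list above, that each site can run 100 percent of the combined protected workload, can be expressed as a simple check. All figures in the following Python sketch are hypothetical placeholders; substitute the actual resource demands of the VPLEX protected virtual machines and the usable capacity of each site.

    # Sketch: verify that each site can host 100% of the combined protected workload.
    # All numbers below are placeholders for illustration.

    protected_vm_demand = {          # per-VM demand: (vCPU count, memory in GB)
        "erp-db":  (8, 64),
        "erp-app": (4, 32),
        "mail":    (4, 48),
    }
    site_capacity = {                # usable (vCPU, memory GB) per site after HA overhead
        "Site A": (64, 512),
        "Site B": (64, 512),
    }

    total_cpu = sum(cpu for cpu, _ in protected_vm_demand.values())
    total_mem = sum(mem for _, mem in protected_vm_demand.values())

    for site, (cap_cpu, cap_mem) in site_capacity.items():
        ok = cap_cpu >= total_cpu and cap_mem >= total_mem
        print(f"{site}: needs {total_cpu} vCPU / {total_mem} GB, "
              f"has {cap_cpu} vCPU / {cap_mem} GB -> {'OK' if ok else 'UNDERSIZED'}")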

Non Cross-Cluster Connect configurations

A Non Cross-Cluster Connect configuration:

• Delivers zero RPO in all single failures, including that of an entire site
• Delivers zero RTO in the event of a storage array failure
• For a server domain failure or VPLEX failure, limits RTO to the application restart time
• Requires a ping round-trip time of less than 5 milliseconds


Non Cross-Cluster Connect without VPLEX Witness

This example shows a Non Cross-Cluster Connect configuration without VPLEX Witness. In this configuration, recovery processes will need to be started manually.


Non Cross-Cluster Connect with VPLEX Witness

This example shows a Non Cross-Cluster Connect configuration where VPLEX Witness resides on an ESXi host at a third site (Site 3). In this configuration, if either Site 1 or Site 2 fails, VPLEX Witness will be able to automatically start recovery processes. The third site (Site 3) must be a location outside of the failure domain. For example, if the objective is to protect against a fire, VPLEX Witness needs to be outside the fire zone. If the objective is to protect against an earthquake, VPLEX Witness must be outside the earthquake zone.


Cross-Cluster Connect configuration

Servers in a Cross-Cluster Connect configuration can access data from the VPLEX at either site. Cross-Cluster Connect is suitable for deploying within a campus, or in multiple isolated zones within a single data center. This configuration eliminates the need to do server failover when an entire storage cluster (VPLEX or the storage behind it) goes down.

A Cross-Cluster Connect configuration:

• Delivers zero RPO in all single failures, including that of an entire site
• Delivers zero RTO in the event of a storage domain failure
• For a server domain failure, limits RTO to the application restart time
• Requires a ping round-trip time of less than 1 millisecond (a latency spot-check is sketched after this list)
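
The round-trip time limits quoted in this paper (less than 5 milliseconds for Non Cross-Cluster Connect and less than 1 millisecond for Cross-Cluster Connect) can be spot-checked from a host at either site. The Python sketch below shells out to the system ping command and parses a Linux-style "rtt min/avg/max/mdev" summary line, which is an assumption about the platform; the remote host name is a placeholder, and a formal latency assessment remains part of the VPLEX sizing effort.

    # Sketch: measure the average ICMP round-trip time to the remote site and compare
    # it with the VPLEX Metro limits. Assumes a Linux-style ping output format.
    import re
    import subprocess

    RTT_LIMITS_MS = {"Non Cross-Cluster Connect": 5.0, "Cross-Cluster Connect": 1.0}

    def average_rtt_ms(remote_host, count=20):
        """Return the average round-trip time in milliseconds reported by ping."""
        out = subprocess.run(["ping", "-c", str(count), remote_host],
                             capture_output=True, text=True, check=True).stdout
        match = re.search(r"= [\d.]+/([\d.]+)/", out)   # min/avg/max/mdev summary line
        if not match:
            raise RuntimeError("Could not parse ping output; format differs on this platform")
        return float(match.group(1))

    if __name__ == "__main__":
        rtt = average_rtt_ms("vplex-site-b.example.local")   # placeholder remote address
        for config, limit in RTT_LIMITS_MS.items():
            verdict = "within" if rtt <= limit else "exceeds"
            print(f"Average RTT {rtt:.2f} ms {verdict} the {limit} ms limit ({config})")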

Refer to the EMC VPLEX Metro Witness Technology and High Availability TechBook for detailed descriptions of these failure scenarios.



Data flow

The following diagram illustrates the logical flow of data between the two sites.


Failure scenarios

The following sections describe failure scenarios with their associated VPLEX behavior and recovery procedures.

Application and management failover

Distributed virtual volumes managed by VPLEX at both sites are 100% in sync at all times. With VPLEX Witness, in the event of the failure of one site, the data is immediately available at the alternate site in a crash consistent state. No scripting, failover declaration, or action is required.

If the servers at the failed site have gone down, then VMware HA restarts the affected virtual machines at the other site automatically—similar to the failure of a single or set of servers in a single site.

VMware vCenter Server Heartbeat ensures the ability to fail over vCenter management if the primary vCenter instance has gone down. Heartbeat monitors the availability of all components of vCenter Server at the application and service layer, with the ability to restart or restore individual services. It uses a passive server instance to provide rapid failover and failback of vCenter Server and its components.

Refer to the VMware vCenter Server Heartbeat Administrator Guide for detailed information about configuring Heartbeat for failover and recovering from a failover.

The following factors determine how long it takes an application to become operational after failover:

• Number of virtual machines being restarted
• Location of the application in the priority sequence
• Any application-dependent tasks that must be performed to get the application restarted from a crash-consistent state (for example, reestablish credentials, file-system check, database log rollback)

Application and management failback

After the failed site is restored, the applications that failed over must be failed back. VPLEX resynchronizes the distributed virtual volumes and makes them available at the previously failed site automatically. If VMware DRS is in use, it moves virtual machines back to the previously failed site automatically according to the policies with which it has been configured. It is also possible to move virtual machines back manually with a vMotion operation.

Depending on the configuration, some manual intervention may be required to make the vCenter Server primary again for applications on the failed site. Refer to the VMware vCenter Server Heartbeat Administrator Guide for detailed information about making the primary server active again.


Failure scenarios

The following sections provide a comprehensive list of failure scenarios for Non Cross-Cluster Connect and Cross-Cluster Connect configurations. Each scenario includes the associated VPLEX behavior and VMware HA recovery procedures.

The deployment for these scenarios consists of a VMware HA/DRS cluster across two sites using ESXi 5.0 hosts. vCenter Server 5.0 manages the cluster and connects to the ESXi hosts at both sites. The vSphere management, vMotion management, and virtual machine networks are connected using a redundant network between the two sites.

A VPLEX Metro solution federated across the two sites provides the distributed storage to the ESXi hosts. The SAN Boot LUN is on the back-end storage array, and not on the distributed virtual volume itself. The virtual machine runs on the preferred site of the distributed virtual volume.

Refer to the EMC VPLEX Metro Witness Technology and High Availability TechBook for detailed descriptions of these failure scenarios.

Go to the VMware web site at http://www.vmware.com/support/ for the most up-to-date technical documentation.

Non Cross-Cluster Connect failure scenarios

Scenario: Single VPLEX back-end (BE) path failure
VPLEX behavior: VPLEX continues to operate using an alternate path to the same BE array. Distributed virtual volumes exposed to the ESXi hosts are not affected.
Impact/observed VMware HA behavior: None

Scenario: Single front-end (FE) path failure
VPLEX behavior: The ESXi server is expected to use alternate paths to the distributed virtual volumes.
Impact/observed VMware HA behavior: None

Scenario: BE array failure at site A
VPLEX behavior: VPLEX continues to operate using the array at site B. When the array is recovered from the failure, the storage volume at site A is resynchronized from site B automatically.
Impact/observed VMware HA behavior: None

Scenario: BE array failure at site B
VPLEX behavior: VPLEX continues to operate using the array at site A. When the array is recovered from the failure, the storage volume at site B is resynchronized from site A automatically.
Impact/observed VMware HA behavior: None

Scenario: VPLEX director failure
VPLEX behavior: VPLEX continues to provide access to the distributed virtual volume through other directors on the same VPLEX cluster.
Impact/observed VMware HA behavior: None


Scenario: Complete site A failure. The failure includes all ESXi hosts and the VPLEX cluster at site A.
VPLEX behavior: VPLEX continues to serve I/O on the surviving site (site B). When the VPLEX at the failed site (site A) is restored, the distributed virtual volumes are synchronized automatically from the active site (site B).
Impact/observed VMware HA behavior: Virtual machines running at the failed site fail. HA automatically restarts them on the surviving site.

Scenario: Complete site B failure. The failure includes all ESXi hosts and the VPLEX cluster at site B.
VPLEX behavior: VPLEX continues to serve I/O on the surviving site (site A). When the VPLEX at site B is restored, the distributed virtual volumes are synchronized automatically from the active site (site A).
Impact/observed VMware HA behavior: Virtual machines running at the failed site fail. HA automatically restarts them on the surviving site.

Scenario: Multiple ESXi host failure(s) – power off
VPLEX behavior: None
Impact/observed VMware HA behavior: VMware HA restarts the virtual machines on any of the surviving ESXi hosts within the HA cluster.

Scenario: Multiple ESXi host failure(s) – network disconnect
VPLEX behavior: None
Impact/observed VMware HA behavior: VMware HA continues to exchange cluster heartbeats through the shared datastore. No virtual machine failovers occur.

Scenario: ESXi host experiences APD (All Paths Down). Encountered when the ESXi host loses access to its storage volumes (in this case, VPLEX volumes).
VPLEX behavior: None
Impact/observed VMware HA behavior: In an APD scenario, the ESXi host must be restarted to recover. If the ESXi host is restarted, HA restarts the failed virtual machines on other surviving ESXi hosts within the HA cluster.

Scenario: VPLEX inter-site link (ISL) failure; vSphere cluster management network intact
VPLEX behavior: VPLEX transitions distributed virtual volumes on the non-preferred site to the I/O failure state. On the preferred site, the distributed virtual volumes continue to provide access.
Impact/observed VMware HA behavior: Virtual machines running in the preferred site are not affected. Virtual machines running in the non-preferred site experience I/O failure and fail. HA fails over these virtual machines to the other site. Best practice is to run the virtual machines on the preferred site.

Scenario: VPLEX cluster failure. The VPLEX at either site A or site B has failed, but ESXi and other LAN/WAN/SAN components are intact.
VPLEX behavior: I/O continues to be served on all the volumes on the surviving site.
Impact/observed VMware HA behavior: The ESXi hosts located at the failed site experience an APD condition. The ESXi hosts must be restarted to recover from the failure.


Scenario: Complete dual-site failure
VPLEX behavior: Upon restoration of the two sites, VPLEX continues to serve I/O. Best practice is to bring up the BE storage arrays first, followed by VPLEX.
Impact/observed VMware HA behavior: The ESXi hosts should be brought up only after VPLEX is fully recovered and the distributed virtual volumes are synchronized. When the ESXi hosts at each site are powered on, the virtual machines are restarted and resume normal operations.

Scenario: Director failure at one site (preferred site for a given distributed virtual volume) and BE array failure at the other site (secondary site for a given distributed virtual volume)
VPLEX behavior: The surviving VPLEX directors within the VPLEX cluster with the failed director continue to provide access to the distributed virtual volumes. VPLEX continues to provide access to the distributed virtual volumes using the preferred site BE array.
Impact/observed VMware HA behavior: None

Scenario: VPLEX ISL intact; vSphere cluster management network failure
VPLEX behavior: None
Impact/observed VMware HA behavior: Virtual machines on each site continue running on their respective hosts since the HA cluster heartbeats are exchanged through the shared datastore.

Scenario: VPLEX ISL failure and vSphere cluster management network failure simultaneously
VPLEX behavior: VPLEX fails I/O on the non-preferred site for a given distributed virtual volume. Access to the distributed virtual volume continues on the preferred site.
Impact/observed VMware HA behavior: For virtual machines running in the preferred site, powered-on virtual machines continue to run. This is an HA split-brain situation: the non-preferred site thinks that the hosts of the preferred site are dead and tries to restart the powered-on virtual machines of the preferred site. Virtual machines running in the non-preferred site see their I/O as failed, and the virtual machines fail. These virtual machines can be registered and restarted on the preferred site.

Scenario: VPLEX storage volume is unavailable (for example, it is accidentally removed from the storage view, or the ESXi initiators are accidentally removed from the storage view)
VPLEX behavior: VPLEX continues to serve I/O on the other site where the volume is available.
Impact/observed VMware HA behavior: If I/O is running on the lost device, ESXi detects a PDL (Permanent Device Loss) condition. The virtual machine is killed by VM Monitor and restarted by HA on the other site.

Scenario: VPLEX inter-site WAN link failure and simultaneous VPLEX Witness to site B link failure
VPLEX behavior: VPLEX fails I/O on the distributed virtual volumes at site B and continues to serve I/O on site A.
Impact/observed VMware HA behavior: The virtual machines at site B fail. They can be restarted at site A. There is no impact on the virtual machines running at site A.


Scenario: VPLEX inter-site WAN link failure and simultaneous VPLEX Witness to site A link failure
VPLEX behavior: VPLEX fails I/O on the distributed virtual volumes at site A and continues to serve I/O on site B.
Impact/observed VMware HA behavior: It has been observed that the virtual machines at site A fail. They can be restarted at site B. There is no impact on the virtual machines running at site B.

Scenario: VPLEX Witness failure
VPLEX behavior: VPLEX continues to serve I/O at both sites.
Impact/observed VMware HA behavior: None

Scenario: VPLEX Management Server failure
VPLEX behavior: None
Impact/observed VMware HA behavior: None

Scenario: vCenter Server failure
VPLEX behavior: None
Impact/observed VMware HA behavior: No impact on the running virtual machines or HA. However, the DRS rules and virtual machine placements are not in effect.

Cross-Cluster Connect failure scenarios

Scenario: Single VPLEX back-end (BE) path failure
VPLEX behavior: VPLEX continues to operate using an alternate path to the same BE array. Distributed virtual volumes exposed to the ESXi hosts are not affected.
Impact/observed VMware HA behavior: None

Scenario: Single front-end (FE) path failure
VPLEX behavior: The ESXi server is expected to use alternate paths to the distributed virtual volumes.
Impact/observed VMware HA behavior: None

Scenario: BE array failure at site A
VPLEX behavior: VPLEX continues to operate using the array at site B. When the array is recovered from the failure, the storage volume at site A is resynchronized from site B automatically.
Impact/observed VMware HA behavior: None

Scenario: VPLEX director failure
VPLEX behavior: VPLEX continues to provide access to the distributed virtual volume through other directors on the same VPLEX cluster.
Impact/observed VMware HA behavior: None

Scenario: Complete site A failure. The failure includes all ESXi hosts and the VPLEX cluster at site A.
VPLEX behavior: VPLEX continues to serve I/O on the surviving site (site B). When the VPLEX at the failed site (site A) is restored, the distributed virtual volumes are synchronized automatically from the active site (site B).
Impact/observed VMware HA behavior: Virtual machines running at the failed site fail. HA automatically restarts them on the surviving site.


Cross-Cluster Connect VPLEX Behavior Impact/Observed VMware HA Scenario Behavior Complete site B failure VPLEX continues to serve I/O on Virtual machines running at the The failure includes all ESXi the surviving site (site A). When failed site fail. HA automatically hosts and the VPLEX cluster at the VPLEX at site B is restored, restarts them on the surviving site A. the distributed virtual volumes are site. synchronized automatically from the active site (site A).

Scenario: Multiple ESXi host failure(s) – power off
VPLEX behavior: None
Impact/observed VMware HA behavior: HA restarts the virtual machines on any of the surviving ESXi hosts within the HA cluster.

Scenario: Multiple ESXi host failure(s) – network disconnect
VPLEX behavior: None
Impact/observed VMware HA behavior: HA continues to exchange cluster heartbeats through the shared datastore. No virtual machine failovers occur.

Scenario: ESXi host experiences APD (All Paths Down). Encountered when the ESXi host loses access to its storage volumes (in this case, VPLEX volumes).
VPLEX behavior: None
Impact/observed VMware HA behavior: In an APD scenario, the ESXi host must be restarted to recover. If the ESXi host is restarted, HA restarts the failed virtual machines on other surviving ESXi hosts within the HA cluster.

Scenario: VPLEX ISL failure; vSphere cluster management network intact; Cross-Cluster Connect SAN ISL intact
VPLEX behavior: VPLEX transitions distributed virtual volumes on the non-preferred site to the I/O failure state. On the preferred site, the distributed virtual volumes continue to provide access.
Impact/observed VMware HA behavior: No impact on the virtual machines, since the datastore is available to the ESXi hosts through the preferred site.

Scenario: VPLEX ISL failure; vSphere cluster management network intact; the Cross-Cluster Connect ISL has also failed
VPLEX behavior: VPLEX transitions distributed virtual volumes on the non-preferred site to the I/O failure state. On the preferred site, the distributed virtual volumes continue to provide access.
Impact/observed VMware HA behavior: The virtual machines running on the non-preferred site of the distributed virtual volume fail. These virtual machines can be manually restarted on an ESXi host at the other (preferred) site.

Scenario: VPLEX cluster failure. The VPLEX at either site A or site B has failed, but ESXi and other LAN/WAN/SAN components are intact.
VPLEX behavior: I/O continues to be served on all the volumes on the surviving site.
Impact/observed VMware HA behavior: None of the virtual machines are affected. All ESXi hosts maintain a connection to the surviving VPLEX cluster and continue to have access to all datastores.

Scenario: Complete dual-site failure
VPLEX behavior: Upon restoration of the two sites, VPLEX continues to serve I/O. Best practice is to bring up the BE storage arrays first, followed by VPLEX.
Impact/observed VMware HA behavior: The ESXi hosts should be brought up only after VPLEX is fully recovered and the distributed virtual volumes are synchronized. When the ESXi hosts at each site are powered on, the virtual machines are restarted and resume normal operations.


Scenario: Director failure at one site (preferred site for a given distributed virtual volume) and BE array failure at the other site (secondary site for a given distributed virtual volume)
VPLEX behavior: The surviving VPLEX directors within the VPLEX cluster with the failed director continue to provide access to the distributed virtual volumes. VPLEX continues to provide access to the distributed virtual volumes using the preferred site BE array.
Impact/observed VMware HA behavior: None

Scenario: VPLEX ISL intact; vSphere cluster management network failure
VPLEX behavior: None
Impact/observed VMware HA behavior: Virtual machines on each site continue running on their respective hosts since the HA cluster heartbeats are exchanged through the shared datastore.

Scenario: VPLEX ISL failure and vSphere cluster management network failure
VPLEX behavior: VPLEX fails I/O on the non-preferred site for a given distributed virtual volume. Access to the distributed virtual volume continues on the preferred site.
Impact/observed VMware HA behavior: There are two possible scenarios: (1) If the ESXi hosts have not lost their cross-site storage connection, then none of the virtual machines are affected because all ESXi hosts have a path to the preferred site, which remains active following this failure. (2) If, instead, both the VPLEX ISL and the ESXi remote VPLEX links have gone down, then virtual machines running in the non-preferred site see their I/O as failed, and the virtual machines fail. These virtual machines can be registered and restarted on the preferred site.

Scenario: VPLEX storage volume is unavailable (for example, it is accidentally removed from the storage view, or the ESXi initiators are accidentally removed from the storage view)
VPLEX behavior: VPLEX continues to serve I/O on the other site where the volume is available.
Impact/observed VMware HA behavior: None. Each ESXi host maintains access to the VPLEX virtual volume through the alternate VPLEX cluster.


Scenario: VPLEX inter-site WAN link failure and simultaneous VPLEX Witness to site B link failure
VPLEX behavior: VPLEX fails I/O on the distributed virtual volumes at site B and continues to serve I/O at site A.
Impact/observed VMware HA behavior: There are two possible scenarios: (1) If the ESXi hosts have not lost their cross-site storage connection, then none of the virtual machines are affected because all ESXi hosts have a path to the preferred site, which remains active following this failure. (2) If, instead, both the VPLEX ISL and the ESXi to remote VPLEX links go down, then virtual machines running at site B go down. These virtual machines can be manually restarted on an ESXi host at site A. Virtual machines running at site A are not affected.

Scenario: VPLEX inter-site WAN link failure and simultaneous VPLEX Witness to site A link failure
VPLEX behavior: VPLEX fails I/O on the distributed virtual volumes at site A and continues to serve I/O at site B.
Impact/observed VMware HA behavior: There are two possible scenarios: (1) If the ESXi hosts have not lost their cross-site storage connection, then none of the virtual machines are affected because all ESXi hosts have a path to the preferred site, which remains active following this failure. (2) If, instead, both the VPLEX ISL and the ESXi to remote VPLEX links go down, then virtual machines running at site A go down. These virtual machines can be manually restarted on an ESXi host at site B. Virtual machines running at site B are not affected.

Scenario: VPLEX Witness failure
VPLEX behavior: VPLEX continues to serve I/O at both sites.
Impact/observed VMware HA behavior: None

Scenario: VPLEX Management Server failure
VPLEX behavior: None
Impact/observed VMware HA behavior: None

Scenario: vCenter Server failure
VPLEX behavior: None
Impact/observed VMware HA behavior: No impact on the running virtual machines or HA. However, the DRS rules and virtual machine placements are not in effect.


Conclusion

High availability and data mobility are key requirements for efficient, cost-effective IT operations in virtualized data centers. EMC VPLEX Metro with VMware vMotion provides a comprehensive solution that fulfills these requirements and ensures business continuity and workload mobility for Vblock platforms.

VPLEX Metro enables transparent load sharing among multiple sites and the flexibility to relocate workloads between sites in anticipation of planned events such as data center relocations and site maintenance.

In the event that a site fails unexpectedly, failed services can be restarted at the surviving site with minimal effort and time to recovery. In a VPLEX Metro with VPLEX Witness configuration, applications continue to operate in the surviving site with no interruption or downtime.

The combination of Vblock platforms, VPLEX Metro, and VMware vMotion provides new ways to solve IT problems, allowing administrators to:

• Move applications and their data between data centers with no disruption
• Provide continuous operations during and after site disasters
• Balance workloads across Vblock platforms
• Collaborate over distance with shared data
• Aggregate data centers and provide 24x7 availability

Next steps

To learn more about this and other solutions, contact a VCE representative or go to www.vce.com.


Additional references

Refer to the following documents and web resources for additional information on the topics in this white paper.

VCE

• Vblock™ Infrastructure Platforms Technical Overview
• Vblock™ Solution for SAP Mobility
• Workload Mobility with VMware vMotion and EMC VPLEX on Vblock™ Platforms
• Enhanced Business Continuity with Application Mobility

VMware

• VMware vCenter Server Heartbeat Administrator Guide
• VMware vCenter Server web page

EMC

Some EMC technical documentation is available only on the EMC Powerlink website at http://Powerlink.EMC.com. Registration is required.

• EMC VPLEX 5.0 Architecture Guide
• EMC VPLEX with GeoSynchrony™ 5.0 and Point Releases Product Guide (P/N 300-012-307-A03)
• EMC VPLEX CLI Guide (P/N 300-012-311-A02)
• EMC VPLEX Site Preparation Guide (P/N 300-010-495-A05)
• EMC VPLEX Metro Witness Technology and High Availability TechBook
• Oracle Extended RAC With EMC VPLEX Metro Best Practices Planning


ABOUT VCE

VCE, the Virtual Computing Environment Company formed by Cisco and EMC with investments from VMware and Intel, accelerates the adoption of converged infrastructure and cloud-based computing models that dramatically reduce the cost of IT while improving time to market for our customers. VCE, through the Vblock platform, delivers the industry's first completely integrated IT offering with end-to-end vendor accountability. VCE's prepackaged solutions are available through an extensive partner network, and cover horizontal applications, vertical industry offerings, and application development environments, allowing customers to focus on business innovation instead of integrating, validating and managing IT infrastructure. For more information, go to http://www.vce.com.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." VCE MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright © 2012 VCE Company, LLC. All rights reserved. Vblock and the VCE logo are registered trademarks or trademarks of VCE Company, LLC and/or its affiliates in the United States or other countries. All other trademarks used herein are the property of their respective owners.
