EMC® VPLEX™ GeoSynchrony® 5.2

Product Guide P/N 302-000-037-01

EMC Corporation
Corporate Headquarters: Hopkinton, MA 01748-9103
1-508-435-1000
www.EMC.com

Copyright © 2013 EMC Corporation. All rights reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date regulatory document for your product line, go to the Technical Documentation and Advisories section on EMC Online Support®. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. All other trademarks used herein are the property of their respective owners.

Contents

Chapter 1  Introducing VPLEX
    VPLEX overview ...... 14
    VPLEX product family ...... 16
    Mobility ...... 19
    Availability ...... 21
    Collaboration ...... 22
    Architecture highlights ...... 23
    Features and benefits ...... 24
    VPLEX Witness ...... 25
    Non-disruptive upgrade ...... 27
    New features in this release ...... 28

Chapter 2  VS2 Hardware
    The VPLEX cluster ...... 32
    The VPLEX engine and directors ...... 37
    VPLEX power supplies ...... 39
    Hardware failure management and best practices ...... 41
    Component IP addresses ...... 47

Chapter 3  VPLEX Software
    GeoSynchrony ...... 50
    Management interfaces ...... 52
    Provisioning with VPLEX ...... 55
    Consistency groups ...... 61
    Cache vaulting ...... 65

Chapter 4  Integrity and Resiliency
    About VPLEX resilience and integrity ...... 68
    Cluster ...... 69
    Path redundancy ...... 70
    High Availability with VPLEX Witness ...... 74
    ALUA ...... 81
    Additional resilience features ...... 82
    Performance monitoring and security features ...... 85
    Security features ...... 88


Chapter 5  VPLEX Use Cases
    Technology refresh ...... 90
    Mobility ...... 92
    Collaboration ...... 95
    VPLEX Metro HA ...... 97
    Redundancy with RecoverPoint ...... 105

Appendix A  VS1 Hardware
    VS1 cluster configurations ...... 118
    VS1 engine ...... 121
    VS1 IP addresses and component IDs ...... 122
    VS1 Internal cabling ...... 125

Figures

Title Page

1  VPLEX active-active ...... 14
2  VPLEX family: Local, Metro, and Geo ...... 16
3  Move data without disrupting service ...... 19
4  Data mobility with VPLEX Local, Metro, and Geo ...... 19
5  High availability infrastructure example ...... 21
6  Distributed data collaboration ...... 22
7  Architecture highlights ...... 23
8  How VPLEX Witness is deployed ...... 25
9  Management server - rear view ...... 32
10  Fibre Channel switch - rear view ...... 33
11  VS2: Single-engine cluster ...... 34
12  VS2: Dual-engine cluster ...... 35
13  VS2: Quad-engine cluster ...... 36
14  Engine, rear view ...... 37
15  VPLEX cluster independent power zones ...... 39
16  Local mirrored volumes ...... 41
17  Component IP addresses in cluster-1 ...... 47
18  Component IP addresses in cluster-2 ...... 48
19  Claim storage using the GUI ...... 52
20  VPLEX GUI Provisioning - main page ...... 55
21  Extents ...... 56
22  Devices ...... 57
23  Distributed devices ...... 58
24  Virtual volumes ...... 59
25  Local consistency groups with local visibility ...... 62
26  Local consistency group with global visibility ...... 62
27  Path redundancy: different ports ...... 70
28  Path redundancy: different directors ...... 71
29  Recommended fabric assignments for front-end and back-end ports ...... 72
30  Path redundancy: different engines ...... 72
31  Path redundancy: different sites ...... 73
32  High level VPLEX Witness architecture ...... 75
33  Failure scenarios in VPLEX Metro configurations without VPLEX Witness ...... 76
34  VPLEX Witness and VPLEX cluster connection failures ...... 78
35  Performance monitoring - dashboard ...... 85
36  Performance monitoring - select information to view ...... 86
37  Performance monitoring - sample chart ...... 86
38  Traditional view of storage arrays ...... 90
39  VPLEX technology refresh ...... 91
40  Moving data with VPLEX ...... 92

41  Mobility Central GUI ...... 93
42  Collaborate over distance with AccessAnywhere ...... 95
43  VPLEX Metro HA ...... 97
44  VPLEX Metro HA (no cross-connect) cluster failure ...... 98
45  VPLEX Metro HA (no cross-connect) inter-cluster link failure ...... 99
46  VPLEX Metro HA with cross-connect ...... 100
47  VPLEX Metro HA with cross-connect - host failure ...... 101
48  VPLEX Metro HA with cross-connect - cluster failure ...... 101
49  VPLEX Metro HA with cross-connect - storage array failure ...... 102
50  VPLEX Metro HA with cross-connect - VPLEX Witness failure ...... 102
51  VPLEX Metro HA with cross-connect - inter-cluster link failure ...... 103
52  RecoverPoint architecture ...... 105
53  RecoverPoint configurations ...... 107
54  VPLEX Local and RecoverPoint CDP ...... 109
55  VPLEX Local and RecoverPoint CLR - remote site is independent VPLEX cluster ...... 110
56  VPLEX Local and RecoverPoint CLR - remote site is array-based splitter ...... 110
57  VPLEX Metro and RecoverPoint CDP ...... 110
58  VPLEX Metro and RecoverPoint CLR - remote site is independent VPLEX cluster ...... 111
59  VPLEX Metro and RecoverPoint CLR/CRR - remote site is array-based splitter ...... 111
60  Shared VPLEX splitter ...... 112
61  Shared RecoverPoint RPA cluster ...... 112
62  Replication with VPLEX Local and ...... 113
63  Replication with VPLEX Metro and CLARiiON ...... 114
64  Support for Site Recovery Manager ...... 115
65  VS1 single-engine cluster ...... 118
66  VS1 dual-engine cluster ...... 119
67  VS1 quad-engine cluster ...... 120
68  VS1 engine ...... 121
69  IP addresses in cluster-1 ...... 123
70  IP addresses in cluster-2 (VPLEX Metro or Geo) ...... 124
71  Ethernet cabling - VS1 quad-engine cluster ...... 126
72  Serial cabling - VS1 quad-engine cluster ...... 127
73  Serial cabling - VS1 quad-engine cluster ...... 128
74  AC power cabling - VS1 quad-engine cluster ...... 129
75  Ethernet cabling - VS1 dual-engine cluster ...... 130
76  Serial cabling - VS1 dual-engine cluster ...... 131
77  Fibre Channel cabling - VS1 dual-engine cluster ...... 132
78  AC power cabling - VS1 dual-engine cluster ...... 133
79  Ethernet cabling - VS1 single-engine cluster ...... 134
80  Serial cabling - VS1 single-engine cluster ...... 134
81  Fibre Channel cabling - VS1 single-engine cluster ...... 134
82  AC power cabling - VS1 single-engine cluster ...... 135
83  Fibre Channel WAN COM connections - VS1 ...... 135
84  IP WAN COM connections - VS1 ...... 136

Tables

Title Page

1  VPLEX features and benefits ...... 24
2  Hardware components ...... 33
3  GeoSynchrony AccessAnywhere features ...... 50
4  Web-based provisioning methods ...... 55
5  Types of data mobility operations ...... 93
6  How VPLEX Metro HA recovers from failure ...... 103

Preface

As part of an effort to improve and enhance the performance and capabilities of its product line, EMC® from time to time releases revisions of its hardware and software. Therefore, some functions described in this document may not be supported by all revisions of the software or hardware currently in use. Your product release notes provide the most up-to-date information on product features. If a product does not function properly or does not function as described in this document, please contact your EMC representative.

About this guide
This document provides a high-level description of the VPLEX™ product and GeoSynchrony™ 5.2 features.

Audience
This document is part of the VPLEX system documentation set and introduces the VPLEX product and its features. This document provides information for customers and prospective customers to understand VPLEX and how it supports their data storage strategies.

Related documentation
Related documents (available on EMC Online Support) include:
◆ EMC VPLEX Release Notes for GeoSynchrony Release 5.2
◆ EMC VPLEX Product Guide
◆ EMC VPLEX Site Preparation Guide
◆ EMC VPLEX Hardware Installation Guide
◆ EMC VPLEX Configuration Worksheet
◆ EMC VPLEX Configuration Guide
◆ EMC VPLEX Security Configuration Guide
◆ EMC VPLEX CLI Guide
◆ EMC VPLEX Administration Guide
◆ VPLEX Management Console Help
◆ EMC VPLEX Element Manager API Guide
◆ EMC VPLEX Open-Source Licenses
◆ EMC Regulatory Statement for EMC VPLEX
◆ Procedures provided through the Generator
◆ EMC Host Connectivity Guides


Conventions used in this document
EMC uses the following conventions for special notices.

Note: A note presents information that is important, but not hazard-related.

CAUTION
A caution contains information essential to avoid data loss or damage to the system or equipment.

IMPORTANT
An important notice contains information essential to operation of the software.

Typographical conventions
EMC uses the following type style conventions in this document:

Normal: Used in running (nonprocedural) text for:
• Names of interface elements (such as names of windows, dialog boxes, buttons, fields, and menus)
• Names of resources, attributes, pools, Boolean expressions, buttons, DQL statements, keywords, clauses, environment variables, functions, utilities
• URLs, pathnames, filenames, directory names, computer names, links, groups, service keys, file systems, notifications

Bold: Used in running (nonprocedural) text for:
• Names of commands, daemons, options, programs, processes, services, applications, utilities, kernels, notifications, system calls, man pages
Used in procedures for:
• Names of interface elements (such as names of windows, dialog boxes, buttons, fields, and menus)
• What the user specifically selects, clicks, presses, or types

Italic: Used in all text (including procedures) for:
• Full titles of publications referenced in text
• Emphasis (for example, a new term)
• Variables

Courier: Used for:
• System output, such as an error message or script
• URLs, complete paths, filenames, prompts, and syntax when shown outside of running text

Courier bold: Used for:
• Specific user input (such as commands)

Courier italic: Used in procedures for:
• Variables on the command line
• User input variables

[ ]  Square brackets enclose optional values
|    Vertical bar indicates alternate selections - the bar means "or"
{ }  Braces indicate content that you must specify (that is, x or y or z)
...  Ellipses indicate nonessential information omitted from the example

Where to get help
EMC support and product information can be obtained as follows.

Product information — For documentation, release notes, software updates, or for information about EMC products, licensing, and service, go to the EMC Support website (registration required) at:


http://support.EMC.com

Technical support — For technical support, go to the EMC Support site. To open a service request, you must have a valid support agreement. Please contact your EMC sales representative for details about obtaining a valid support agreement or to answer any questions about your account.

Your comments
Your suggestions will help us continue to improve the accuracy, organization, and overall quality of the user publications. Please send your opinion of this document to: [email protected]


Chapter 1  Introducing VPLEX

This chapter introduces the EMC VPLEX product family. Topics include:
◆ VPLEX overview ...... 14
◆ VPLEX product family ...... 16
◆ Mobility ...... 19
◆ Availability ...... 21
◆ Collaboration ...... 22
◆ Architecture highlights ...... 23
◆ Features and benefits ...... 24
◆ VPLEX Witness ...... 25
◆ Non-disruptive upgrade ...... 27
◆ New features in this release ...... 28


VPLEX overview
EMC VPLEX federates data located on heterogeneous storage arrays to create dynamic, distributed, highly available data centers. Use VPLEX to:
◆ Move data nondisruptively between EMC and non-EMC storage arrays without any downtime for the host. VPLEX moves data transparently and the virtual volumes retain the same identities and the same access points to the host. The host does not need to be reconfigured.
◆ Collaborate over distance. AccessAnywhere provides cache-consistent active-active access to data across VPLEX clusters. Multiple users at different sites can work on the same data while maintaining consistency of the dataset.
◆ Protect data in the event of disasters or failure of components in your data centers. With VPLEX, you can withstand failures of storage arrays, cluster components, an entire site, or loss of communication between sites (when two clusters are deployed) and still keep applications and data online and available.
With VPLEX, you can transform the delivery of IT to a flexible, efficient, reliable, and resilient service.

Figure 1 VPLEX active-active

VPLEX addresses three primary IT needs:
◆ Mobility: VPLEX moves applications and data between different storage installations:
  • Within the same data center or across a campus (VPLEX Local)
  • Within a geographical region (VPLEX Metro)
  • Across even greater distances (VPLEX Geo)
◆ Availability: VPLEX creates high-availability storage infrastructure across these same varied geographies with unmatched resiliency.


◆ Collaboration: VPLEX provides efficient real-time data collaboration over distance for Big Data applications.

VPLEX offers the following unique innovations and advantages:
◆ VPLEX's distributed, federated virtual storage enables new models of application and data mobility. VPLEX is optimized for virtual server platforms (VMware ESX, Hyper-V, Oracle Virtual Machine, AIX VIOS) and can streamline or accelerate transparent workload relocation over distance, including the movement of virtual machines.
◆ Size VPLEX to meet your current needs, and grow it as your needs grow. A VPLEX cluster includes one, two, or four engines. Add an engine, or a second cluster, to an operating VPLEX system without interrupting service. VPLEX's scalable architecture ensures maximum availability, fault tolerance, and performance.
◆ All virtual volumes presented by VPLEX are always accessible from every engine in a VPLEX cluster, and all physical storage connected to VPLEX is accessible from every engine in the cluster.
◆ In Metro and Geo configurations, VPLEX AccessAnywhere provides cache-consistent active-active access to data across two VPLEX clusters. VPLEX pools the storage resources in multiple data centers so that data can be accessed anywhere.

With VPLEX, you can:
◆ Provide continuous availability and workload mobility.
◆ Replace tedious data movement and technology refresh processes with VPLEX's simple, frictionless two-way data exchange between locations.
◆ Create an active-active configuration for the active use of resources at both sites.
◆ Provide instant access to data between data centers.
◆ Combine VPLEX with virtual servers to enable private and hybrid cloud computing.


VPLEX product family
The VPLEX product family includes:
◆ VPLEX Local
◆ VPLEX Metro
◆ VPLEX Geo


Figure 2 VPLEX family: Local, Metro, and Geo

VPLEX Local
VPLEX Local consists of a single cluster. VPLEX Local:
◆ Federates EMC and non-EMC storage arrays. Federation allows transparent data mobility between arrays for simple, fast data movement and technology refreshes.
◆ Standardizes LUN presentation and management, using simple tools to provision and allocate virtualized storage devices.
◆ Improves storage utilization using pooling and capacity aggregation across multiple arrays.
◆ Increases protection and high availability for critical applications. Mirror storage across mixed platforms without consuming host resources, and leverage your existing storage resources to deliver increased protection and availability for critical applications.

Deploy VPLEX Local within a single data center.

VPLEX Metro
VPLEX Metro consists of two VPLEX clusters connected by inter-cluster links with no more than 5 ms round-trip time (RTT).¹ VPLEX Metro:
◆ Transparently relocates data and applications over distance, protects your data center against disaster, and enables efficient collaboration between sites. Manage all of your storage in both data centers from one management interface.

1. Refer to VPLEX and vendor-specific White Papers for confirmation of latency limitations.


◆ Mirrors your data to a second site, with full access at near-local speeds.

Deploy VPLEX Metro within a data center for:
◆ Additional virtual storage capabilities beyond those of a VPLEX Local.
◆ Higher availability. Metro clusters can be placed up to 100 km apart, allowing them to be located at opposite ends of an equipment room, on different floors, or in different fire suppression zones; any of these might be the difference between riding through a local fault or fire without an outage.

Deploy VPLEX Metro between data centers for:
◆ Mobility: redistribute application workloads between the two data centers.
◆ Availability: keep applications running in the presence of data center failures.
◆ Collaboration: let applications in one data center access data in the other data center.
◆ Distribution: relieve a data center that lacks space, power, or cooling.

Combine VPLEX Metro virtual storage and virtual servers to:
◆ Transparently move virtual machines and storage across synchronous distances.
◆ Improve utilization and availability across heterogeneous arrays and multiple sites.

Distance between clusters is limited by physical distance and by host and application requirements.
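As a rough sanity check on these distance limits, the following sketch (illustrative only, not an EMC tool) estimates round-trip propagation delay from fiber-path length, assuming light travels in fiber at roughly 200,000 km/s (about 5 µs per km one way). Real RTT also includes switch, transport, and protocol overhead, so treat the result as a lower bound against the 5 ms Metro and 50 ms Geo budgets.

    # Rough estimate of inter-cluster round-trip propagation delay (Python).
    # Assumption: ~200,000 km/s signal speed in fiber; real links add switching
    # and transport overhead on top of this lower bound.

    FIBER_KM_PER_SEC = 200_000.0

    def propagation_rtt_ms(fiber_km: float) -> float:
        """Return round-trip propagation delay in milliseconds for a fiber path."""
        one_way_sec = fiber_km / FIBER_KM_PER_SEC
        return 2 * one_way_sec * 1000.0

    for km, budget_ms in [(100, 5.0), (1000, 50.0)]:   # Metro-like and Geo-like paths
        print(f"{km:>5} km fiber: ~{propagation_rtt_ms(km):.1f} ms propagation RTT "
              f"(budget {budget_ms} ms)")

At 100 km, propagation alone accounts for roughly 1 ms of the 5 ms Metro budget; the remainder is available for switching, transport, and protocol overhead.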

VPLEX Geo
VPLEX Geo consists of two VPLEX clusters connected by inter-cluster links with no more than 50 ms RTT. VPLEX Geo provides the same benefits as VPLEX Metro to data centers at asynchronous distances:
◆ Transparent application relocation
◆ Increased resiliency
◆ Efficient collaboration
◆ Simplified management

As with any asynchronous transport medium, bandwidth is critical to ensure optimal performance.

Note: VPLEX Geo availability and performance are different than those of VPLEX Metro. See “Data caching” on page 59 and “Asynchronous consistency groups” on page 63.

Grow your VPLEX without disruption
Deploy VPLEX to meet your current high-availability and data mobility requirements. Add engines or a second cluster to scale VPLEX as your requirements increase. You can do all of the following tasks without disrupting service:
◆ Add engines to a VPLEX cluster.
◆ Upgrade engine hardware from VS1 to VS2.
◆ Convert a VPLEX Local to a VPLEX Metro or VPLEX Geo.


◆ Upgrade GeoSynchrony.
◆ Add or remove integration with RecoverPoint.


Mobility
VPLEX mobility allows you to move data located on either EMC or non-EMC storage arrays simply and without disruption. Use VPLEX to simplify the management of your data center and to eliminate the outages normally required to migrate data or refresh technology. Combine VPLEX with server virtualization to transparently move and relocate virtual machines and their corresponding applications and data without downtime.

Figure 3 Move data without disrupting service

Relocate, share, and balance resources between sites, within a campus, or between data centers.
◆ In VPLEX Metro configurations, operations are synchronous and clusters can be separated by up to 5 ms round-trip time (RTT) latency.
◆ In VPLEX Geo configurations, operations are asynchronous and clusters can be separated by up to 50 ms RTT latency.


Figure 4 Data mobility with VPLEX Local, Metro, and Geo

Use the storage and compute resources at either VPLEX cluster location to automatically balance loads.


Move data between sites, over distance, while the data remains online and available during the move. No outage or downtime is required. VPLEX federates both EMC and non-EMC arrays, so even if you have a mixed storage environment, VPLEX provides an easy solution.

Extent migrations move data between extents in the same cluster. Use extent migrations to:
◆ Move extents from a “hot” storage volume shared by other busy extents.
◆ Defragment a storage volume to create more contiguous free space.
◆ Migrate data between dissimilar arrays.

Device migrations move data between devices (RAID 0, RAID 1, or RAID C devices built on extents or on other devices) on the same cluster, or between devices on different clusters. Use device migrations to:
◆ Migrate data between dissimilar arrays.
◆ Relocate a hot volume to a faster array.
◆ Relocate devices to new arrays in a different cluster.

Use VPLEX to run an immediate migration between extents or devices, or create reusable migration plan files to automate routine tasks. Up to 25 local and 25 distributed migrations can be in progress at the same time; any migrations beyond those limits are queued until an ongoing migration completes, as pictured in the sketch below.
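The concurrency limit described above can be pictured with a minimal sketch (illustrative only, not VPLEX code): migrations submitted beyond the 25-local or 25-distributed limits simply wait in a queue and start as running migrations complete.

    # Illustrative model of VPLEX migration queuing (not actual product code):
    # at most 25 local and 25 distributed migrations run concurrently;
    # additional requests queue until a slot frees up.
    from collections import deque

    LIMITS = {"local": 25, "distributed": 25}

    class MigrationQueue:
        def __init__(self) -> None:
            self.running = {kind: set() for kind in LIMITS}
            self.waiting = {kind: deque() for kind in LIMITS}

        def submit(self, name: str, kind: str) -> str:
            """Start the migration if a slot is free, otherwise queue it."""
            if len(self.running[kind]) < LIMITS[kind]:
                self.running[kind].add(name)
                return "in-progress"
            self.waiting[kind].append(name)
            return "queued"

        def complete(self, name: str, kind: str) -> None:
            """Finish a migration and promote the next queued one, if any."""
            self.running[kind].discard(name)
            if self.waiting[kind]:
                self.running[kind].add(self.waiting[kind].popleft())

    q = MigrationQueue()
    states = [q.submit(f"migrate_{i}", "local") for i in range(30)]
    print(states.count("in-progress"), "running;", states.count("queued"), "queued")
    # -> 25 running; 5 queued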


Availability
VPLEX redundancy reduces both the recovery time objective (RTO) and the recovery point objective (RPO), and VPLEX features allow the highest possible resiliency in the event of an outage. Figure 5 shows a VPLEX Metro configuration where storage has become unavailable at one of the cluster sites.


Figure 5 High availability infrastructure example

Because VPLEX’s GeoSynchrony AccessAnywhere mirrors all data, applications continue without disruption using the back-end storage at the unaffected site.

Chapter 2, “VS2 Hardware” describes VPLEX’s hardware redundancy. Chapter 4, “Integrity and Resiliency” describes how VPLEX continues uninterrupted service during failures such as:
◆ Unplanned and planned storage outages
◆ SAN outages
◆ VPLEX component failures
◆ VPLEX cluster failures
◆ Data center outages


Collaboration
Collaboration increases utilization of passive data recovery assets and provides simultaneous access to data.


Figure 6 Distributed data collaboration

VPLEX AccessAnywhere enables multiple users at different sites to work on the same data while maintaining consistency of the dataset.

Traditional solutions support collaboration across distance by shuttling entire files between locations, typically using FTP. This approach is slow and contributes to network congestion for large files (or even for small files that move regularly). One site may sit idle waiting to receive the latest data, and independent work results in inconsistent copies that must be synchronized, a task that becomes more difficult and time-consuming as datasets grow. AccessAnywhere supports collaborative workflows that require co-development, such as engineering, graphic arts, video, educational programs, design, and research. VPLEX provides a scalable solution for collaboration.


Architecture highlights
A VPLEX cluster consists of:
◆ 1, 2, or 4 VPLEX engines. Each engine contains two directors. Dual-engine or quad-engine clusters also contain:
  • 1 pair of Fibre Channel switches for communication between directors.
  • 2 uninterruptible power supplies (UPS) for battery power backup of the Fibre Channel switches and the management server.
◆ A management server. The management server has a public Ethernet port, which provides cluster management services when connected to the customer network.


Figure 7 Architecture highlights

VPLEX conforms to established World Wide Name (WWN) guidelines that can be used for zoning. VPLEX supports EMC storage and arrays from other storage vendors, such as HDS, HP, and IBM. VPLEX provides storage federation for operating systems and applications that support clustered file systems, including both physical and virtual server environments with VMware ESX and Microsoft Hyper-V. VPLEX supports network fabrics from Brocade and Cisco.

Refer to the EMC Simple Support Matrix, EMC VPLEX and GeoSynchrony, available at http://elabnavigator.EMC.com under the Simple Support Matrix tab.


Features and benefits
Table 1 summarizes VPLEX features and benefits.

Table 1 VPLEX features and benefits

Mobility
• Migration: Move data and applications without impact on users.
• Virtual Storage federation: Achieve transparent mobility and access in a data center and between data centers.
• Scale-out cluster architecture: Start small and grow larger with predictable service levels.

Availability
• Resiliency: Mirror across arrays within a single data center or between data centers without host impact. This increases availability for critical applications.
• Distributed Cache Coherency: Automate sharing, balancing, and failover of I/O across the cluster and between clusters whenever possible.
• Advanced data caching: Improve I/O performance and reduce storage array contention.

Collaboration
• Distributed Cache Coherency: Automate sharing, balancing, and failover of I/O across the cluster and between clusters whenever possible.

For all VPLEX products, GeoSynchrony:
◆ Presents storage volumes from back-end arrays to VPLEX engines.
◆ Federates the storage volumes into hierarchies of VPLEX virtual volumes with user-defined configuration and protection levels.
◆ Presents virtual volumes to production hosts in the SAN via the VPLEX front-end.
◆ For VPLEX Metro and VPLEX Geo products, presents a global, block-level directory for distributed cache and I/O between VPLEX clusters.


VPLEX Witness
Starting in GeoSynchrony 5.0, VPLEX Witness helps multi-cluster VPLEX configurations automate the response to cluster failures and inter-cluster link outages. VPLEX Witness is an optional component installed as a virtual machine on a customer host. The customer host must be deployed in a separate failure domain from either VPLEX cluster to eliminate the possibility of a single fault affecting both a cluster and VPLEX Witness. VPLEX Witness connects to both VPLEX clusters over the management IP network, as illustrated in Figure 8.

Figure 8 How VPLEX Witness is deployed

VPLEX Witness observes the state of the clusters, and thus can distinguish between an outage of the inter-cluster link and a cluster failure. VPLEX Witness uses this information to guide the clusters to either resume or suspend I/O.
◆ In VPLEX Metro configurations, VPLEX Witness provides seamless zero-RTO failover for synchronous consistency groups.
◆ In VPLEX Geo configurations, VPLEX Witness can be useful for diagnostic purposes.

Note: VPLEX Witness works in conjunction with consistency groups. VPLEX Witness guidance does not apply to local volumes and distributed volumes that are not members of a consistency group. VPLEX Witness does not automate any fail-over decisions for asynchronous consistency groups (VPLEX Geo configurations).
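The guidance described above can be summarized in a small sketch (illustrative logic only, not product code) for a two-cluster VPLEX Metro with a synchronous consistency group whose preference rule favors cluster-1. Because the Witness has an independent management-network view of each cluster, it can tell a failed peer cluster apart from a broken inter-cluster link and guide the surviving or preferred cluster to keep servicing I/O.

    # Illustrative summary of VPLEX Witness guidance for a two-cluster Metro
    # (not product code). Assumes a synchronous consistency group whose
    # preference rule favors cluster-1 in the link-partition case.

    def witness_guidance(clusters_see_each_other: bool,
                         witness_sees_c1: bool,
                         witness_sees_c2: bool) -> dict:
        if clusters_see_each_other:
            # Healthy inter-cluster link: both clusters continue I/O.
            return {"cluster-1": "resume", "cluster-2": "resume"}
        if witness_sees_c1 and witness_sees_c2:
            # Both clusters alive but partitioned from each other: the preferred
            # cluster continues and the other suspends to avoid split-brain.
            return {"cluster-1": "resume", "cluster-2": "suspend"}
        if witness_sees_c1:
            # cluster-2 appears failed: cluster-1 continues servicing I/O.
            return {"cluster-1": "resume", "cluster-2": "failed"}
        if witness_sees_c2:
            return {"cluster-1": "failed", "cluster-2": "resume"}
        return {"cluster-1": "unknown", "cluster-2": "unknown"}

    print(witness_guidance(False, True, False))  # cluster-2 failure
    print(witness_guidance(False, True, True))   # inter-cluster link partition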

HA vs. DR
Highly available (HA) designs are typically deployed within a data center, and disaster recovery (DR) functionality is typically deployed between data centers. Traditionally:
◆ Data center components operate in active/active mode (or active/passive with automatic failover).


◆ Legacy replication technologies use active/passive techniques and require manual failover to use the passive component.

When VPLEX Metro active/active replication technology is used in conjunction with VPLEX Witness, the line between local high availability and long-distance disaster recovery is somewhat blurred. With VPLEX Metro and VPLEX Witness, high availability is stretched beyond the data center walls.

Note: VPLEX Witness has no effect on failure handling for distributed volumes outside of consistency groups or volumes in asynchronous consistency groups. Witness also has no effect on distributed volumes in synchronous consistency groups when the preference rule is set to no-automatic-winner.

See “High Availability with VPLEX Witness” on page 74 for more information on VPLEX Witness including the differences in how VPLEX Witness handles failures and recovery.


Non-disruptive upgrade
VPLEX management server software and GeoSynchrony can be upgraded without disruption. VPLEX hardware can be replaced, the engine count in a cluster increased, and a VPLEX Local expanded to a VPLEX Metro or Geo, all without disruption. VPLEX never has to be completely shut down.

Storage, application, and host upgrades
VPLEX enables the easy addition or removal of storage, applications, and hosts. When VPLEX encapsulates back-end storage, the block-level nature of the coherent cache allows the upgrade of storage, applications, and hosts. You can configure VPLEX so that all devices within VPLEX have uniform access to all storage blocks.

Increase engine count
When capacity demands increase, VPLEX supports hardware upgrades from single-engine to dual-engine and from dual-engine to quad-engine systems. These upgrades also increase the availability of front-end and back-end ports in the data center.

Software upgrades
VPLEX is fully redundant for:
◆ Ports
◆ Paths
◆ Directors
◆ Engines
This redundancy allows GeoSynchrony on VPLEX Local and Metro to be upgraded without interrupting host access to storage. No service window or application disruption is required. On VPLEX Geo configurations, the upgrade script ensures that the application is active/passive before allowing the upgrade.

Note: You must upgrade the VPLEX management server software before upgrading GeoSynchrony. Management server upgrades are non-disruptive.

Simple support matrix
EMC publishes storage array interoperability information in a Simple Support Matrix available on EMC Online Support. This information details tested, compatible combinations of storage hardware and applications that VPLEX supports. The Simple Support Matrix can be located at: https://support.emc.com/search


New features in this release
Release 5.2 includes the following new features:

◆ New performance dashboard and CLI-based performance capabilities
A new customizable performance monitoring dashboard provides a view into the performance of your VPLEX system. You decide which aspects of the system's performance to view and compare. Alternatively, you can use the CLI to create a toolbox of custom monitors to operate under varying conditions, including debugging, capacity planning, and workload characterization. The following new dashboards are provided by default:
  • System Resources
  • End To End Dashboard
  • Front End Dashboard
  • Back End Dashboard
  • Rebuild Dashboard
  • WAN Dashboard
A number of new charts are also available in the GUI.

◆ Improved diagnostics
Enhancements include the following:
  • Collect diagnostics improvements
    – Prevent more than one user from running log collection at any one time, thus optimizing resources and maintaining the validity of the collected logs.
    – Accelerate performance by combining the multiple scripts that gather tower debug data into a single script. These improvements decrease log collection time and log size by not collecting redundant information.
  • Health check improvements
    – Include consistency group information in the overall health check.
    – Include WAN link information in the overall health check.

◆ Storage array based volume expansion
Storage array based volume expansion enables storage administrators to expand the size of any virtual volume by expanding the underlying storage volume. The supported device geometries include virtual volumes mapped 1:1 to storage volumes, and virtual volumes on multi-legged RAID-1, distributed RAID-1, RAID-0, and RAID-C devices under certain conditions. The expansion operation is supported through expanding the corresponding logical unit numbers (LUNs) on the back-end (BE) array.

Note: Virtual volume expansion is not supported on RecoverPoint enabled volumes.

◆ VAAI
The VMware API for Array Integration (VAAI) now supports WriteSame(16) calls. The WriteSame(16) SCSI command provides a mechanism to offload the initialization of virtual disks to VPLEX. WriteSame(16) requests that the server write blocks of data transferred by the application client multiple times to consecutive logical blocks.

◆ Cluster repair and recover


In the event of a disaster that destroys the VPLEX cluster but leaves the storage (including metadata) and the rest of the infrastructure intact, the cluster recover procedure restores the full configuration after the VPLEX cluster hardware is replaced.

◆ FRU procedure
A field replaceable unit (FRU) procedure automates engine chassis replacement in VS2 configurations.

◆ Emerson 350VA UPS support
Either APC or Emerson uninterruptible power supplies (UPS) can be used in a VPLEX cluster. The CLI is updated to display which UPS is installed, and SYR includes the same data for Emerson units as it currently does for APC units.

◆ SYR reporting
SYR reporting is enhanced to collect local COM switch information.

◆ Element Manager API
The VPLEX Element Manager API has been enhanced to support additional external management interfaces. Supported interfaces include:
  • ProSphere for discovery and capacity reporting/chargeback
  • UIM for provisioning and reporting on VPLEX in a Vblock
  • Foundation Management for discovery of VPLEX in a Vblock
  • Archway for application-consistent PiT copies with the RecoverPoint splitter

◆ Event message severities
All VPLEX events with a severity of “ERROR” in previous releases of GeoSynchrony have been re-evaluated to ensure the accuracy of their severity with respect to Service Level Agreement requirements.

◆ Back-end (BE) Logical Unit Number (LUN) swapping
The system detects and corrects BE LUN swaps automatically:
  • Once a LUN remap is detected, VPLEX corrects its mapping to prevent data corruption.
  • On detection of a LUN remap, a call-home event is sent.

◆ Invalidate cache procedure
This procedure invalidates the cache associated with a virtual volume, or a set of virtual volumes within a consistency group, that has experienced data corruption and needs data to be restored from backup.

◆ Customer settable password policy
You can set the password policy for all VPLEX administrators, for example, specifying the minimum password length and the password expiration date.

◆ VPLEX presentation of fractured state for DR1 replica volumes
DR1 RecoverPoint replica volumes are not DR1s while in use, and their status is reflected in the CLI and GUI as disconnected.

◆ Performance improvement for larger I/O block sizes
System performance of write operations is improved for block sizes greater than 128 KB.

◆ VPLEX presentation of Fake Size


Fake Size is the ability to use replica volumes that are larger than the production volume. The limitation that RecoverPoint replicas require a source LUN and target LUN of identical size has been removed. Using the VPLEX RecoverPoint splitter, you can now replicate to a target LUN that is larger than the source LUN. To use the Fake Size feature, you must be running RecoverPoint 4.0 or higher.

Note: If a RecoverPoint failover operation is used to swap the production/replica roles, it is possible for the new production volume to have a fake size instead.

◆ RecoverPoint splitter support for 8K LUNs
To put 8K volumes into use, RecoverPoint 4.0 or higher builds must be configured.

Chapter 2  VS2 Hardware

This chapter provides a high-level overview of the major hardware components in a VS2 VPLEX and how hardware failures are managed to support uninterrupted service. Topics include:
◆ The VPLEX cluster ...... 32
◆ The VPLEX engine and directors ...... 37
◆ VPLEX power supplies ...... 39
◆ Power and environmental monitoring ...... 40
◆ Hardware failure management and best practices ...... 41
◆ Component IP addresses ...... 47

Note: See Appendix A, “VS1 Hardware,” for information about VS1 hardware.


The VPLEX cluster
There are two generations of VS hardware: VS1 and VS2. A VPLEX cluster (either VS1 or VS2) consists of:
◆ 1, 2, or 4 VPLEX engines. Each engine contains two directors. Dual-engine or quad-engine clusters also contain:
  • 1 pair of Fibre Channel switches for communication between directors.
  • 2 uninterruptible power supplies (UPS) for battery power backup of the Fibre Channel switches and the management server.
◆ A management server.
◆ Ethernet or Fibre Channel cabling and respective switching hardware to connect the distributed VPLEX hardware components.
◆ I/O modules that provide front-end and back-end connectivity between SANs and to remote VPLEX clusters in VPLEX Metro or VPLEX Geo configurations.

Note: In the current release of GeoSynchrony, VS1 and VS2 hardware cannot co-exist in a cluster, except in a VPLEX Local cluster during a non disruptive hardware upgrade from VS1 to VS2.

Management server
Each VPLEX cluster has one management server.

Figure 9 Management server - rear view

You can manage both clusters in VPLEX Metro and VPLEX Geo configurations from a single management server. The management server:
◆ Coordinates data collection, VPLEX software upgrades, configuration interfaces, diagnostics, event notifications, and some director-to-director communication.
◆ Forwards VPLEX Witness traffic between directors in the local cluster and the remote VPLEX Witness server.
Redundant internal network IP interfaces connect the management server to the public network. Internally, the management server is on a dedicated management IP network that provides accessibility for all major components in the cluster.


Fibre Channel switches
Fibre Channel switches provide high availability and redundant connectivity between directors and engines in a dual-engine or quad-engine cluster.

Figure 10 Fibre Channel switch - rear view

Each Fibre Channel switch is powered by a UPS, and has redundant I/O ports for intra-cluster communication. The Fibre Channel switches do not connect to the front-end hosts or back-end storage.

1, 2, or 4 VPLEX engines
A VPLEX cluster can have 1 (single), 2 (dual), or 4 (quad) engines.
◆ Figure 11 on page 34 shows a single-engine cluster.
◆ Figure 12 on page 35 shows a dual-engine cluster.
◆ Figure 13 on page 36 shows a quad-engine cluster.

Note: The placement of components shown for single-engine and dual-engine clusters allows for non disruptive addition of engines to scale the cluster to a larger configuration.

Table 2 describes the major components of a VPLEX cluster and their functions.

Table 2 Hardware components

Engine: Contains two directors, with each providing front-end and back-end I/O connections.

Director: Contains:
• Five I/O modules (IOMs), as identified in Figure 14 on page 37
• Management module, for intra-cluster communication
• Two redundant 400 W power supplies with built-in fans
• CPU
• Solid-state disk (SSD) that contains the GeoSynchrony operating environment
• RAM

Management server: Provides:
• Management interface to a public IP network
• Management interfaces to other VPLEX components in the cluster
• Event logging service

Fibre Channel COM switches (dual-engine or quad-engine clusters only): Provide intra-cluster communication support among the directors. This traffic is separate from the storage I/O.

Power subsystem: Power distribution panels (PDPs) connect to the site's AC power source and transfer power to the VPLEX components through power distribution units (PDUs). This provides a centralized power interface and distribution control for the power input lines. The PDPs contain manual on/off power switches for their power receptacles.

Standby Power Supply (SPS): One SPS assembly (two SPS modules) provides backup power to each engine in the event of an AC power interruption. Each SPS module maintains power for two five-minute periods of AC loss while the engine shuts down.

Uninterruptible Power Supply (UPS) (dual-engine or quad-engine clusters only): One UPS provides battery backup for Fibre Channel switch A and the management server, and a second UPS provides battery backup for Fibre Channel switch B. Each UPS module maintains power for two five-minute periods of AC loss.

VS2 single-engine cluster


Figure 11 VS2: Single-engine cluster


VS2 dual-engine cluster


Figure 12 VS2: Dual-engine cluster


VS2 quad-engine cluster


Figure 13 VS2: Quad-engine cluster


The VPLEX engine and directors
A VPLEX cluster can include 1, 2, or 4 engines. Each VPLEX engine includes two directors, and each director provides front-end and back-end I/O connections. Figure 14 shows a VPLEX engine and its two directors, Director A and Director B. Each director contains a management module and five I/O modules: IOM A0/B0 (front end), IOM A1/B1 (back end), IOM A2/B2 (WAN COM), IOM A3/B3 (local COM), and IOM A4/B4 (reserved).


Depending on the cluster topology, slots A2 and B2 contain one of the following I/O modules (IOMs); both IOMs must be the same type:
◆ 8 Gb/s Fibre Channel
◆ 10 Gb/s Ethernet
◆ Filler module (VPLEX Local only)


Figure 14 Engine, rear view

The GeoSynchrony operating system runs on the VPLEX directors, and supports:
◆ I/O request processing
◆ Distributed cache management
◆ Virtual-to-physical translations
◆ Interaction with storage arrays

Front-end and back-end connectivity
Redundant I/O modules provide connectivity to the front-end and back-end ports:
◆ Four 8 Gb/s Fibre Channel ports provide front-end connectivity.
◆ Four 8 Gb/s ports provide back-end connectivity.
◆ Industry-standard Fibre Channel ports connect to host initiators and storage devices.


WAN connectivity
For VPLEX Metro and Geo configurations, dual inter-cluster WAN links connect the two clusters.
◆ Clusters in a VPLEX Metro can be connected by either Fibre Channel (8 Gb/s) or Gigabit Ethernet (10 GbE).
◆ Clusters in a VPLEX Geo are connected by Gigabit Ethernet.
For Fibre Channel connections, IOMs A2 and B2 contain four Fibre Channel ports. For IP connections, these IOMs contain two Ethernet ports.

CAUTION
The inter-cluster link carries unencrypted user data. To protect the security of the data, secure connections are required between clusters.

Redundant I/O paths
Within an engine, a director is connected to its peer director over an internal communication channel. Between engines, directors are connected over dual Fibre Channel switches. All VPLEX directors participate in intra-cluster communications. When properly zoned and configured, the front-end and back-end connections provide redundant I/O paths that can be serviced by any director.


VPLEX power supplies
The VPLEX cluster is connected to your AC power source. Each cluster is equipped with standby and uninterruptible power supplies that enable the cluster to ride out power disruptions.

AC power connection
Connect your VPLEX cluster to two independent power zones to assure a highly available power distribution configuration. Figure 15 shows AC power supplied from independent power distribution units (PDUs).


Figure 15 VPLEX cluster independent power zones

This is a required configuration to assure high availability. The power supply module is field-replaceable. When power supplies are replaced one at a time, no disruption of service occurs. VPLEX power distribution panels (PDPs) contain manual on/off power switches for their power receptacles. For additional information on power requirements and practices, see the EMC Best Practices Guide for AC Power Connections in Two-PDP Bays.

Standby power supplies
Each engine is connected to two standby power supplies (SPS) that provide battery backup to each director. SPSs have sufficient capacity to ride through transient site power failures or to vault the cache in the event power is not restored within 30 seconds. A single standby power supply provides enough power for the attached engine to ride through two back-to-back 5-minute losses of power. Refer to “Protection from power failure” on page 83.


Uninterruptible power supplies
Dual-engine and quad-engine clusters include two uninterruptible power supplies: UPS-A and UPS-B. In the event of a power failure:
◆ UPS-A provides power to the management server and Fibre Channel switch A.
◆ UPS-B provides power for Fibre Channel switch B.
The UPSs provide sufficient power to support the Fibre Channel switches and management server for two back-to-back 5-minute power outages.

Power and environmental monitoring
GeoSynchrony monitors the overall health of the VPLEX cluster and the environment for the VPLEX cluster hardware. Power and environmental conditions are monitored at regular intervals, and any changes to the VPLEX power or hardware health are logged. Conditions that indicate a hardware or power fault generate a call-home event.


Hardware failure management and best practices
This section describes how VPLEX component failures are handled, and the best practices that allow applications to tolerate these failures. All critical processing components of a VPLEX system use, at a minimum, pair-wise redundancy to maximize data availability.

Note: All VPLEX hardware component failures are reported to the EMC Service Center to ensure timely response and repair of these fault conditions.

Storage array failures
VPLEX makes it easy to mirror the data of a virtual volume between two or more storage volumes using a RAID 1 device. When a mirror is configured, a failed array or planned outage does not interrupt service; I/O continues to the healthy leg of the device. When the failed or removed array is restored, the VPLEX system uses the information in logging volumes to synchronize the mirrors. Only the changed blocks are synchronized, minimizing inter-cluster traffic for this task. Figure 16 shows a virtual volume mirrored between two arrays.

Figure 16 Local mirrored volumes
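The incremental resynchronization described above can be sketched as a simple dirty-region bitmap (an illustration of the idea, not VPLEX internals): while a mirror leg is unavailable, the logging volume records which regions were written, and only those regions are copied when the leg returns.

    # Illustrative model of logging-volume based mirror resync (not VPLEX code):
    # writes made while one mirror leg is detached mark regions dirty, and only
    # the dirty regions are copied when the leg is restored.

    class LoggingVolume:
        def __init__(self, num_regions: int) -> None:
            self.dirty = [False] * num_regions

        def record_write(self, region: int) -> None:
            """Called for each write issued while a mirror leg is unavailable."""
            self.dirty[region] = True

        def regions_to_resync(self) -> list:
            return [r for r, is_dirty in enumerate(self.dirty) if is_dirty]

    log = LoggingVolume(num_regions=1024)
    for region in (3, 3, 97, 512):      # writes that landed during the outage
        log.record_write(region)
    print("regions to copy on rebuild:", log.regions_to_resync())
    # -> only 3 regions are copied, not the entire volume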

Best practices: local mirrored volumes
◆ For critical data, mirror data on two or more storage volumes that are located on separate arrays.
◆ For the best performance, storage volumes at each leg of the distributed device should be the same size and hosted on arrays with the same performance characteristics.

Fibre Channel port failures
VPLEX communications use redundant paths that allow communication to continue during port failures. This redundancy allows multipathing software to redirect I/O around path failures that occur as part of port failures.


VPLEX has its own multipathing logic that maintains redundant paths to back-end storage from each director. This allows VPLEX to continue uninterrupted during failures of the back-end ports, back-end fabric, and the array ports that connect the physical storage to VPLEX. The small form-factor pluggable (SFP) transceivers used for connectivity to VPLEX are field replaceable units (FRUs).

Best practices: Fibre Channel ports
To ensure the highest reliability of your configuration:

Front end:
◆ Ensure there is a path from each host to at least one front-end port on director A and at least one front-end port on director B. When the VPLEX cluster has two or more engines, ensure that the host has at least one A-side path on one engine and at least one B-side path on a separate engine. For maximum availability, each host can have a path to at least one front-end port on every director.
◆ Use multipathing software on the host servers to ensure timely response and continuous I/O in the presence of path failures.
◆ Ensure that each host has a path to each virtual volume through each fabric.
◆ Ensure that the fabric zoning provides hosts redundant access to the VPLEX front-end ports.

Back end:
◆ Ensure that the logical unit number (LUN) mapping and masking for each storage volume presented from a storage array to VPLEX presents the volumes out of at least two ports from the array, on at least two different fabrics, and from different controllers.
◆ Ensure that the LUN connects to at least two different back-end ports of each director within a VPLEX cluster.
◆ Active/passive arrays must have one active and one passive port zoned to each director, and zoning must provide VPLEX with redundant access to the array ports.
◆ Configure a maximum of four paths between a director and the LUN.

A sketch of a simple front-end redundancy check follows the note below.

Note: On VS2 hardware, only 4 physical ports are available for back end connections on each director. Refer to the VPLEX Configuration Guide for details on the hardware configuration you are using.
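To make the front-end guidelines concrete, the sketch below (illustrative only, not an EMC tool) checks a host's zoned paths against three of the rules above: at least one A-side and one B-side director path, paths spread across two fabrics, and, on multi-engine clusters, paths through at least two engines. The director labels shown are hypothetical examples.

    # Illustrative check of the front-end path guidelines above (not an EMC tool).
    # A "path" is described by the director it lands on, that director's side
    # (A or B), the engine number, and the SAN fabric the host uses to reach it.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Path:
        director: str   # e.g. a hypothetical label such as "director-1-1-A"
        side: str       # "A" or "B"
        engine: int     # 1..4
        fabric: str     # e.g. "fabric-1" or "fabric-2"

    def check_front_end_redundancy(paths, engines_in_cluster):
        problems = []
        if not {"A", "B"} <= {p.side for p in paths}:
            problems.append("host needs a path to both an A-side and a B-side director")
        if len({p.fabric for p in paths}) < 2:
            problems.append("host paths should span two independent fabrics")
        if engines_in_cluster >= 2 and len({p.engine for p in paths}) < 2:
            problems.append("on multi-engine clusters, spread paths across two engines")
        return problems

    paths = [Path("director-1-1-A", "A", 1, "fabric-1"),
             Path("director-1-2-B", "B", 2, "fabric-2")]
    print(check_front_end_redundancy(paths, engines_in_cluster=2) or "OK")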

I/O module failures
VPLEX I/O modules serve dedicated roles. For VS2 hardware, each VPLEX director has:
◆ One front-end I/O module and one back-end I/O module
◆ One COM I/O module used for intra- and inter-cluster connectivity
Each I/O module is a serviceable FRU. The following sections describe the behavior of the system when each type of module fails.

Front-end I/O module
Failure of a front-end I/O module causes all paths connected to the failed module to fail. VPLEX automatically sends a call-home.


Follow the guidelines described in “Best practices: Fibre Channel ports” on page 42 to ensure that hosts have a redundant path to their data. During the removal and replacement of an I/O module, the affected director resets.

Back-end I/O module
Failure of a back-end I/O module causes all paths connected to the failed module to fail. VPLEX automatically sends a call-home. Follow the guidelines described in “Best practices: Fibre Channel ports” on page 42 to ensure that each director has a redundant path to each storage volume through a separate I/O module. During the removal and replacement of an I/O module, the affected director resets.

COM I/O module
Failure of the local COM I/O module of a director causes the director to reset and stops all service provided from the director. Follow the guidelines described in “Best practices: Fibre Channel ports” on page 42 to ensure that each host has redundant access to its virtual storage through multiple directors. During the removal and replacement of a local I/O module, the affected director resets. If best practices are followed, the reset of a single director does not cause the host to lose access to its storage.

Director failure
Failure of a director causes the loss of all service from that director. The second director in the engine continues to service I/O. VPLEX clusters containing two or more engines benefit from the additional redundancy provided by the additional directors. Each director within a cluster is capable of presenting the same storage. Follow the guidelines described in “Best practices: Fibre Channel ports” on page 42 to allow a host to ride through director failures by placing redundant paths to their virtual storage through ports provided by different directors. The combination of multipathing software on the hosts and redundant paths through different directors of the VPLEX system allows the host to ride through the loss of a director. Each director is a serviceable FRU.

Intra-cluster IP management network failure
In Metro and Geo configurations, VPLEX clusters are connected by a pair of private local IP subnets between directors and the management server. These subnets:
◆ Carry management traffic
◆ Protect against intra-cluster partitioning
◆ Connect the VPLEX Witness server (if it is deployed) and the directors
Failure on one of these subnets can result in the inability of some subnet members to communicate with other members on that subnet. Because the subnets are redundant, failure of one subnet results in no loss of service or manageability.

Note: Failure of a single subnet may result in loss of connectivity between this director and VPLEX Witness.


Intra-cluster Fibre Channel switch failure
Dual-engine and quad-engine clusters include a pair of dedicated Fibre Channel switches for intra-cluster communication between the directors within the cluster. Two redundant Fibre Channel fabrics are created, and each switch serves a different fabric. Failure of a single Fibre Channel switch results in no loss of processing or service.

Inter-cluster WAN links
In VPLEX Metro and VPLEX Geo configurations, the clusters are connected through redundant WAN links that you provide.

Best practices
When configuring your inter-cluster network:
◆ Latency - Latency must be less than 5 milliseconds (ms) round-trip time (RTT) for a VPLEX Metro, and less than 50 ms RTT for a VPLEX Geo.
◆ Link speed - Provide a minimum of 45 Mb/s of bandwidth. The required bandwidth depends on the I/O pattern and must be high enough for all writes to all distributed volumes to be exchanged between clusters. (A rough sizing sketch follows this list.)
◆ Uninterruptible power - Switches supporting the WAN links must be configured with a battery-backup UPS.
◆ Use physically independent WAN links for redundancy.
◆ Ensure that every WAN port on every director can connect to a WAN port on every director in the other cluster.
◆ Logically isolate VPLEX Metro/Geo traffic from other WAN traffic using VSANs or LSANs.
◆ Use independent inter-switch links (ISLs) for redundancy.
◆ Deploy VPLEX Witness in an independent failure domain.
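As a rough way to reason about the link-speed guideline, the sketch below (an illustration, not an EMC sizing tool) converts an application's peak write throughput to distributed volumes into an estimated WAN bandwidth requirement and compares it against the 45 Mb/s floor. The 20 MB/s workload and the 20% overhead factor are example assumptions, not VPLEX figures.

    # Rough WAN bandwidth estimate for distributed-volume writes (illustrative
    # only; real sizing must also account for bursts, rebuild/resync traffic,
    # protocol overhead, and sharing of the link with other WAN users).

    MIN_LINK_MBPS = 45.0   # minimum link speed called out in the best practices

    def required_wan_mbps(peak_write_mb_per_sec: float, overhead_factor: float = 1.2) -> float:
        """Convert peak write throughput (MB/s) into megabits per second with headroom."""
        return peak_write_mb_per_sec * 8 * overhead_factor

    peak_writes_mb_s = 20.0   # example workload: 20 MB/s of writes to distributed volumes
    needed = required_wan_mbps(peak_writes_mb_s)
    print(f"Estimated WAN need: {needed:.0f} Mb/s "
          f"(minimum supported link speed: {MIN_LINK_MBPS:.0f} Mb/s)")
    # -> Estimated WAN need: 192 Mb/s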

Power supply failures
Each VPLEX cluster provides two zones of AC power. If one zone loses power, the modules in the cluster continue to run using power from the other zone. If both zones lose power, the engines revert to power from their SPS modules; in multi-engine clusters, the management server and intra-cluster Fibre Channel switches revert to the power supplied by the UPSs.

SPS/UPS failures
Each standby power supply (SPS) is a field replaceable unit (FRU) and can be replaced with no disruption. SPS batteries support two sequential outages of no greater than 5 minutes without data loss. The recharge time for an SPS is up to 5.5 hours.

Each uninterruptible power supply (UPS) is a FRU and can be replaced with no disruption. UPS modules support two sequential outages of no greater than 5 minutes to the Fibre Channel switches in a multi-engine cluster. The recharge time for a UPS to reach 90% capacity is 6 hours.


Note: While the batteries can support two 5-minute power losses, the VPLEX Local, VPLEX Metro, or VPLEX Geo cluster vaults after a 30 second power loss to ensure enough battery power to complete the cache vault.

Power failures that cause vault
Note: Vaulting is a mechanism to prevent data loss when an external environmental condition causes a cluster failure.
For example, if a power failure lasts longer than 30 seconds in a VPLEX Geo configuration, each VPLEX director copies its dirty cache data to the local solid state storage devices (SSDs). Known as vaulting, this process protects user data in cache if that data is at risk due to power loss. After each director vaults its dirty cache pages, VPLEX shuts down the director’s firmware.

Vaulting is evolving rapidly with each release of GeoSynchrony. The events and conditions that trigger cache vaulting vary by release as follows:
Release 5.0.1:
◆ Vaulting is introduced.
◆ On all configurations, vaulting is triggered if all of the following conditions are present:
• AC power is lost (due to power failure, faulty hardware, or a power supply that is not present) in power zone A from engine X,
• AC power is lost (due to power failure, faulty hardware, or a power supply that is not present) in power zone B from engine Y (X and Y are the same in a single-engine configuration, but may or may not be the same in dual- or quad-engine configurations), and
• Both conditions persist for more than 30 seconds.
Release 5.0.1 Patch:
◆ On all configurations, vaulting is triggered if all of the following conditions are present:
• AC power is lost (due to power failure or faulty hardware) in power zone A from engine X,
• AC power is lost (due to power failure or faulty hardware) in power zone B from engine Y (X and Y are the same in a single-engine configuration, but may or may not be the same in dual- or quad-engine configurations), and
• Both conditions persist for more than 30 seconds.
Release 5.1:
◆ In a VPLEX Geo configuration with asynchronous consistency groups, vaulting is triggered if all of the following conditions are present:
• AC power is lost (due to power failure or faulty hardware) or becomes “unknown” in a director from engine X,
• AC power is lost (due to power failure or faulty hardware) or becomes “unknown” in a director from engine Y (X and Y are the same in a single-engine configuration, but may or may not be the same in dual- or quad-engine configurations), and
• Both conditions persist for more than 30 seconds.
◆ In a VPLEX Local or VPLEX Metro configuration, vaulting is triggered if all of the following conditions are present:
• AC power is lost (due to power failure or faulty hardware) or becomes “unknown” in the minimum number of directors required for the cluster to be operational, and
• The condition persists for more than 30 seconds.

Note: UPS power conditions do not trigger any vaulting.

VPLEX Witness failure
If VPLEX Witness is deployed, failure of the VPLEX Witness has no impact on I/O as long as the two clusters stay connected with each other. If a cluster fails or an inter-cluster network partition occurs while VPLEX Witness is down, there will be data unavailability on all surviving clusters.

Best practice Best practice is to disable VPLEX Witness (while the clusters are still connected) if its outage is expected to be long, and to revert to using preconfigured detach rules. Once VPLEX Witness recovers, re-enable VPLEX Witness. Refer to the EMC VPLEX CLI Guide for information about the commands to disable and enable VPLEX Witness.

VPLEX management server failure
I/O processing of the VPLEX directors does not depend upon the management servers. Thus, in most cases, failure of a management server does not interrupt I/O processing. VPLEX Witness traffic traverses the management server. If the management server fails in a configuration where VPLEX Witness is deployed, the VPLEX Witness cannot communicate with the cluster. In this scenario, failure of the remote VPLEX cluster results in data unavailability. Failure of only the inter-cluster network has no effect; the remote cluster continues I/O processing regardless of preference because it is still connected to VPLEX Witness (see note 1 below).

Best practice Best practice is to disable VPLEX Witness (while the clusters are still connected) if its outage is expected to be long, and to revert to using preconfigured detach rules. When the management server is replaced or repaired, use the cluster-witness enable CLI command to re-enable VPLEX Witness. Refer to the EMC VPLEX CLI Guide for information about the commands to disable and enable VPLEX Witness.

1. This description only applies to synchronous consistency groups with a rule setting that identifies a specific preference.
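The preceding best practice maps to a short CLI sequence similar to the following sketch. The enable command is named above; the matching disable command name and any additional options are assumptions, and the exact syntax varies by GeoSynchrony release (see the EMC VPLEX CLI Guide):

VPlexcli:/> cluster-witness disable     (run while both clusters are still connected; command name assumed)
(wait out the extended VPLEX Witness outage or complete the management server repair)
VPlexcli:/> cluster-witness enable      (re-enable once VPLEX Witness connectivity is restored)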


Component IP addresses This section details the IP addresses used to connect the components within a VPLEX cluster.

IP addresses - cluster-1

Cluster IP Seed = 1; Enclosure IDs = engine numbers

Engine 1: Director 1A, A side: 128.221.252.35; Director 1A, B side: 128.221.253.35; Director 1B, A side: 128.221.252.36; Director 1B, B side: 128.221.253.36
Engine 2: Director 2A, A side: 128.221.252.37; Director 2A, B side: 128.221.253.37; Director 2B, A side: 128.221.252.38; Director 2B, B side: 128.221.253.38
Engine 3: Director 3A, A side: 128.221.252.39; Director 3A, B side: 128.221.253.39; Director 3B, A side: 128.221.252.40; Director 3B, B side: 128.221.253.40
Engine 4: Director 4A, A side: 128.221.252.41; Director 4A, B side: 128.221.253.41; Director 4B, A side: 128.221.252.42; Director 4B, B side: 128.221.253.42
FC switch A: 128.221.252.34; FC switch B: 128.221.253.34
Management server: Mgt A port: 128.221.252.33; Mgt B port: 128.221.253.33; Service port: 128.221.252.2; Public Ethernet port: customer-assigned

Figure 17 Component IP addresses in cluster-1


IP addresses - cluster-2

Cluster IP Seed = 2; Enclosure IDs = engine numbers

Engine 1: Director 1A, A side: 128.221.252.67; Director 1A, B side: 128.221.253.67; Director 1B, A side: 128.221.252.68; Director 1B, B side: 128.221.253.68
Engine 2: Director 2A, A side: 128.221.252.69; Director 2A, B side: 128.221.253.69; Director 2B, A side: 128.221.252.70; Director 2B, B side: 128.221.253.70
Engine 3: Director 3A, A side: 128.221.252.71; Director 3A, B side: 128.221.253.71; Director 3B, A side: 128.221.252.72; Director 3B, B side: 128.221.253.72
Engine 4: Director 4A, A side: 128.221.252.73; Director 4A, B side: 128.221.253.73; Director 4B, A side: 128.221.252.74; Director 4B, B side: 128.221.253.74
FC switch A: 128.221.252.66; FC switch B: 128.221.253.66
Management server: Mgt A port: 128.221.252.65; Mgt B port: 128.221.253.65; Service port: 128.221.252.2; Public Ethernet port: customer-assigned

Figure 18 Component IP addresses in cluster-2

Chapter 3 VPLEX Software

This chapter describes the major components of VPLEX software. Topics include:
◆ GeoSynchrony...... 50
◆ Management interfaces ...... 52
◆ Provisioning with VPLEX ...... 55
◆ Data caching...... 59
◆ Consistency groups...... 61
◆ Cache vaulting ...... 65


GeoSynchrony GeoSynchrony is the operating system running on the VPLEX directors. GeoSynchrony runs on both VS1 and VS2 hardware. GeoSynchrony is:
◆ Designed for highly available, robust operation in geographically distributed environments,
◆ Driven by real-time I/O operations,
◆ Intelligent about locality of access, and
◆ The provider of the global directory that supports AccessAnywhere.
Table 3 summarizes features provided by GeoSynchrony and AccessAnywhere:

Table 3 GeoSynchrony AccessAnywhere features

Storage volume encapsulation: LUNs on a back-end array can be imported into an instance of VPLEX and used while keeping their data intact.
Considerations: The storage volume retains the existing data on the device and leverages the media protection and device characteristics of the back-end LUN.

RAID 0: VPLEX devices can be aggregated to create a RAID 0 striped device.
Considerations: Improves performance by striping I/Os across LUNs.

RAID-C: VPLEX devices can be concatenated to form a new, larger device.
Considerations: Provides a means of creating a larger device by combining two or more smaller devices.

RAID 1: VPLEX devices can be mirrored within a site.
Considerations: Withstands a device failure within the mirrored pair. A device rebuild is a simple copy from the remaining device to the newly repaired device. Rebuilds are done in incremental fashion whenever possible. The number of required devices is twice the amount required to store data (actual storage capacity of a mirrored array is 50%). The RAID 1 devices can come from different back-end array LUNs, providing the ability to tolerate the failure of a back-end array.

Distributed RAID 1: VPLEX devices can be mirrored between sites.
Considerations: Provides protection from site disasters and supports the ability to move data between geographically separate locations.

Extents: Storage volumes can be broken into extents, and devices created from these extents.
Considerations: Use when LUNs from a back-end storage array are larger than the desired LUN size for a host. This provides a convenient means of allocating what is needed while taking advantage of the dynamic thin allocation capabilities of the back-end array.

Migration: Volumes can be migrated non-disruptively to other storage systems.
Considerations: Use for changing the quality of service of a volume or for performing technology refresh operations.

Global Visibility: The presentation of a volume at one VPLEX cluster when the physical storage for the volume is provided by the remote VPLEX cluster.
Considerations: Use for AccessAnywhere collaboration between locations. The cluster without local storage for the volume uses its local cache to service I/O, but non-cached operations incur remote latencies to write or read the data.


Management interfaces VPLEX includes a selection of management interfaces:
◆ “Web-based GUI”
◆ “VPLEX CLI”
◆ “VPLEX Element Manager API”
◆ “SNMP”
◆ “LDAP/AD”
◆ “Call-home”
In VPLEX Metro and VPLEX Geo configurations, both clusters can be managed from either management server. Inside VPLEX clusters, management traffic traverses a TCP/IP-based private management network. In VPLEX Metro and VPLEX Geo configurations, management traffic traverses a VPN tunnel between the management servers on both clusters.

Web-based GUI VPLEX’s web-based graphical user interface (GUI) provides an easy-to-use point-and-click management interface. Figure 19 shows the screen to claim storage:

Figure 19 Claim storage using the GUI

The GUI supports most VPLEX operations, and includes EMC Unisphere for VPLEX online help to assist new users in learning the interface. VPLEX operations that are not available in the GUI are supported by the CLI, which provides full functionality.

VPLEX CLI The VPLEX command line interface (CLI) supports all VPLEX operations. The CLI is divided into command contexts:


◆ Global commands are accessible from all contexts.
◆ Other commands are arranged in a hierarchical context tree, and can be executed only from the appropriate location in the context tree.
Example 1 shows a CLI session that performs the same tasks as shown in Figure 19.

Example 1 Claim storage using the CLI:

In the following example, the claimingwizard command finds unclaimed storage volumes, claims them as thin storage, and assigns names from a CLARiiON hints file:

VPlexcli:/clusters/cluster-1/storage-elements/storage-volumes> claimingwizard --file /home/service/clar.txt --thin-rebuild
Found unclaimed storage-volume VPD83T3:6006016091c50e004f57534d0c17e011 vendor DGC: claiming and naming clar_LUN82.
Found unclaimed storage-volume VPD83T3:6006016091c50e005157534d0c17e011 vendor DGC: claiming and naming clar_LUN84.
Claimed 2 storage-volumes in storage array car
Claimed 2 storage-volumes in total.
VPlexcli:/clusters/cluster-1/storage-elements/storage-volumes>

The EMC VPLEX CLI Guide provides a comprehensive list of VPLEX commands and detailed instructions on using those commands.

VPLEX Element Manager API
The VPLEX Element Manager API uses the Representational State Transfer (REST) software architecture for distributed systems such as the World Wide Web. It allows software developers and other users to create scripts that run VPLEX CLI commands. The VPLEX Element Manager API supports all VPLEX CLI commands that can be executed from the root context.
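As an illustration only, a script on a management workstation could call the API over HTTPS. The /vplex/clusters resource path and the Username/Password headers shown here are assumptions based on the CLI context tree, not details confirmed by this guide; refer to the EMC VPLEX Element Manager API Guide for the exact URIs, headers, and response formats.

# Hypothetical request listing the cluster contexts exposed by the REST interface
# (resource path and authentication headers are assumptions)
curl --insecure --header "Username: service" --header "Password: <password>" https://<management-server-ip>/vplex/clusters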

SNMP The VPLEX SNMP agent:
◆ Supports retrieval of performance-related statistics as published in the VPLEX-MIB.mib.
◆ Runs on the management server and fetches performance-related data from individual directors using a firmware-specific interface.
◆ Provides SNMP MIB data for directors (local cluster only).
◆ Runs on port 161 of the management server and uses the UDP protocol.
◆ Supports the following SNMP commands:
• SNMP Get
• SNMP Get Next
• SNMP Get Bulk
VPLEX MIBs are located on the management server in the /opt/emc/VPlex/mibs directory.
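For example, a monitoring host with a standard net-snmp client could poll the agent over UDP port 161. This is a sketch only: the community string is whatever was configured when the SNMP agent was set up, and the OID shown is the EMC enterprise subtree rather than a specific VPLEX object (the VPLEX-specific OIDs are defined in VPLEX-MIB.mib):

# Walk the EMC enterprise subtree on the VPLEX management server (community string is an assumption)
snmpwalk -v 2c -c <community-string> <management-server-ip>:161 1.3.6.1.4.1.1139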

LDAP/AD VPLEX administrators can choose to configure their user accounts using either:


◆ An external OpenLDAP or Active Directory server (which integrates with UNIX using Services for UNIX 3.5, Identity Management for UNIX, or another authentication service). OpenLDAP and Active Directory users are authenticated by the external server. Usernames and passwords created on an external server are fetched from the remote system onto the VPLEX system the first time they are used, and are stored on the VPLEX system after the first use.
◆ The VPLEX management server. Usernames and passwords are created locally on the VPLEX system, and are stored on VPLEX. Customers who do not want to use an external LDAP server for maintaining user accounts can create user accounts on the VPLEX system itself.

Call-home Call-home is a technology that alerts EMC support personnel to warnings in VPLEX. When a fault is detected in a VPLEX configuration, VPLEX automatically sends a call-home notification to EMC and, optionally, to your data center support personnel. Call-home notifications enable EMC to proactively engage the relevant personnel, or to use a configured ESRS gateway, to resolve the problem. If the same event occurs repeatedly on the same component, a call-home notification is generated only for the first instance of the event, and not again for 8 hours. You can customize the recipients, the severity, and the text of call-home events to meet your specific requirements.


Provisioning with VPLEX VPLEX allows easy storage provisioning among heterogeneous storage arrays. Use the web-based GUI to simplify everyday provisioning or create complex devices. Figure 20 shows the main Provisioning page in the GUI:

Figure 20 VPLEX GUI Provisioning - main page

Table 4 describes the highlights of EZ-Provisioning and Advanced Provisioning.

Table 4 Web-based provisioning methods

EZ provisioning EZ provisioning simplifies the steps to:
• Claim storage, create extents and devices, and create virtual volumes on those devices.
• Register initiators that will access VPLEX storage.
• Create storage views that include virtual volumes, initiators, and VPLEX ports to control host access to the virtual volumes.
EZ provisioning uses the entire capacity of the selected storage volume to create a device, and then creates a virtual volume on top of the device.

Advanced provisioning Advanced provisioning allows you to slice storage volumes into extents. Extents are then available to create devices, and then virtual volumes on these devices. Advanced provisioning provides granular control of the steps to:
• View and claim available storage
• Create extents from storage volumes
• Create RAID-0, RAID-1, RAID-C or 1:1 mapping of extents to devices
• Create virtual volumes on those devices
• Register initiators that will access VPLEX storage
• Create storage views that include virtual volumes, initiators, and VPLEX ports to control host access to the virtual volumes
Use advanced provisioning to create complex devices.

After a storage array LUN volume is encapsulated within VPLEX, all of its block-level storage is available in a global directory and coherent cache. Any front-end device that is zoned properly can access the storage blocks.
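For orientation, the advanced provisioning steps listed in Table 4 map to CLI commands roughly like the following sketch. The object names are hypothetical, the options are approximations rather than exact syntax, and the final steps of registering initiators and creating a storage view are omitted; see the EMC VPLEX CLI Guide for the authoritative commands.

VPlexcli:/> extent create --storage-volumes clar_LUN82,clar_LUN84
VPlexcli:/> local-device create --name dev_example --geometry raid-1 --extents extent_clar_LUN82_1,extent_clar_LUN84_1
VPlexcli:/> virtual-volume create --device dev_example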


Thick and thin storage volumes
Traditional (thick) provisioning anticipates future growth and thus allocates storage capacity beyond the immediate requirement. Traditional rebuilds copy all the data from the source to the target. Thin provisioning allocates storage capacity only as the application needs it, that is, when it writes. Thinly provisioned volumes:
◆ Expand dynamically depending on the amount of data written to them
◆ Do not consume physical space until written to.
Thin provisioning optimizes the efficiency with which available storage space is used. By default, VPLEX treats all storage volumes as if they were thickly provisioned. You can tell VPLEX to claim storage volumes that are thinly provisioned using the thin-rebuild attribute. If a target is thinly provisioned, VPLEX reads the storage volume and does not write unallocated blocks to the target, preserving the target’s thin provisioning.
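As a sketch only, the attribute can be applied at claim time (as in Example 1 with the --thin-rebuild option) or afterward with the CLI's generic set command on the storage-volume context; the volume name below is hypothetical and the context path may differ by release:

VPlexcli:/clusters/cluster-1/storage-elements/storage-volumes/clar_LUN82> set thin-rebuild true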

About extents An extent is a portion of a disk. With VPLEX, you can create an extent that uses the entire capacity of the underlying storage volume, or just a portion of the volume.


Figure 21 Extents

You can create up to 128 extents per storage volume. Extents provide a convenient means of allocating what is needed while taking advantage of the dynamic thin allocation capabilities of the back-end array. Create custom-sized extents when LUNs from a back-end storage array are larger than the desired LUN size for a host. Combine one or more extents into devices.


About devices Devices combine extents or other devices into a new device, using specific RAID techniques such as mirroring or striping.


Figure 22 Devices

A simple device has only one underlying component: an extent. A complex device has more than one component, combined by using a specific RAID technique. The components can be extents or other devices (both simple and complex). A top-level device consists of one or more “child” devices. VPLEX supports the following RAID types:
◆ RAID-0 - Stripes data across two or more storage volumes. RAID-0 devices provide better performance since data is retrieved from several storage volumes at the same time. RAID-0 devices do not include a mirror to provide data redundancy. Use RAID-0 for non-critical data that requires high speed and low cost of implementation.
◆ RAID-1 - Mirrors data using at least two devices to duplicate the data. RAID-1 does not stripe. RAID-1 improves read performance because either extent can be read at the same time. Use RAID-1 for applications that require high fault tolerance, without heavy emphasis on performance.
◆ RAID-C - Appends (concatenates) extents or devices into one larger device.
A device’s storage capacity is not available until you create a virtual volume on the device and export that virtual volume to a host. You can create only one virtual volume per device.

Device visibility Visibility determines which clusters know about a device. A device can be visible only to the local cluster (local visibility) or to both clusters (global visibility). All distributed devices have global visibility. You can create a virtual volume from extents at one cluster and make it available for I/O (visible) at the other cluster.


A virtual volume on a top-level device that has global visibility (or is a member of a consistency group with global visibility) can be exported in storage views on either cluster. For more about consistency groups, see “Consistency groups” on page 61.

Distributed devices Distributed devices have their underlying storage arrays located at both clusters in a VPLEX Metro or VPLEX Geo:


Figure 23 Distributed devices

Distributed devices support virtual volumes that are presented to a host through a storage view. From the host, the virtual volumes appear as single volumes located on a single array. Distributed devices are present at both clusters for simultaneous active/active read/write access. VPLEX AccessAnywhere ensures consistency of the data between the clusters. VPLEX distributed devices enable distributed data centers. Some of the benefits of implementing a distributed data center include:
◆ Increased availability - Both data centers can serve production workloads while providing high availability for the other data center.
◆ Increased asset utilization - Passive data centers no longer have idle resources.
◆ Increased performance/locality of data access - Data does not have to be read from the production site, as the same data is read/write accessible at both sites.
You can configure up to 8000 distributed devices in a VPLEX system. That is, the total number of distributed virtual volumes plus the number of top-level local devices must not exceed 8000.

Mirroring Mirroring writes data to two or more disks simultaneously. If one leg of a mirrored device fails, mirroring protects data by automatically leveraging the other disks without losing data or service.


VPLEX mirrors transparently protect applications from back-end storage array failure and maintenance operations. RAID-1 data is mirrored using at least two extents to duplicate the data. Read performance is improved because either extent can be read at the same time. VPLEX manages mirroring between heterogeneous storage arrays for both local and distributed mirroring.
Local mirroring - Local mirroring (mirroring on VPLEX Local systems) protects RAID-1 virtual volumes within a data center. VPLEX RAID-1 devices provide a local full-copy RAID 1 mirror of a device independent of the host and operating system, application, and database.
Remote mirroring - Distributed mirroring (VPLEX Metro and VPLEX Geo) protects distributed virtual volumes by mirroring them between the two VPLEX clusters.

About virtual volumes
A virtual volume is created on a device or a distributed device, and is presented to a host through a storage view. Virtual volumes are created on top-level devices only, and always use the full capacity of the underlying device or distributed device.


Figure 24 Virtual volumes

If needed, you can expand the capacity of a virtual volume. The supported expansion methods are:
◆ Storage-volume
◆ Concatenation
For more details on volume expansion, see the Administration Guide.
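A minimal sketch of an expansion from the CLI follows; the command options and object names are approximations and hypothetical, and the Administration Guide describes which method applies to a given volume:

VPlexcli:/> virtual-volume expand --virtual-volume dev_example_vol --extent extent_clar_LUN90_1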

Data caching VPLEX uses EMC’s advanced data caching to improve I/O performance and reduce storage array contention. The type of data caching used for distributed volumes depends on the VPLEX configuration.
◆ VPLEX Local and Metro configurations have round-trip latencies of 5 ms or less, and use write-through caching.
◆ VPLEX Geo configurations support round-trip latencies of greater than 5 ms, and use write-back caching.


Write-through caching - In write-through caching, a director writes to back-end storage in both clusters before acknowledging the write to the host. Write-through caching maintains a real-time synchronized mirror of a virtual volume between the two clusters of the VPLEX system, providing a recovery point objective (RPO) of zero data loss and concurrent access to the volume through either cluster. In the VPLEX user interface, write-through caching is known as synchronous cache mode.
Write-back caching - In write-back caching, a director stores the data in its cache and also protects it at another director in the local cluster before acknowledging the write to the host. At a later time, the data is written to back-end storage. Write-back caching provides an RPO that could be as short as a few seconds. This type of caching is performed on VPLEX Geo configurations, where the latency is greater than 5 ms. In the VPLEX user interface, write-back caching is known as asynchronous cache mode.

Logging volumes Logging volumes keep track of blocks written during:
◆ An inter-cluster link outage, or
◆ A period when one leg of a distributed device becomes temporarily unreachable (is unreachable, but not permanently removed from the SAN).
After the inter-cluster link or leg is restored, the VPLEX system uses the information in logging volumes to synchronize the mirrors by sending only changed blocks across the link. Logging volumes also track changes during loss of a volume when that volume is one mirror in a distributed device. For VPLEX Metro and Geo configurations, logging volumes are required at each cluster before a distributed device can be created. VPLEX Local configurations, and systems that do not have distributed devices, do not require logging volumes.

Back-end load balancing
VPLEX uses all paths to a LUN in a round-robin fashion, thus balancing the load across all paths. Slower storage hardware can be dedicated to less frequently accessed data, and optimized hardware can be dedicated to applications that require the highest storage response.


Consistency groups VPLEX consistency groups aggregate volumes to enable the application of a common set of properties to the entire group. Consistency groups aggregate up to 1000 virtual volumes into a single entity that you can manage as easily as an individual volume. If all storage for an application with rollback capabilities is in a single consistency group, the application can recover from a complete cluster failure or inter-cluster link failure with little or no data loss. Data loss, if any, is determined by the application’s data access pattern and the consistency group’s cache-mode. All consistency groups guarantee a crash-consistent image of their member virtual volumes. In the event of a director, cluster, or inter-cluster link failure, consistency groups prevent possible data corruption. There are two types of consistency groups:
◆ “Synchronous consistency groups” - Synchronous consistency groups aggregate local and distributed volumes on VPLEX Local and VPLEX Metro systems separated by 5 ms or less of latency.
◆ “Asynchronous consistency groups” - Asynchronous consistency groups aggregate distributed volumes on VPLEX Geo systems separated by 50 ms or less of latency.

Synchronous consistency groups
Synchronous consistency groups provide a convenient way to apply rule sets and other properties to a group of volumes in a VPLEX Local or VPLEX Metro system. Synchronous consistency groups simplify configuration and administration on large systems. VPLEX supports up to 1024 synchronous consistency groups. Synchronous consistency groups:
◆ Contain up to 1000 virtual volumes.
◆ Contain either local or distributed volumes (but not a mixture of both).
◆ Contain volumes with either global or local visibility.
◆ Use write-through caching.
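As an illustrative sketch only, a synchronous consistency group might be created and populated from the CLI as follows; the group and volume names are hypothetical and the option syntax is approximate (see the EMC VPLEX CLI Guide for exact usage):

VPlexcli:/> consistency-group create --cluster cluster-1 cg_prod
VPlexcli:/> consistency-group add-virtual-volumes --consistency-group cg_prod --virtual-volumes dev_example_vol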

Synchronous consistency groups: visibility
Synchronous consistency groups support either local or distributed volumes. Local synchronous consistency groups can have the visibility property set to either:
◆ “Local visibility” - The local volumes in the consistency group are visible only to the local cluster.
◆ “Global visibility” - The local volumes in the consistency group have storage at one cluster, but are visible to both clusters.
Local visibility - Local consistency groups with the visibility property set to only the local cluster read and write only to their local cluster.

Consistency groups 61 VPLEX Software

Figure 25 shows a local consistency group with local visibility.


Figure 25 Local consistency groups with local visibility

Global visibility Global visibility allows both clusters to receive I/O from the cluster that does not have a local copy. Any reads that cannot be serviced from local cache are also transferred across the link. This allows the remote cluster to have instant on-demand access to the consistency group. Figure 26 shows a local consistency group with global visibility.


Figure 26 Local consistency group with global visibility


Asynchronous consistency groups
Asynchronous consistency groups provide a convenient way to apply rule sets and other properties to distributed volumes in a VPLEX Geo. VPLEX supports up to 16 asynchronous consistency groups. Asynchronous consistency groups:
◆ Contain up to 1000 virtual volumes.
◆ Contain only distributed volumes.
◆ Contain volumes with either global or local visibility.
◆ Use write-back caching.
In asynchronous cache mode, write order fidelity is maintained by batching I/O between clusters into packages called deltas that are exchanged between clusters. Each delta contains a group of writes that were initiated in the same window of time. Each asynchronous consistency group maintains its own queue of deltas. Before a delta is exchanged between clusters, data within the delta can vary by cluster. After a delta is exchanged and committed, data is the same on both clusters. If access to the back-end array is lost while the system is writing a delta, the data on disk is no longer consistent and requires automatic recovery when access is restored. Asynchronous cache mode can give better performance, but there is a higher risk that data will be lost if:
◆ Multiple directors fail at the same time,
◆ There is an inter-cluster link partition, both clusters are actively writing, and instead of waiting for the link to be restored, the user chooses to accept a data rollback in order to reduce the recovery time objective (RTO), or
◆ The cluster that is actively writing fails.

Asynchronous consistency groups: active vs. passive
In VPLEX Geo systems, the configuration of asynchronous consistency groups changes dynamically between active/passive and active/active depending on the write activity at each cluster. A cluster is active if it has data in cache that is yet to be written to the back-end storage (also known as dirty data). A cluster is passive if it has no data in cache that has not been written to the back-end. A cluster is marked as passive if the next delta exchange completes without any new data from that cluster. The cluster remains passive as long as it contributes no data. A Geo configuration is active/passive if applications write to only one cluster. A Geo configuration is active/active if applications write to both clusters.

Detach rules Most I/O workloads require specific sets of virtual volumes to resume on one cluster and remain suspended on the other cluster during failures. VPLEX includes two levels of “detach rules” that determine which cluster continues I/O during an inter-cluster link failure or cluster failure.
◆ Device-level detach rules determine which cluster continues for an individual device.
◆ Consistency group-level detach rules determine which cluster continues for all the member volumes of a consistency group.
There are three consistency group detach rules:


◆ no-automatic-winner - The consistency group does not select a winning cluster.
◆ active-cluster-wins - Applicable only to asynchronous consistency groups. If one cluster was active and one was passive at the time of the outage, the consistency group selects the active cluster as the winner.
◆ winner cluster-name delay seconds - Applicable only to synchronous consistency groups. The cluster specified by cluster-name is declared the winner if an inter-cluster link outage lasts more than the number of seconds specified by delay.
If a consistency group has a detach-rule configured, the rule applies to all volumes in the consistency group, and overrides any rule-sets applied to individual volumes. In the event of connectivity loss with the remote cluster, the detach rule defined for each consistency group identifies a preferred cluster (if there is one) that can resume I/O to the volumes in the consistency group. In a VPLEX Metro configuration, I/O proceeds on the preferred cluster and is suspended on the non-preferred cluster. In a VPLEX Geo configuration, I/O proceeds on the active cluster only when the remote cluster has no dirty data in cache.
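For illustration, the consistency group-level detach rules above are applied with the CLI set-detach-rule commands. The group names are hypothetical and the option syntax is an approximation; see the EMC VPLEX CLI Guide for exact usage:

VPlexcli:/clusters/cluster-1/consistency-groups/cg_prod> set-detach-rule winner --cluster cluster-1 --delay 5s
VPlexcli:/clusters/cluster-1/consistency-groups/cg_async> set-detach-rule active-cluster-wins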

Asynchronous consistency groups: active-cluster-wins detach rule
The active-cluster-wins detach rule is applicable only to asynchronous consistency groups. During a failure, I/O continues at the cluster where the application was actively writing last (provided there was only one such cluster). The active cluster is the preferred cluster. If both clusters were active during the failure, I/O must suspend at both clusters. I/O suspends because the cache image is inconsistent on both clusters and must be rolled back to a point where both clusters had a consistent image to continue I/O. Application restart is required after rollback. If both clusters were passive and have no dirty data at the time of the failure, the cluster that was most recently active proceeds with I/O after the failure. As long as the remote cluster has dirty data, the local cluster suspends I/O if it observes loss of connectivity with the remote cluster, regardless of the detach rule or preference. This is done to allow the administrator to stop or restart the application prior to exposing the application to the rolled-back, time-consistent data image (if rollback is required).

Note: VPLEX Witness does not guide failover behavior for asynchronous consistency groups. In Geo configurations (asynchronous consistency groups), VPLEX Witness observations can be used to diagnose problems, but not to automate failover.


Cache vaulting VPLEX uses the individual director's memory systems to ensure durability of user data and critical system configuration data. If a power failure on a VPLEX Geo cluster (using write-back cache mode) occurs, then the data in cache memory might be at risk. Each VPLEX director copies its dirty cache data to the local solid state storage devices (SSDs) using a process known as cache vaulting. Dirty cache pages are pages in a director's memory that have not been written to back-end storage but were acknowledged to the host. Dirty cache pages include the copies protected on a second director in the cluster. These pages must be preserved in the presence of power outages to avoid loss of data already acknowledged to the host. After each director vaults its dirty cache pages, VPLEX shuts down the director’s firmware.

Note: Although there is no dirty cache data in VPLEX Local or VPLEX Metro configurations, vaulting quiesces all I/O when data is at risk due to power failure. This minimizes risk of metadata corruption.

When power is restored, VPLEX initializes the hardware and the environmental system, checks the data validity of each vault, and unvaults the data. In VPLEX Geo configurations, cache vaulting is necessary to safeguard the dirty cache data under emergency conditions. Vaulting can be used in two scenarios: ◆ Data at risk due to power failure: VPLEX monitors all components that provide power to the VPLEX cluster. If it detects AC power loss that would put data at risk, it takes a conservative approach and initiates a cluster wide vault if the power loss exceeds 30 seconds.

Note: Power failure of the UPS (in dual and quad engine configurations) does not currently trigger any vaulting actions on power failure.

◆ Manual emergency cluster shutdown: When unforeseen circumstances require an unplanned and immediate shutdown, it is known as an emergency cluster shutdown. You can use a CLI command to manually start vaulting if an emergency shutdown is required.

WARNING When performing maintenance activities on a VPLEX Geo system, service personnel must not remove power from one or more engines in a way that would result in the power loss of both directors, unless both directors in those engines have been shut down and are no longer monitoring power. Failure to do so will lead to data unavailability in the affected cluster. To avoid unintended vaults, always follow official maintenance procedures.

For information on distributed cache protection, refer to “Global cache” on page 83. For information on conditions that cause a vault, see “Power failures that cause vault” on page 45.


Chapter 4 Integrity and Resiliency

This chapter describes how VPLEX’s high availability and redundancy features provide robust system integrity and resiliency. Topics include:
◆ About VPLEX resilience and integrity ...... 68
◆ Cluster...... 69
◆ Path redundancy ...... 70
◆ High Availability with VPLEX Witness ...... 74
◆ ALUA...... 81
◆ Additional resilience features...... 82
◆ Performance monitoring and security features ...... 85
◆ Security features ...... 88


About VPLEX resilience and integrity With VPLEX, you get true high availability. Operations continue and data remains online even if a failure occurs. Within synchronous distances (VPLEX Metro), think of VPLEX as providing disaster avoidance instead of just disaster recovery. VPLEX Metro and Geo give you shared data access between sites. The same data (not a copy, but the same data) exists at more than one location simultaneously. VPLEX can withstand a component failure, a site failure, or loss of communication between sites and still keep the application and data online and available. VPLEX clusters are capable of surviving any single hardware failure in any subsystem within the overall storage cluster, including host connectivity and memory subsystems. A single failure in any subsystem does not affect the availability or integrity of the data. VPLEX redundancy creates fault tolerance for devices and hardware components that continue operation as long as one device or component survives. This highly available and robust architecture can sustain multiple device and component failures without disrupting service to I/O. Failures and events that do not disrupt I/O include:
◆ Unplanned and planned storage outages
◆ SAN outages
◆ VPLEX component failures
◆ VPLEX cluster failures
◆ Data center outages
To achieve high availability, you must create redundant host connections and supply hosts with multipathing drivers.

Note: In the event of a front-end port failure or a director failure, hosts without redundant physical connectivity to a VPLEX cluster and without multipathing software installed could be susceptible to data unavailability.


Cluster VPLEX is a true cluster architecture. That is, all devices are always available and I/O that enters the cluster from anywhere can be serviced by any node within the cluster, while cache coherency is maintained for all reads and writes. As you add more devices to the cluster, you get the added benefits of more cache, increased processing power, and more performance. You also get N–1 fault tolerance, which means any device failure or any component failure can be sustained, and the cluster continues to operate as long as one device survives. This is a very highly available and robust architecture, capable of sustaining even multiple failures while it still continues to provide virtualization and storage services. A VPLEX cluster (either VS1 or VS2) consists of redundant hardware components. If one director in an engine fails, the second director in the engine continues to service I/O. All hardware resources (CPU cycles, I/O ports, and cache memory) are pooled. Two-cluster configurations (Metro and Geo) offer true high availability. Operations continue and data remains online even if an entire site fails. VPLEX Metro configurations provide a high availability solution with zero recovery point objective (RPO). VPLEX Geo configurations enable near-zero RPO, and failover is still automated.


Path redundancy Path redundancy is critical for high availability. This section describes how VPLEX delivers resilience using multiple paths, including path redundancy through:
◆ “Different ports” on page 70
◆ “Different directors” on page 70
◆ “Different engines” on page 72
◆ “Site distribution” on page 72

Different ports Front-end ports on all directors can provide access to any virtual volume in the cluster. Include multiple front-end ports in each storage view to protect against port failures. When a director port fails, the host multipathing software seamlessly fails over to another path through a different port, as shown in Figure 27.


Figure 27 Path redundancy: different ports

Combine multipathing software with redundant volume presentation for continuous data availability in the presence of port failures. Back-end ports, local COM ports, and WAN COM ports provide similar redundancy for additional resilience.

Different directors Each VPLEX engine includes redundant directors. Each director can service I/O for any other director in the cluster due to the redundant nature of the global directory and cache coherency. If one director in the engine fails, then the second director immediately takes over the I/O processing from the host.


In Figure 28, Director A has failed, but Director B services the host I/O previously serviced by Director A.


Figure 28 Path redundancy: different directors

Combine multi-pathing software with VPLEX’s volume presentation on different directors for continuous data availability in the presence of director failures.

Note: If a director loses access to a specific storage volume, but other directors at the same cluster have access to that volume, VPLEX can forward back end I/O to another director that still has access. This condition is known as asymmetric back end visibility. When asymmetric back end visibility happens, VPLEX is considered in a degraded state, and cannot provide high availability. Operations such as NDU are prevented. Asymmetric back end visibility can also have a performance impact.

Best practices For maximum availability:
◆ Present virtual volumes through each director so that all directors but one can fail without causing data loss or unavailability.
◆ Connect all directors to all storage.
◆ Configure paths through both an A director and a B director to ensure continuous I/O during non-disruptive upgrade of VPLEX.
◆ Connect VPLEX directors to both Fibre Channel fabrics (if used) for the front-end (host side) and the back-end (storage array side). Isolate the fabrics. Redundant connections from the directors to the fabrics and fabric isolation allow VPLEX to ride through failures of an entire fabric with no disruption of service.
◆ Connect hosts to both fabrics and use multi-pathing software to ensure continuous data access during failures.
◆ Connect I/O modules to redundant fabrics.



Figure 29 Recommended fabric assignments for front-end and back-end ports

Different engines In a dual- or quad-engine configuration, if one engine goes down, another engine completes the host I/O processing, as shown in Figure 30.


Figure 30 Path redundancy: different engines

In VPLEX Geo, directors in the same engine serve as protection targets for each other. If a single director in an engine goes down, the remaining director uses another director in the cluster as its protection pair. Simultaneously losing an engine in an active cluster, though very rare, could result in loss of crash consistency. However, the loss of 2 directors in different engines can be handled as long as other directors can serve as protection targets for the failed director. Multi-pathing software plus volume presentation on different engines yields continuous data availability in the presence of engine failures on VPLEX Metro.

Site distribution When two VPLEX clusters are connected together with VPLEX Metro or Geo, VPLEX gives you shared data access between sites. VPLEX can withstand a component failure, a site failure, or loss of communication between sites and still keep the application and data online and available. VPLEX Metro ensures that if a data center goes down, or even if the link to that data center goes down, the other site can continue processing the host I/O.


In Figure 31, despite a site failure at Data Center B, I/O continues without disruption in Data Center A.


Figure 31 Path redundancy: different sites

Install the optional VPLEX Witness on a server in a separate failure domain to provide further fault tolerance in VPLEX Metro configurations. See “High Availability with VPLEX Witness” on page 74.


High Availability with VPLEX Witness VPLEX Witness helps multi-cluster VPLEX configurations automate the response to cluster failures and inter-cluster link outages. VPLEX Witness is an optional component installed as a virtual machine on a customer host.

Note: The customer host must be deployed in a separate failure domain from either VPLEX cluster to eliminate the possibility of a single fault affecting both a cluster and VPLEX Witness.

Note: The VPLEX Witness server supports round trip time latency of 1 second over the management IP network.

In Metro and Geo configurations, VPLEX uses rule sets to define how failures are handled. If the clusters lose contact with one another or if one cluster fails, rule sets define which cluster continues operation (the preferred cluster) and which suspends I/O (the non-preferred cluster). This works for most link or cluster failures. In the case where the preferred cluster fails, all I/O is suspended, resulting in data unavailability. VPLEX Witness observes the state of the clusters, and thus can distinguish between an outage of the inter-cluster link and a cluster failure. VPLEX Witness uses this information to guide the clusters to either resume or suspend I/O. VPLEX Witness works in conjunction with consistency groups. VPLEX Witness guidance does not apply to local volumes and distributed volumes that are not members of a consistency group. VPLEX Witness capabilities vary depending on whether the VPLEX is a Metro (synchronous consistency groups) or Geo (asynchronous consistency groups).
◆ In Metro systems, VPLEX Witness provides seamless zero recovery time objective (RTO) fail-over for storage volumes in synchronous consistency groups. Combine VPLEX Witness and VPLEX Metro to provide the following features:
• High availability for applications in a VPLEX Metro configuration leveraging synchronous consistency groups (no single points of storage failure),
• Fully automatic failure handling of synchronous consistency groups in a VPLEX Metro configuration (provided these consistency groups are configured with a specific preference),
• Better resource utilization.
◆ In Geo systems, VPLEX Witness automates fail-over for asynchronous consistency groups and provides zero RTO and zero RPO fail-over in all cases that do not result in data rollback.
Figure 32 on page 75 shows a high level architecture of VPLEX Witness. The VPLEX Witness server must reside in a failure domain separate from cluster-1 and cluster-2.



Figure 32 High level VPLEX Witness architecture

The VPLEX Witness server must be deployed in a failure domain separate from both of the VPLEX clusters. This deployment enables VPLEX Witness to distinguish between a site outage and a link outage, and to provide the correct guidance.

Witness installation considerations
It is important to deploy the VPLEX Witness server VM in a failure domain separate from either cluster. A failure domain is a set of entities affected by the same set of faults. The scope of the failure domain depends on the set of fault scenarios that can be tolerated in a given environment. For example:
◆ If the two clusters are deployed on different floors of the same data center, deploy the VPLEX Witness server VM on a separate floor.
◆ If the two clusters are deployed in two different data centers, deploy the VPLEX Witness server VM in a third data center.
VPLEX Witness is deployed:
◆ As a virtual machine running on a customer’s VMware ESX server,
◆ In a failure domain separate from either of the VPLEX clusters, and
◆ Protected by a firewall.
The VPLEX Witness software includes a client on each of the VPLEX clusters. VPLEX Witness does not appear in the CLI until the client has been configured.

Failures in Metro systems: without VPLEX Witness
VPLEX Metro configurations (synchronous consistency groups) have two consistency group-level detach rules:
◆ winner cluster-name delay seconds
◆ no-automatic-winner
VPLEX Witness does not guide consistency groups with the no-automatic-winner detach rule. The remainder of this discussion applies only to synchronous consistency groups with the winner cluster-name delay seconds detach rule. Synchronous consistency groups use write-through caching. Host writes to a distributed volume are acknowledged back to the host only after the data is written to the back-end storage at both VPLEX clusters. In Figure 33, the winner cluster-name delay seconds detach rule designates cluster-1 as the preferred cluster. That is, during an inter-cluster link outage or a cluster failure, I/O to the device leg at cluster-1 continues, and I/O to the device leg at cluster-2 is suspended. Three common types of failures that illustrate how VPLEX responds without VPLEX Witness are described below.


Figure 33 Failure scenarios in VPLEX Metro configurations without VPLEX Witness

Scenario 1 - Inter-cluster link outage. Both of the dual links between the clusters have an outage. Also known as a cluster partition.
• The preferred cluster (cluster-1) continues I/O.
• Cluster-2 suspends I/O.
The existing detach rules are sufficient to prevent data unavailability. Writes at cluster-1 are logged. When the inter-cluster link is restored, a log rebuild copies only the logged changes to resynchronize the clusters.
Scenario 2 - Cluster-2 fails.
• Cluster-1 (the preferred cluster) continues I/O.
The existing detach rules are sufficient to prevent data unavailability. Volumes are accessible with no disruptions at cluster-1. Writes at cluster-1 are logged. When cluster-2 is restored and rejoins cluster-1, a log rebuild copies only the logged changes to resynchronize cluster-2.
Scenario 3 - Cluster-1 (the preferred cluster) fails.
• Cluster-2 suspends I/O (data unavailability).
VPLEX cannot automatically recover from this failure and suspends I/O at the only operating cluster. When the failure is repaired, recovery may require manual intervention to re-enable I/O on cluster-2.


VPLEX Witness addresses Scenario 3, where the preferred cluster fails and the non-preferred cluster cannot continue I/O due to the configured detach rule-set.

Note: VPLEX Witness has no impact on distributed volumes in synchronous consistency groups configured with the no-automatic-winner rule. In that case, manual intervention is required in the presence of any failure scenario described above.

Failures in Metro systems: with VPLEX Witness
When VPLEX Witness is deployed in a VPLEX Metro configuration, failure of the preferred cluster (Scenario 3) does not result in data unavailability for distributed devices that are members of (synchronous) consistency groups. Instead, VPLEX Witness guides the surviving cluster to continue I/O, despite its designation as the non-preferred cluster. I/O continues to all distributed volumes in all synchronous consistency groups that do not have the no-automatic-winner detach rule. Host applications continue I/O on the surviving cluster without any manual intervention. When the preferred cluster fails in a Metro configuration, VPLEX Witness provides seamless zero RTO failover to the surviving cluster.

Witness and network failures
The VPLEX Witness Server VM connects to the VPLEX clusters over the management IP network. The deployment of VPLEX Witness adds a point of failure to the VPLEX deployment. This section describes the impact of failures of the VPLEX Witness Server VM and the network connections between the VM and the clusters.

Note: This discussion applies only to VPLEX Witness in VPLEX Metro configurations.

Failures of the connections between the clusters and the VPLEX Witness VM are managed as follows:



Figure 34 VPLEX Witness and VPLEX cluster connection failures

Local Cluster Isolation - The preferred cluster loses contact with both the remote cluster and VPLEX Witness.
• The preferred cluster cannot receive guidance from VPLEX Witness and suspends I/O.
• VPLEX Witness guides the non-preferred cluster to continue I/O.
Remote Cluster Isolation - The preferred cluster loses contact with the remote cluster, and the non-preferred cluster loses contact with VPLEX Witness. The preferred cluster is connected to VPLEX Witness.
• The preferred cluster continues I/O, as it is still in contact with VPLEX Witness.
• The non-preferred cluster suspends I/O, as it is neither in contact with the other cluster, nor can it receive guidance from VPLEX Witness.
Inter-Cluster Partition - Both clusters lose contact with each other, but still have access to VPLEX Witness. VPLEX Witness preserves the detach rule failure behaviors:
• I/O continues on the preferred cluster.
• If the preferred cluster cannot proceed because it has not fully synchronized, the cluster suspends I/O.
• Overriding the detach rule results in a zero RTO.
Loss of Contact with VPLEX Witness - The clusters are still in contact with each other, but one or both of the clusters has lost contact with VPLEX Witness.
• There is no change in I/O.
• The cluster(s) that lost connectivity with VPLEX Witness sends a call-home notification.
• If either cluster fails or if the inter-cluster link fails while VPLEX Witness is down, VPLEX experiences data unavailability in all surviving clusters.


When the VPLEX Witness observes a failure and provides guidance, it sticks to this governance until both clusters report complete recovery. This is crucial in order to avoid split-brain and data corruption. As a result, you may have a scenario where:
◆ Cluster-1 is isolated,
◆ VPLEX Witness tells cluster-2 to continue I/O, and
◆ Cluster-2 becomes isolated.
Because cluster-2 has previously received guidance to proceed from the VPLEX Witness, it proceeds even while it is isolated. In the meantime, if cluster-1 were to reconnect with the VPLEX Witness server, the VPLEX Witness server tells it to suspend. In this case, because of event timing, cluster-1 is connected to VPLEX Witness but it is suspended, while cluster-2 is isolated but it is proceeding.

VPLEX Witness in VPLEX Geo configurations

The value of VPLEX Witness is different in VPLEX Geo configurations than it is in VPLEX Metro configurations. The value also varies depending on the release of GeoSynchrony.

VPLEX Witness/GeoSynchrony 5.0.1

For systems running GeoSynchrony 5.0.1, clusters in Geo configurations do not comply with VPLEX Witness guidance. Instead the clusters operate according to the detach rules applied to each asynchronous consistency group. Information displayed in the VPLEX Witness CLI context helps to determine the nature of a failure (cluster failure or inter-cluster link outage). Administrators can use this information to make manual fail-over decisions.

VPLEX Witness/GeoSynchrony 5.1

For VPLEX Geo systems running GeoSynchrony 5.1, VPLEX Witness automates the response to failure scenarios that do not require data rollback. Manual intervention is required if data rollback is needed.

Preferred cluster fails
The value of VPLEX Witness when the preferred cluster fails varies depending on whether data rollback is required.
◆ No data rollback is required - VPLEX Witness guides the surviving cluster to allow I/O to all distributed volumes in all asynchronous consistency groups configured with the active-cluster-wins detach rule. I/O continues at the non-preferred cluster. If no rollback is required, VPLEX Witness automates failover with zero RTO and zero RPO.
◆ Rollback is required - If data rollback is required, all asynchronous consistency groups suspend I/O regardless of VPLEX Witness guidance and the pre-configured rule set. Manual intervention is required to force the rollback in order to change the current data image to the last consistent image preserved on disk. The application may need to be manually restarted to ensure that it does not use data that is still in the application's cache, but has been discarded by VPLEX.

Inter-cluster link outage
In the event of an inter-cluster link outage, VPLEX Witness guides the preferred cluster to continue I/O. Similar to failure of the preferred cluster, the value of VPLEX Witness varies depending on whether data rollback is required.


◆ No data rollback is required - VPLEX Witness guides the preferred cluster to continue I/O to all distributed volumes in all asynchronous consistency groups configured with the active-cluster-wins detach rule. I/O is suspended at the non-preferred cluster. VPLEX Witness automates failover with zero RTO and zero RPO.
◆ Rollback is required - If data rollback is required, all asynchronous consistency groups suspend I/O regardless of VPLEX Witness guidance and their pre-configured rule set. Manual intervention is required to force the rollback in order to change the current data image to the last consistent image preserved on disk, and to restart I/O to the suspended consistency groups. The application may need to be manually restarted to ensure that it does not use data that is still in the application's cache, but has been discarded by VPLEX.

Higher availability

Combine VPLEX Witness with VMware and cross-cluster connection to create even higher availability. See Chapter 5, "VPLEX Use Cases" for more information on the use of VPLEX Witness with VPLEX Metro and VPLEX Geo configurations.


ALUA

Asymmetric Logical Unit Access (ALUA) routes I/O for a LUN that is directed to a non-active or failed storage processor to the active storage processor, without changing the ownership of the LUN.

Each LUN has two types of paths:
◆ Active/optimized paths are direct paths to the storage processor that "owns" the LUN. Active/optimized paths are usually the optimal paths, and provide higher bandwidth than active/non-optimized paths.
◆ Active/non-optimized paths are indirect paths, via an interconnect bus, to the storage processor that does not "own" the LUN. I/O that traverses an active/non-optimized path must be transferred to the storage processor that "owns" the LUN. This transfer increases latency and has an impact on the array.

VPLEX detects the different path types and performs round-robin load balancing across the active/optimized paths.

VPLEX supports all three flavors of ALUA:
◆ Explicit ALUA - The storage processor changes the state of paths in response to commands (for example, the Set Target Port Groups command) from the host (the VPLEX back end). The storage processor must be "explicitly" instructed to change a path's state. If the active/optimized path fails, VPLEX issues the instruction to transition the active/non-optimized path to active/optimized. There is no need to fail over the LUN.
◆ Implicit ALUA - The storage processor can change the state of a path without any command from the host (the VPLEX back end). If the controller that owns the LUN fails, the array changes the state of the active/non-optimized path to active/optimized and fails over the LUN from the failed controller. On the next I/O after changing the path's state, the storage processor returns a Unit Attention "Asymmetric Access State Changed" to the host (the VPLEX back end). VPLEX then re-discovers all the paths to get the updated access states.
◆ Implicit/explicit ALUA - Either the host or the array can initiate the access state change. Storage processors support implicit only, explicit only, or both.
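To make the path handling concrete, the sketch below models round-robin selection restricted to active/optimized paths, falling back to active/non-optimized paths only when no optimized path remains. It is a conceptual illustration only; the class, path names, and states are assumptions, not the VPLEX implementation.

```python
from itertools import cycle

class AluaPathSelector:
    """Conceptual sketch of ALUA-aware path selection (illustrative only)."""

    def __init__(self, paths):
        # paths: list of (path_name, state) where state is
        # "active/optimized" or "active/non-optimized"
        self.paths = paths
        self._rebuild()

    def _rebuild(self):
        optimized = [p for p, s in self.paths if s == "active/optimized"]
        fallback = [p for p, s in self.paths if s == "active/non-optimized"]
        # Round robin across optimized paths; use non-optimized only as a fallback.
        self._rr = cycle(optimized or fallback)

    def mark_failed(self, failed_path):
        # With explicit ALUA, the initiator would also ask the array to promote
        # a non-optimized path (for example, via Set Target Port Groups).
        self.paths = [(p, s) for p, s in self.paths if p != failed_path]
        self._rebuild()

    def next_path(self):
        return next(self._rr)

selector = AluaPathSelector([("SP-A port 0", "active/optimized"),
                             ("SP-A port 1", "active/optimized"),
                             ("SP-B port 0", "active/non-optimized")])
print([selector.next_path() for _ in range(4)])  # alternates across the SP-A ports
selector.mark_failed("SP-A port 0")
selector.mark_failed("SP-A port 1")
print(selector.next_path())                      # falls back to SP-B port 0
```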


Additional resilience features

This section describes additional VPLEX features that enable VPLEX to continue to service I/O during inter-cluster network outages or cluster failures.

Metadata volumes

Meta-volumes store VPLEX metadata, including virtual-to-physical mappings, data about devices, virtual volumes, and system configuration settings. Metadata is stored in cache and backed up on specially designated external volumes called meta-volumes. After the meta-volume is configured, updates to the metadata are written to both the cache and the meta-volume when the VPLEX configuration is modified.

Each VPLEX cluster maintains its own metadata, including:
◆ The local configuration for this cluster
◆ Distributed configuration information shared between clusters.

At system startup, VPLEX reads the metadata, and loads the configuration information onto each director. When you make changes to the system configuration, VPLEX writes these changes to the metadata volume. If VPLEX loses access to the metadata volume, the VPLEX directors continue uninterrupted, using the in-memory copy of the configuration. VPLEX blocks changes to the system until access is restored or the automatic backup meta-volume is activated.

Meta-volumes experience high I/O only during system startup and upgrade. I/O activity during normal operations is minimal.

Best practices

For best resilience:
◆ Allocate storage volumes of 78 GB for the metadata volume.
◆ Configure the metadata volume for each cluster with multiple back-end storage volumes provided by different storage arrays of the same type.
◆ Use the data protection capabilities provided by these storage arrays, such as RAID 1, to ensure the integrity of the system's metadata.
◆ Create backup copies of the metadata whenever configuration changes are made to the system.
◆ Perform regular backups of the metadata volumes on storage arrays that are separate from the arrays used for the metadata volume.

Logging volumes

Logging volumes keep track of blocks written during:
◆ An inter-cluster link outage, or
◆ When one leg of a DR1 becomes unreachable and then recovers.

After the inter-cluster link or leg is restored, the VPLEX system uses the information in logging volumes to synchronize the mirrors by sending only changed blocks across the link. Logging volumes also track changes during loss of a volume when that volume is one mirror in a distributed device.


CAUTION If no logging volume is accessible, then the entire leg is marked as out-of-date. A full re-synchronization is required once the leg is reattached.

The logging volumes on the continuing cluster experience high I/O during:
◆ Network outages or cluster failures
◆ Incremental synchronization

When the network or cluster is restored, VPLEX reads the logging volume to determine what writes to synchronize to the reattached volume. There is no I/O activity during normal operations.

Best practices

On a VPLEX Metro or VPLEX Geo configuration:
◆ Place logging volumes on the fastest storage available.
◆ Stripe logging volumes across several disks to accommodate the high level of I/O that occurs during and after outages.
◆ Configure at least 1 GB of logging volume space for every 16 TB of distributed device space. Slightly more space is required if the 16 TB of distributed storage is composed of multiple distributed devices, because a small amount of non-logging information is also stored for each distributed device.
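As a worked example of the sizing guideline above (the 80 TB figure is an illustrative assumption, not a recommendation):

```python
def min_logging_volume_gb(total_distributed_tb):
    # Guideline: at least 1 GB of logging volume space per 16 TB of
    # distributed device capacity. Add a small allowance on top of this
    # when the capacity is split across many distributed devices.
    return total_distributed_tb / 16

# Example: 80 TB of distributed device capacity -> provision at least 5 GB.
print(min_logging_volume_gb(80))   # 5.0
```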

Global cache

Memory systems of individual directors ensure durability of user and critical system data. The method used to protect user data depends on cache mode:
◆ Synchronous systems (write-through cache mode) leverage the back-end array by writing user data to the array. An acknowledgement for the written data must be received before the write is acknowledged back to the host.
◆ Asynchronous systems (write-back cache mode) ensure data durability by storing user data into the cache memory of the director that received the I/O, then placing a protection copy of this data on another director in the local cluster before acknowledging the write to the host. This ensures the data is protected in two independent memories. The data is later destaged to back-end storage arrays that provide the physical storage media.
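The difference between the two cache modes is essentially an ordering difference in when the host write is acknowledged. The sketch below is a conceptual illustration of that ordering only; the function and object names are assumptions, not VPLEX internals.

```python
def write_through(host_write, backend_array):
    """Synchronous (write-through) mode: the back-end array must acknowledge
    the data before the host write is acknowledged."""
    backend_array.write(host_write)        # wait for the array's acknowledgement
    return "ack to host"

def write_back(host_write, local_director_cache, protection_director_cache):
    """Asynchronous (write-back) mode: the write is held in two independent
    director memories before the host is acknowledged; destaging to the
    back-end array happens later."""
    local_director_cache.append(host_write)
    protection_director_cache.append(host_write)   # protection copy on a peer director
    return "ack to host (destage to the array later)"

class Array:
    def write(self, data):
        pass  # stands in for the physical back-end write and its acknowledgement

print(write_through(b"block", Array()))
print(write_back(b"block", [], []))
```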

Protection from power failure

If a power failure lasts longer than 30 seconds in a VPLEX Geo configuration, each VPLEX director copies its dirty cache data to the local solid state storage devices (SSDs). Known as vaulting, this process protects user data in cache if that data is at risk due to power loss. After each director vaults its dirty cache pages, VPLEX shuts down the director's firmware. When operations resume, if any condition is not safe, the system does not resume normal status and calls home for diagnosis and repair. EMC Customer Support then communicates with the VPLEX system and restores normal system operations.


Under normal conditions, the SPS batteries can support two consecutive vaults. This ensures the system can resume I/O after the first power failure, and still vault a second time if there is another power failure.


Performance monitoring and security features

VPLEX's performance monitoring enables you to monitor the overall health of your system, identify bottlenecks, and resolve problems quickly. VPLEX security features help keep your system safe from intruders.

Performance monitoring

VPLEX performance monitoring provides a customized view into the performance of your system. You decide which aspects of the system's performance to view and compare. VPLEX supports three general categories of performance monitoring:
◆ Current load monitoring allows administrators to watch CPU load during upgrades, I/O load across the inter-cluster WAN link, and front-end vs. back-end load during data mining or backup. Both the CLI and GUI support current load monitoring.
◆ Long term load monitoring collects data for capacity planning and load balancing. Both the CLI and GUI support long term load monitoring.
◆ Troubleshooting monitoring helps identify bottlenecks and high resource consumption. Troubleshooting monitoring is supported by monitors created using the CLI and/or perpetual monitors.

Performance monitoring: GUI

The GUI's performance monitoring dashboard is your customized view into the performance of your VPLEX system:

Figure 35 Performance monitoring - dashboard


You decide which aspects of the system's performance to view and compare:

Figure 36 Performance monitoring - select information to view

Performance information is displayed as a set of charts. For example, Figure 37 shows front-end throughput for a selected director:

Figure 37 Performance monitoring - sample chart

For additional information about the statistics available through the Performance Dashboard, see the EMC Unisphere for VPLEX online help available in the VPLEX GUI.

Performance monitoring: CLI

The CLI collects and displays performance statistics using:
◆ monitors - Gather the specified statistic from the specified target at the specified interval.
◆ monitor sinks - Direct the output to the desired destination. Monitor sinks include the console, a file, or a combination of the two.

Use the three pre-configured monitors for each director to collect information to diagnose common problems. Use the CLI to create a toolbox of custom monitors to operate under varying conditions, including debugging, capacity planning, and workload characterization. For example:
◆ Create a performance monitor to collect statistics for CompareAndWrite (CAW) operations, miscompares, and latency for the specified virtual volume on director-1-1-B.
◆ Add a file sink to send output to the specified directory on the management server.
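The monitor/sink relationship can be pictured as a small data-collection loop: a monitor samples a set of statistics from a target at an interval, and each attached sink decides where the samples go. The sketch below is a conceptual model only; it is not VPlexcli syntax, and the statistic names and class names are assumptions made for illustration. See the EMC VPLEX CLI Guide for the actual monitor and sink commands.

```python
import time

class ConsoleSink:
    def emit(self, line):
        print(line)

class FileSink:
    def __init__(self, path):
        self.path = path
    def emit(self, line):
        with open(self.path, "a") as f:
            f.write(line + "\n")

class Monitor:
    """Conceptual model of a monitor: sample named statistics from a target
    and hand each sample to every attached sink."""
    def __init__(self, name, target, statistics, sinks):
        self.name, self.target = name, target
        self.statistics, self.sinks = statistics, sinks

    def poll_once(self, read_statistic):
        # read_statistic(target, stat) stands in for whatever fetches a value
        for stat in self.statistics:
            value = read_statistic(self.target, stat)
            for sink in self.sinks:
                sink.emit(f"{time.time():.0f} {self.name} {stat}={value}")

# Example: CAW statistics for a virtual volume on director-1-1-B, sent to
# both the console and a file (statistic names are illustrative only).
monitor = Monitor("caw_monitor", "director-1-1-B",
                  ["caw-ops", "caw-miscompares", "caw-latency"],
                  sinks=[ConsoleSink(), FileSink("caw_monitor.log")])
monitor.poll_once(lambda target, stat: 0)   # placeholder reader returning 0
```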


Note: SNMP statistics do not require a monitor or monitor sink. Use the snmp-agent configure command to configure and start the SNMP agent. For more information about monitoring with SNMP, refer to the EMC VPLEX Administration Guide.


Security features

The VPLEX management server operating system (OS) is based on a Novell SUSE Linux Enterprise Server 10 distribution. The operating system has been configured to meet EMC security standards by disabling or removing unused services, and protecting access to network services through a firewall.

Security features include:
◆ SSH Version 2 to access the management server shell
◆ 90-day password expiration
◆ HTTPS to access the VPLEX GUI
◆ IPsec VPN inter-cluster link in VPLEX Metro and VPLEX Geo configurations
◆ IPsec VPN to connect each cluster of a VPLEX Metro or VPLEX Geo to the VPLEX Witness server
◆ SCP to copy files
◆ Tunneled VNC connection to access the management server desktop
◆ Separate networks for all VPLEX cluster communication
◆ Defined user accounts and roles
◆ Defined port usage for cluster communication
◆ Network encryption
◆ Certificate Authority (CA) certificate (default expiration 5 years)
◆ Two host certificates (default expiration 2 years)
◆ Third host certificate for optional VPLEX Witness

CAUTION The inter-cluster link carries unencrypted user data. To ensure privacy of the data, establish an encrypted VPN tunnel between the two sites.

VPLEX Use Cases

This section describes examples of VPLEX configurations. Topics include:
◆ Technology refresh ...... 90
◆ Mobility ...... 92
◆ Collaboration ...... 95
◆ VPLEX Metro HA ...... 97
◆ Redundancy with RecoverPoint ...... 105


Technology refresh

In typical IT environments, migrations to new storage arrays (technology refreshes) require that data used by hosts be copied to a new volume on the new array. The host must then be reconfigured to access the new storage. This process requires downtime for the host.

Migrations between heterogeneous arrays can be complicated and may require additional software or functionality. Integrating heterogeneous arrays in a single environment is difficult and requires a staff with a diverse skill set.

Figure 38 shows the traditional view of storage arrays with servers attached at the redundant front end and storage (Array 1 and Array 2) connected to a redundant fabric at the back end.


Figure 38 Traditional view of storage arrays

When VPLEX is inserted between the front-end and back-end redundant fabrics, VPLEX appears to be the target to hosts and the initiator to storage. This abstract view of storage becomes very powerful when it comes time to replace the physical array that is providing storage to applications. With VPLEX, because the data resides on virtual volumes, it can be copied nondisruptively from one array to another without any downtime. The host does not need to be reconfigured; the physical data relocation is performed by VPLEX transparently, and the virtual volumes retain the same identities and the same access points to the host.

In Figure 39, the virtual disk is made up of the disks of Array A and Array B. The site administrator has determined that Array A has become obsolete and should be replaced with a new array. Array C is the new storage array. Using Mobility Central, the administrator:
• Adds Array C to the VPLEX cluster.
• Assigns a target extent from the new array to each extent from the old array.
• Instructs VPLEX to perform the migration.

VPLEX copies data from Array A to Array C while the host continues its access to the virtual volume without disruption.


After the copy of Array A to Array C is complete, Array A can be decommissioned:


Figure 39 VPLEX technology refresh

Because the virtual machine is addressing its data to the abstracted virtual volume, its data continues to flow to the virtual volume with no need to change the address of the data store. Although this example uses virtual machines, the same is true for traditional hosts. Using VPLEX, the administrator can move data used by an application to a different storage array without the application or server being aware of the change. This allows you to change the back-end storage arrays transparently - without interrupting I/O. VPLEX makes it easier to replace heterogeneous storage arrays on the back-end.


Mobility

Use VPLEX to move data between data centers, relocate a data center, or consolidate data, without disrupting host application access to the data.


Figure 40 Moving data with VPLEX

The source and target arrays can be in the same data center (VPLEX Local) or in different data centers separated by up to 5 ms (VPLEX Metro) or 50 ms (VPLEX Geo). With VPLEX, source and target arrays can be heterogeneous.

When you use VPLEX to move data, the data retains its original VPLEX volume identifier during and after the mobility operation. Because the volume identifiers do not change, no application cutover is required. The application continues to use the "same" storage, unaware that it has moved.

There are many types of data moves, and many reasons to move data:
◆ Move data from a "hot" storage device.
◆ Move applications from one storage device to another.
◆ Move operating system files from one storage device to another.
◆ Consolidate data or database instances.
◆ Move database instances.
◆ Move storage infrastructure from one physical location to another.

With VPLEX, you no longer need to spend significant time and resources preparing to move data and applications. You do not have to accept a forced outage and restart the application after the move is completed. Instead, a move can be made instantly between sites, over distance, and the data remains online and available during the move - no outage or downtime is required.

Considerations before moving data include the business impact, the type of data to be moved, site locations, the total amount of data, and schedules.


Mobility with the VPLEX migration wizard

The VPLEX GUI helps you to easily move the physical location of virtual storage while VPLEX provides continuous access to this storage by the host.

Figure 41 Mobility Central GUI

To move storage:
◆ Display and select the extents (for extent mobility) or devices (for device mobility) to move.
◆ Display and select candidate storage volumes.
◆ VPLEX moves the data to its new location.

Throughout the process, the volume retains its identity, and continuous access is maintained to the data from the host. There are three types of mobility jobs:

Table 5 Types of data mobility operations

Extent Moves data from one extent to another extent (within a cluster).

Device Moves data from one device to another device (within a cluster).

Batch Moves data using a migration plan file. Create batch migrations to automate routine tasks.
• Use batched extent migrations to migrate arrays within the same cluster where the source and destination have the same number of LUNs and identical capacities.
• Use batched device migrations to migrate to dissimilar arrays and to migrate devices between clusters in a VPLEX Metro or Geo.

Up to 25 local and 25 distributed migrations can be in progress at the same time.

Best practices

All components of the system (virtual machine, software, volumes) must be available and in a running state. Data mobility can be used for disaster avoidance, planned upgrade, or physical movement of facilities.

How data mobility works

Mobility moves data from a source extent or device to a target extent or device. When a mobility job is started, VPLEX creates a temporary RAID 1 device above each source device or extent to be migrated. The target extent or device becomes a mirror leg of the temporary device, and synchronization between the source and the target begins.

The data mobility operation is non-disruptive. Applications using the data continue to write to the volumes during the mobility operation. New I/Os are written to both legs of the device.

The following rules apply to mobility operations:
◆ The target extent/device must be the same size or larger than the source extent/device.


◆ The target device cannot be in use (no virtual volumes created on it).
◆ The target extent cannot be in use (no devices created on it).

You can control the transfer speed of the data mobility operation. Higher transfer speeds complete the operation more quickly, but have a greater impact on host I/O. Slower transfer speeds have less impact on host I/O, but take longer to complete. You can change the transfer speed of a job while the job is in the queue or in progress. The change takes effect immediately.

Starting in GeoSynchrony 5.0, the thinness of a thinly provisioned storage volume is retained through a mobility operation. Prior to 5.0, you must specify that rebuilds should be thin at the time you provision the thin volume. Refer to the EMC VPLEX CLI Guide or the online help for more information on thin provisioning of volumes.
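The rules above lend themselves to a simple pre-flight check before a mobility job is started. The following sketch is illustrative only; the dictionaries and field names are assumptions, not VPLEX objects.

```python
def validate_mobility_job(source, target):
    """Pre-flight checks for a mobility job (conceptual sketch).
    source/target are dicts like {"name": ..., "size_gb": ..., "in_use": ...}."""
    errors = []
    if target["size_gb"] < source["size_gb"]:
        errors.append("target must be the same size as or larger than the source")
    if target["in_use"]:
        errors.append("target must not have devices or virtual volumes built on it")
    return errors

source = {"name": "extent_src_1", "size_gb": 500, "in_use": True}
target = {"name": "extent_tgt_1", "size_gb": 500, "in_use": False}
print(validate_mobility_job(source, target) or "OK to start mobility job")
# A higher transfer speed finishes sooner but competes more with host I/O;
# a lower speed is gentler on host I/O but takes longer to complete.
```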


Collaboration

If you require distributed data collaboration, VPLEX can provide a significant advantage. Traditional collaboration across distance required files to be saved at one location and then sent to another site. This is slow, and it incurs bandwidth costs for large files, or even for small files that move regularly. Worse, it negatively impacts productivity, as sites sit idle while they wait to receive the latest changes. Independent work quickly leads to version control problems, as multiple people working at the same time are unaware of each other's most recent changes. Merging independent work is time-consuming and costly, and it grows more complicated as the dataset gets larger.

Current applications for sharing information over distance are not suitable for collaboration in Big Data environments. Transferring hundreds of GB or TB of data across a WAN is inefficient, especially if you need to modify only a small portion of a huge data set, or use it for analysis.

VPLEX enables multiple users at different sites to work on the same data, and maintains consistency in the dataset when changes are made. With VPLEX, the same data can be accessed by all users at all times, even if users are at different sites. The data is literally shared, not copied, so that a change made at one site shows up immediately at the other site. VPLEX does not need to ship the entire file back and forth like other solutions. It only sends the changed updates as they are made, greatly reducing bandwidth costs and offering significant savings over other solutions. With VPLEX's Federated AccessAnywhere, the data remains consistent, online, and always available.


Figure 42 Collaborate over distance with AccessAnywhere

Deploy VPLEX to enable real collaboration between teams located at different sites.

Collaboration using local consistency groups

A simple method to support distributed data collaboration is to configure local consistency groups with global visibility. See "Local consistency group with global visibility" on page 62.


Local consistency groups with global visibility allow the remote cluster to read and write to the local consistency group. The remote cluster has instant on-demand access to the consistency group. Local consistency groups with global visibility may contain only local volumes and the cache mode must be synchronous. Thus, latency between clusters for this configuration cannot exceed 5ms RTT.

Collaboration over asynchronous distances

For distributed data collaboration over greater distances, configure VPLEX Geo asynchronous consistency groups with member volumes mirrored at both clusters. At asynchronous distances, latency between clusters can be up to 50 ms RTT.


VPLEX Metro HA

VPLEX Metro High Availability (HA) configurations consist of a VPLEX Metro system deployed in conjunction with VPLEX Witness. There are two types of Metro HA configurations:
◆ VPLEX Metro HA: can be deployed where the clusters are separated by 5 ms latency RTT or less.
◆ VPLEX Metro HA combined with Cross Connect between the VPLEX clusters and hosts: can be deployed where the clusters are separated by 1 ms latency RTT or less.

This section describes the benefits of these two configurations.

AccessAnywhere enables both clusters to provide simultaneous coherent read/write access to the same virtual volume. Paths are up and the storage is available during normal operation and not only after failover.

Metro HA (without cross-connect)

Combine VPLEX Metro HA with host failover clustering technologies such as VMware HA to create fully automatic application restart for any site-level disaster. VPLEX Metro/VMware HA configurations:
◆ Significantly reduce the Recovery Time Objective (RTO). In some cases, RTO can be eliminated.
◆ Ride through any single component failure (including the failure of an entire storage array) without disruption.
◆ When VMware Distributed Resource Scheduler (DRS) is enabled, distribute workload spikes between data centers, alleviating the need to purchase more storage.
◆ Eliminate the requirement to stretch the Fibre Channel fabric between sites. You can maintain fabric isolation between the two sites.

Figure 43 VPLEX Metro HA


Note: The VPLEX clusters in this configuration must be within 5 ms RTT latency.

In this deployment, virtual machines can write to the same distributed device from either cluster and move between two geographically disparate locations. If you use VMware Distributed Resource Scheduler (DRS) to automate load distribution on virtual machines across multiple ESX servers, you can move a virtual machine from an ESX server attached to one VPLEX cluster to an ESX server attached to the second VPLEX cluster - without losing access to the underlying storage.

Metro HA without cross-connect failure management

This section describes the failure scenarios for VPLEX Metro HA without cross-connect.

VPLEX cluster failure
In the event of a full VPLEX cluster outage at one site:
◆ VPLEX Witness guides the surviving cluster to continue.
◆ VMware at the surviving cluster is unaffected.
◆ VMware restarts the virtual machines at the site where the outage occurred, redirecting I/O to the surviving cluster. VMware can restart because the second VPLEX cluster has continued I/O without interruption.

Figure 44 VPLEX Metro HA (no cross-connect) cluster failure

Inter-cluster link failure - non-preferred site
If an inter-cluster link outage occurs, the preferred cluster continues, while the non-preferred cluster suspends.

Note: The preferred cluster is determined by consistency group detach rules. See “Detach rules” on page 63.

◆ If a virtual machine is located at the preferred cluster, there is no interruption of service. ◆ If a virtual machine is located at the non-preferred cluster, the storage associated with the virtual machine is suspended.


Most guest operating systems will fail. The virtual machine will be restarted at the preferred cluster after a short disruption.

Figure 45 VPLEX Metro HA (no cross-connect) inter-cluster link failure

If an inter-cluster link outage occurs:
◆ VPLEX Witness guides the preferred cluster to continue.
◆ VMware at the preferred cluster is unaffected.
◆ VMware restarts the virtual machines at the non-preferred (suspended) cluster, redirecting I/O to the preferred (uninterrupted) cluster. VMware can restart because the second VPLEX cluster has continued I/O without interruption.

Metro HA with cross-connect

VPLEX Metro HA with cross-connect (VPLEX's front-end ports are cross-connected) can be deployed where the VPLEX clusters are separated by 1 ms latency RTT or less.


Figure 46 VPLEX Metro HA with cross-connect

VPLEX Metro HA combined with cross-connect eliminates RTO for most failure scenarios.

Metro HA with cross-connect failure management

This section describes how VPLEX Metro HA with cross-connect rides through failures of hosts, storage arrays, clusters, VPLEX Witness, and the inter-cluster link.

Host failure
If hosts at one site fail, then VMware HA restarts the virtual machines on the surviving hosts. Since surviving hosts are connected to the same datastore, VMware can restart the virtual machines on any of the surviving hosts.


Figure 47 VPLEX Metro HA with cross-connect - host failure

Cluster failure
If a VPLEX cluster fails:
◆ VPLEX Witness guides the surviving cluster to continue.
◆ VMware re-routes I/O to the surviving cluster.
◆ No disruption to I/O.

Figure 48 VPLEX Metro HA with cross-connect - cluster failure

Storage array failure
If one or more storage arrays at one site fail:
◆ All distributed volumes continue I/O to the surviving leg.


◆ No disruption to the VPLEX clusters or the virtual machines.
◆ I/O is disrupted only to local virtual volumes on the VPLEX cluster attached to the failed array.

Figure 49 VPLEX Metro HA with cross-connect - storage array failure

VPLEX Witness failure
If VPLEX Witness fails or becomes unreachable (link outage):
◆ Both VPLEX clusters call-home to report that VPLEX Witness is not reachable.
◆ No disruption to I/O, the VPLEX clusters, or the virtual machines.

Figure 50 VPLEX Metro HA with cross-connect - VPLEX Witness failure

WARNING Although this failure causes no disruption to the clusters or the virtual machines, it does make the configuration vulnerable to a second failure of a major component. If a cluster or inter-cluster link failure occurs while VPLEX Witness is not available, distributed devices are suspended at both clusters. Therefore, if VPLEX Witness will be unavailable for an extended period, best practice is to disable it and allow the devices to use their configured detach rules.

Inter-cluster link failure
If the inter-cluster link fails:
◆ VPLEX Witness guides the preferred cluster to continue.
◆ I/O suspends at the non-preferred cluster.
◆ VMware re-routes I/O to the continuing cluster.
◆ No disruption to I/O.

Figure 51 VPLEX Metro HA with cross-connect - inter-cluster link failure

Table 6 summarizes how VPLEX HA with cross-connect manages failures.

Table 6 How VPLEX Metro HA recovers from failure

Failure description Failure handling

Host failure (Site 1) VMware HA software automatically restarts the affected applications at Site 2.

VPLEX cluster failure (Site 1) VPLEX Witness detects the failure and enables all volumes on the surviving cluster.




Inter-cluster link failure
• If the cross-connects use different physical links from those used to connect the VPLEX clusters, applications are unaffected. Every volume continues to be available in one data center or the other.
• If the cross-connect links use the same physical links as those used to connect the VPLEX clusters, an application restart is required.

Storage array failure Applications are unaffected. VPLEX dynamically redirects I/O to mirrored copy on surviving array.

Note: This example assumes that all distributed volumes are also mirrored on the local cluster. If not, then the application remains available because the data can be fetched/sent from/to the remote cluster. However, each read/write operation now incurs a performance cost.

Failure of VPLEX Witness
Both clusters call-home. As long as both clusters continue to operate and there is no inter-cluster link partition, applications are unaffected.
CAUTION: If either cluster fails or if there is an inter-cluster link partition, the system is in jeopardy of data unavailability. If the VPLEX Witness outage is expected to be long, the VPLEX Witness functionality should be disabled to prevent the possible data unavailability.


Redundancy with RecoverPoint

EMC RecoverPoint provides comprehensive data protection by continuous replication (splitting) of host writes. With RecoverPoint (RP), applications can be recovered to any point in time. Replicated writes can be written to local volumes to provide recovery from operational disasters, to remote volumes to provide recovery from site disasters, or both.

RecoverPoint supports three types of splitters:
◆ Host OS-based splitters
◆ Intelligent fabric-based splitters (SANTap and Brocade)
◆ Storage-based splitters (CLARiiON CX4, VNX series, and Symmetrix VMAX)

Starting in GeoSynchrony 5.1, VPLEX includes a RecoverPoint splitter. The splitter is built into VPLEX such that VPLEX volumes can have their I/O replicated by RecoverPoint Appliances (RPAs) to volumes located in VPLEX or on one or more heterogeneous storage arrays.

Note: For GeoSynchrony 5.1, RecoverPoint integration is offered for VPLEX Local and VPLEX Metro configurations (not for Geo).

Figure 52 RecoverPoint architecture

The VPLEX splitter enables VPLEX volumes in a VPLEX Local or VPLEX Metro to mirror I/O to a RecoverPoint Appliance (RPA) performing continuous data protection (CDP), continuous remote replication (CRR), or concurrent local and remote data protection (CLR).

RecoverPoint terminology and concepts

This section introduces basic terms and concepts that you need to understand RecoverPoint.

RPA RecoverPoint Appliance. The hardware that manages all aspects of data protection. One RPA can manage multiple storage groups, each with differing policies. A minimum of two and a maximum of eight RPAs are installed at each site, located in the same facility as the host and storage. The set of RPAs installed at each site is referred to as an “RPA cluster”. If one RPA in a cluster fails, the functions provided by the failed RPA are automatically moved to one or more of the remaining RPAs.


The RPAs at the production site transfer the split I/O to the replica site. The RPAs at the replica site distribute the data to the replica storage. In the event of failover, these roles can be reversed. The same RPA can serve as the production RPA for one consistency group and the replica RPA for another.

Volumes All RecoverPoint volumes can be hosted on VPLEX. In practice, some volumes may be hosted on VPLEX and others hosted on non-VPLEX storage. For example, the repository volume for an existing RPA cluster cannot be moved. If you are installing VPLEX into an existing RecoverPoint configuration, the repository volume is already configured on non-VPLEX storage.

Note: Starting in GeoSynchrony 5.2 and RecoverPoint 4.0, RecoverPoint supports 8K volumes. In prior releases of RecoverPoint and GeoSynchrony, 4K volumes were supported.

The following types of volumes are required in all RecoverPoint configurations:
◆ Repository volume - A volume dedicated to RecoverPoint for each RPA cluster. The repository volume serves all RPAs of the particular RPA cluster and the splitter associated with that cluster. The repository volume stores configuration information about the RPAs and RecoverPoint consistency groups. There is one repository volume per RPA cluster.
◆ Production volumes - Volumes that are written to by the host applications. Writes to production volumes are split such that they are sent to both the normally designated volumes and RPAs simultaneously. Each production volume must be exactly the same size as the replica volume to which it replicates.
◆ Replica volumes - Volumes to which production volumes replicate. In prior releases, the replica volume must be exactly the same size as its production volume. In RecoverPoint (RP) 4.0 and GeoSynchrony release 5.2, RP supports a feature called Fake Size, where the replica volume size can be even higher than that of the production volume.
◆ Journal volumes - Volumes that contain data waiting to be distributed to target replica volumes and copies of the data previously distributed to the target volumes. Journal volumes allow convenient rollback to any point in time, enabling instantaneous recovery for application environments.
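The production/replica size rule, and its relaxation by Fake Size in RecoverPoint 4.0 with GeoSynchrony 5.2, can be summarized as a small check. This is a conceptual sketch only; the function and parameter names are illustrative assumptions.

```python
def replica_size_ok(production_gb, replica_gb, fake_size_supported=False):
    """Size rule sketch: the replica must match the production volume exactly;
    with Fake Size (RecoverPoint 4.0 / GeoSynchrony 5.2) the replica may be
    larger than the production volume. Conceptual illustration only."""
    if fake_size_supported:
        return replica_gb >= production_gb
    return replica_gb == production_gb

print(replica_size_ok(1024, 1024))                            # True on any release
print(replica_size_ok(1024, 2048))                            # False before Fake Size
print(replica_size_ok(1024, 2048, fake_size_supported=True))  # True with Fake Size
```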

Snapshot/PIT A point-in-time copy that preserves the state of data at an instant in time, by storing only those blocks that are different from an already existing full copy of the data. Snapshots are also referred to as Point In Time (PIT). Snapshots stored at a replica journal represent the data that has changed on the production storage since the closing of the previous snapshot.

Image access User operation on a replica journal to enable read/write access to a selected PIT at a replica. There are four image access modes:
◆ Logged (physical) access - Used for production recovery, failover, testing, and cloning a replica.
◆ Direct access - This access mode can only be enabled after logged access, or virtual access with roll, is enabled. Used for extensive processing with a high write-rate, when image access is needed for a long period of time (and there may not be the journal space to support all of the data written to the image access log in this time), and when it is not required to save the history in the replica journal (the replica journal is lost after direct access).
◆ Virtual (instant) access - Used for single file recovery or light testing. Used to gain access to the replica data immediately, or when I/O performance is not important.
◆ Virtual (instant) access with roll - Used for production recovery, failover, or processing with a high write-rate. Used when the PIT is far from the current PIT (and would take too long to access in logged access mode).

IMPORTANT In the current release, virtual (instant) access and virtual (instant) access with roll are not supported by the VPLEX splitter.

Bookmark A label applied to a snapshot so that the snapshot can be explicitly called (identified) during recovery processes (during image access). Bookmarks are created through the CLI or GUI and can be created manually, by the user, or automatically, by the system. Bookmarks created automatically can be created at pre-defined intervals or in response to specific system events. Parallel bookmarks are bookmarks that are created simultaneously across multiple consistency groups.

RecoverPoint configurations

RecoverPoint supports three replication configurations:
◆ Continuous data protection (CDP)
◆ Continuous remote replication (CRR)
◆ Concurrent local and remote data protection (CLR)

Figure 53 RecoverPoint configurations

Continuous data protection (CDP)
In a CDP configuration, RecoverPoint continuously replicates data within the same site. Every write is kept in the journal volume, allowing recovery to any point in time. By default, snapshot granularity is set to per second, so the exact data size and contents are determined by the number of writes made by the host application per second. If necessary, the snapshot granularity can be set to per write. CDP configurations include:


◆ Standard CDP - All components (splitters, storage, RPAs, and hosts) are located at the same site.
◆ Stretch CDP - The production host is located at the local site, splitters and storage are located at both the bunker site and the local site, and the RPAs are located at the bunker site. The repository volume and both the production and local journals are located at the bunker site.

Continuous remote replication (CRR)
In CRR configurations, data is transferred between a local and a remote site over Fibre Channel or a WAN. The RPAs, storage, and splitters are located at both the local and the remote site. By default, the replication mode is set to asynchronous, and the snapshot granularity is set to dynamic, so the exact data size and contents are determined by the policies set by the user and system performance. This provides application consistency for specific points in time. Synchronous replication is supported when the local and remote sites are connected using Fibre Channel.

Concurrent local and remote data protection (CLR)
A combination of both CDP and CRR. In a CLR configuration, RecoverPoint replicates data to both a local and a remote site simultaneously, providing concurrent local and remote data protection. The CDP copy is normally used for operational recovery, while the CRR copy is normally used for disaster recovery.

RecoverPoint/VPLEX configurations

RecoverPoint can be configured on VPLEX Local or Metro systems as follows:
◆ "VPLEX Local and RecoverPoint CDP"
◆ "VPLEX Local and RecoverPoint CRR/CLR"
◆ "VPLEX Metro and RecoverPoint CDP at one site"
◆ "VPLEX Metro and RecoverPoint CRR/CLR"

In VPLEX Local systems, RecoverPoint can replicate local volumes. In VPLEX Metro systems, RecoverPoint can replicate local volumes and distributed RAID 1 volumes. Virtual volumes can be replicated locally (CDP), remotely (CRR), or both (CLR). Distances between production sources and replication volumes vary based on the recovery objectives, inter-site bandwidth, latency, and other limitations outlined in the EMC Simple Support Matrix (ESSM) for RecoverPoint.

VPLEX Local and RecoverPoint CDP
In VPLEX Local/RecoverPoint CDP configurations, I/O is split to replica volumes that are located at the same site. RPAs are deployed with the VPLEX cluster. This configuration supports unlimited points in time, with granularity up to a single write, for local VPLEX virtual volumes. The CDP replica volume can be a VPLEX virtual volume or any other heterogeneous storage supported by RecoverPoint.


Application event aware based rollback is supported for Microsoft SQL, Microsoft Exchange, and Oracle database applications. Users can quickly return to any point-in-time, in order to recover from operational disasters.

Figure 54 VPLEX Local and RecoverPoint CDP

VPLEX Local and RecoverPoint CRR/CLR
In VPLEX Local/RecoverPoint CRR/CLR configurations, I/O is split to replica volumes located both at the site where the VPLEX cluster is located and at a remote site. RPAs are deployed at both sites. If the primary site (the VPLEX cluster site) fails, customers can recover to any point in time at the remote site. Recovery can be automated through integration with MSCE and VMware SRM. This configuration can simulate a disaster at the primary site to test RecoverPoint disaster recovery features at the remote site. Application event aware based rollback is supported for Microsoft SQL, Microsoft Exchange, and Oracle database applications.


The remote site can be an independent VPLEX cluster:

Figure 55 VPLEX Local and RecoverPoint CLR - remote site is independent VPLEX cluster

or, the remote site can be an array-based splitter:

Figure 56 VPLEX Local and RecoverPoint CLR - remote site is array-based splitter

VPLEX Metro and RecoverPoint CDP at one site
In VPLEX Metro/RecoverPoint CDP configurations, I/O is split to replica volumes located at only one VPLEX cluster. RPAs are deployed at one VPLEX cluster:

Figure 57 VPLEX Metro and RecoverPoint CDP


VPLEX Metro/RecoverPoint CDP configurations support unlimited points in time on VPLEX distributed and local virtual volumes. Users can quickly return to any point-in-time, in order to recover from operational disasters.

VPLEX Metro and RecoverPoint CRR/CLR
In VPLEX Metro/RecoverPoint CRR/CLR configurations, I/O is:
◆ Written to both VPLEX clusters (as part of normal VPLEX operations).
◆ Split on one VPLEX cluster to replica volumes located both at the cluster and at a remote site.

RPAs are deployed at one VPLEX cluster and at a third site. The third site can be an independent VPLEX cluster:

Figure 58 VPLEX Metro and RecoverPoint CLR - remote site is independent VPLEX cluster

or, the remote site can be an array-based splitter:

Figure 59 VPLEX Metro and RecoverPoint CLR/CRR - remote site is array-based splitter

Although the RecoverPoint splitter is resident in all VPLEX clusters, only one cluster in a VPLEX Metro can have RPAs deployed. This configuration supports unlimited points in time, with granularity up to a single write, for local and distributed VPLEX virtual volumes.
◆ RecoverPoint Appliances can be deployed at only one VPLEX cluster in a Metro configuration.


◆ All RecoverPoint-protected volumes must be on the preferred cluster, as designated by VPLEX consistency group-level detach rules.
◆ Customers can recover from operational disasters by quickly returning to any PIT on the VPLEX cluster where the RPAs are deployed or at the third site.
◆ Application event aware based rollback is supported on VPLEX Metro distributed/local virtual volumes for Microsoft SQL, Microsoft Exchange, and Oracle database applications.
◆ If the VPLEX cluster fails, then customers can recover to any point in time at the remote (third) site. Recovery at the remote site to any point in time can be automated through integration with MSCE and VMware Site Recovery Manager (SRM). See "vCenter Site Recovery Manager support for VPLEX" on page 115.
◆ This configuration can simulate a disaster at the VPLEX cluster to test RecoverPoint disaster recovery features at the remote site.

Shared VPLEX splitter
The VPLEX splitter can be shared by multiple RecoverPoint clusters. This allows data to be replicated from a production VPLEX cluster to multiple RecoverPoint clusters.

Figure 60 Shared VPLEX splitter

Up to 4 RecoverPoint RPA clusters can share a VPLEX splitter.

Shared RecoverPoint RPA cluster
The RecoverPoint RPA cluster can be shared by multiple VPLEX sites:

Figure 61 Shared RecoverPoint RPA cluster


RecoverPoint replication with CLARiiON
VPLEX and RecoverPoint can be deployed in conjunction with CLARiiON-based RecoverPoint splitters, in both VPLEX Local and VPLEX Metro environments. In the configuration depicted below, a host writes to VPLEX Local. Virtual volumes are written to both legs of RAID 1 devices. The VPLEX splitter sends one copy to the usual back-end storage, and one copy across a WAN to a CLARiiON array at a remote disaster recovery site:


Figure 62 Replication with VPLEX Local and CLARiiON


In the configuration depicted in Figure 63, a host writes to the VPLEX Metro. Distributed virtual volumes are split, sending one copy to each of the VPLEX clusters, and a third copy across a WAN to a CLARiiON array at a remote disaster recovery site:


Figure 63 Replication with VPLEX Metro and CLARiiON

Restoring VPLEX virtual volumes with RecoverPoint CRR
Restoration of production volumes starts from the snapshot (PIT) selected by the user. From that point forward, the production source is restored from the replica. Data that is newer than the selected point in time will be rolled back and rewritten from the version in the replica. The replica's journal is preserved and remains valid.

Data mobility with RecoverPoint
VPLEX mobility between arrays does not impact RecoverPoint. Mobility between VPLEX-hosted arrays does not require any changes or full sweeps on the part of RecoverPoint.


vCenter Site Recovery Manager support for VPLEX
With RecoverPoint replication, you can add Site Recovery Manager support to VPLEX.

Figure 64 Support for Site Recovery Manager

When an outage occurs in VPLEX Local or VPLEX Metro configurations, the virtual machine(s) can be restarted at the DR Site with automatic synchronization to the VPLEX configuration when the outage is over.



VS1 Hardware

This appendix describes VPLEX VS1 hardware, IP addressing, and internal cabling. Topics include:
◆ VS1 cluster configurations ...... 118
◆ VS1 engine ...... 121
◆ VS1 IP addresses and component IDs ...... 122
◆ VS1 Internal cabling ...... 125


VS1 cluster configurations

This section illustrates the components in VS1 clusters.

VS1 single-engine cluster

Figure 65 shows, from top to bottom, the management server, Engine 1 (Director 1B and Director 1A), and SPS 1.

Figure 65 VS1 single-engine cluster


VS1 dual-engine cluster

Figure 66 shows, from top to bottom, Fibre Channel switch B, UPS B, Fibre Channel switch A, UPS A, the management server, Engine 2 (Director 2B and Director 2A) with SPS 2, and Engine 1 (Director 1B and Director 1A) with SPS 1.

Figure 66 VS1 dual-engine cluster


VS1 quad-engine cluster

Figure 67 shows, from top to bottom, Engine 4 (Director 4B and Director 4A) with SPS 4, Engine 3 (Director 3B and Director 3A) with SPS 3, Fibre Channel switch B, UPS B, Fibre Channel switch A, UPS A, the management server, Engine 2 (Director 2B and Director 2A) with SPS 2, and Engine 1 (Director 1B and Director 1A) with SPS 1.

Figure 67 VS1 quad-engine cluster


VS1 engine


Figure 68 VS1 engine

Note: The WAN COM ports on IOMs A4 and B4 are used if the inter-cluster connections are over Fibre Channel, and the WAN COM ports on IOMs A5 and B5 are used if the inter-cluster connections are over IP.

IOMs A2 and B2 in the figure each contain four Fibre Channel ports for VPLEX Metro or VPLEX Geo WAN connections. VPLEX Metro and VPLEX Geo also support IP WAN connections, in which case these IOMs each contain two Ethernet ports. In a VPLEX Local configuration, these IOMs contain no ports.


VS1 IP addresses and component IDs

The IP addresses of the VPLEX hardware components are determined by:
◆ The internal management network (A or B),
◆ The Cluster IP Seed, and
◆ The Enclosure ID (which matches the engine number).

Figure 69 shows the IP addresses in a cluster with a Cluster IP Seed of 1. Figure 70 shows the addresses for a Cluster IP Seed of 2. Cluster IP Seed is the same as the Cluster ID:
◆ For VPLEX Local - Cluster ID is always 1.
◆ For VPLEX Metro and Geo - Cluster ID for the first cluster that is set up is 1, and the second cluster is 2.


VS1 component IP addresses (cluster-1)

Cluster IP Seed = 1; Enclosure IDs = engine numbers

Component                Management network B           Management network A
Engine 4, Director 4B    128.221.253.42                 128.221.252.42
Engine 4, Director 4A    128.221.253.41                 128.221.252.41
Engine 3, Director 3B    128.221.253.40                 128.221.252.40
Engine 3, Director 3A    128.221.253.39                 128.221.252.39
Engine 2, Director 2B    128.221.253.38                 128.221.252.38
Engine 2, Director 2A    128.221.253.37                 128.221.252.37
Engine 1, Director 1B    128.221.253.36                 128.221.252.36
Engine 1, Director 1A    128.221.253.35                 128.221.252.35
Fibre Channel switch B   128.221.253.34                 -
Fibre Channel switch A   -                              128.221.252.34
Management server        128.221.253.33 (Mgt B port)    128.221.252.33 (Mgt A port)
Management server service port: 128.221.252.2; public Ethernet port: customer-assigned

Figure 69 IP addresses in cluster-1


VS1 component IP addresses (cluster-2)

Cluster IP Seed = 2; Enclosure IDs = engine numbers

Component                Management network B           Management network A
Engine 4, Director 4B    128.221.253.74                 128.221.252.74
Engine 4, Director 4A    128.221.253.73                 128.221.252.73
Engine 3, Director 3B    128.221.253.72                 128.221.252.72
Engine 3, Director 3A    128.221.253.71                 128.221.252.71
Engine 2, Director 2B    128.221.253.70                 128.221.252.70
Engine 2, Director 2A    128.221.253.69                 128.221.252.69
Engine 1, Director 1B    128.221.253.68                 128.221.252.68
Engine 1, Director 1A    128.221.253.67                 128.221.252.67
Fibre Channel switch B   128.221.253.66                 -
Fibre Channel switch A   -                              128.221.252.66
Management server        128.221.253.65 (Mgt B port)    128.221.252.65 (Mgt A port)
Management server service port: 128.221.252.2; public Ethernet port: customer-assigned

Figure 70 IP addresses in cluster-2 (VPLEX Metro or Geo)
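The addressing shown in Figure 69 and Figure 70 follows a regular pattern. The sketch below reproduces the director, management server, and Fibre Channel switch addresses shown in those figures; the formula is inferred from the figures rather than taken from an EMC specification, so treat it as illustrative.

```python
def vs1_director_ip(cluster_ip_seed, engine, director, network):
    """Reproduce the director addresses shown in Figures 69 and 70.
    network: "A" -> 128.221.252.x, "B" -> 128.221.253.x
    director: "A" or "B" within the engine (enclosure ID = engine number)."""
    third_octet = 252 if network == "A" else 253
    last_octet = 32 * cluster_ip_seed + 2 * engine + 1 + (0 if director == "A" else 1)
    return f"128.221.{third_octet}.{last_octet}"

def vs1_mgmt_server_ip(cluster_ip_seed, network):
    third_octet = 252 if network == "A" else 253
    return f"128.221.{third_octet}.{32 * cluster_ip_seed + 1}"

def vs1_fc_switch_ip(cluster_ip_seed, network):
    # Fibre Channel switch A is on management network A, switch B on network B.
    third_octet = 252 if network == "A" else 253
    return f"128.221.{third_octet}.{32 * cluster_ip_seed + 2}"

print(vs1_director_ip(1, engine=1, director="A", network="B"))  # 128.221.253.35
print(vs1_director_ip(2, engine=4, director="B", network="A"))  # 128.221.252.74
print(vs1_mgmt_server_ip(2, "B"))                               # 128.221.253.65
print(vs1_fc_switch_ip(1, "A"))                                 # 128.221.252.34
```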


VS1 Internal cabling

The figures in this section show the various cabling inside a VPLEX cabinet. All the cables shown in this section except inter-cluster WAN COM cables (VPLEX Metro and VPLEX Geo only) are installed before the unit ships from EMC. This section includes the following figures:

Cluster size Cable type Figure

Quad-engine Ethernet Figure 71 on page 126

Serial Figure 72 on page 127

Fibre Channel Figure 73 on page 128

AC power Figure 74 on page 129

Dual-engine Ethernet Figure 75 on page 130

Serial Figure 76 on page 131

Fibre Channel Figure 77 on page 132

AC power Figure 78 on page 133

Single-engine Ethernet Figure 79 on page 134

Serial Figure 80 on page 134

Fibre Channel Figure 81 on page 134

AC power Figure 82 on page 135

All cluster sizes, VPLEX Metro or VPLEX Geo only   Fibre Channel WAN COM   Figure 83 on page 135

IP WAN COM   Figure 84 on page 136


VS1 cabling - quad-engine cluster

Ethernet cabling


Figure 71 Ethernet cabling - VS1 quad-engine cluster


Serial cabling

Figure 72 Serial cabling - VS1 quad-engine cluster


Fibre Channel cabling

Figure 73 Fibre Channel cabling - VS1 quad-engine cluster


AC power cabling

Figure 74 AC power cabling - VS1 quad-engine cluster


VS1 cabling - dual-engine cluster

Ethernet cabling

Figure 75 Ethernet cabling - VS1 dual-engine cluster


Serial cabling

Figure 76 Serial cabling - VS1 dual-engine cluster


Fibre Channel cabling

All 16 Fibre Channel cables are 79 in. long. Eight of the cables are included for ease of upgrading to a quad-engine configuration, and are tied to the cabinet sidewalls.

Figure 77 Fibre Channel cabling - VS1 dual-engine cluster

Note: In the drawing, blue lines represent the Side-A connections and orange lines represent the Side-B connections. All 16 Fibre Channel cables are light blue. However, the Side-A cables have blue labels and the Side-B cables have orange labels.


AC power cabling

Figure 78 AC power cabling - VS1 dual-engine cluster


VS1 cabling - single-engine cluster

Figure 79 Ethernet cabling - VS1 single-engine cluster

Figure 80 Serial cabling - VS1 single-engine cluster

Engine 1 connects with two Fibre Channel cables, each 39 in. long.

Figure 81 Fibre Channel cabling - VS1 single-engine cluster

Note: In the drawing, blue lines represent the Side-A connections and orange lines represent the Side-B connections. Both Fibre Channel cables are light blue. However, the Side-A cable has blue labels and the Side-B cable has orange labels.


Figure 82 AC power cabling - VS1 single-engine cluster

WAN COM cabling - VPLEX Metro and Geo

Figure 83 and Figure 84 show the WAN COM connection options for VS1 hardware.

Note: In Figure 83, the same connections are made from each engine in each cluster; “ISL” refers to an inter-switch link.

Figure 83 Fibre Channel WAN COM connections - VS1


Note: In Figure 84, the same connections are made from each engine in each cluster, over two IP subnets (A and B).

Figure 84 IP WAN COM connections - VS1

Glossary

This glossary contains terms related to VPLEX federated storage systems. Many of these terms are used in this manual.

A

AccessAnywhere The breakthrough technology that enables VPLEX clusters to provide access to information between clusters that are separated by distance.

active/active A cluster with no primary or standby servers, because all servers can run applications and interchangeably act as backup for one another.

Active Directory A directory service included in most Windows Server operating systems. AD authenticates and authorizes users and computers in a network of Windows domain type.

active mirror A copy of data that is part of a local or remote mirroring service.

active/passive A powered component that is ready to operate upon the failure of a primary component.

array A collection of disk drives where user data and parity data may be stored. Devices can consist of some or all of the drives within an array.

asynchronous Describes objects or events that are not coordinated in time. A process operates independently of other processes, being initiated and left for another task before being acknowledged.

For example, a host writes data to the blades and then begins other work while the data is transferred to a local disk and across the WAN asynchronously. See also “synchronous.”

B

bandwidth The range of transmission frequencies a network can accommodate, expressed as the difference between the highest and lowest frequencies of a transmission cycle. High bandwidth allows fast or high-volume transmissions.

backend port VPLEX director port connected to storage arrays (acts as an initiator).


bias When a cluster has the bias for a given DR1, it continues to service I/O to volumes on that cluster if connectivity to the remote cluster is lost (due to cluster partition or cluster failure). The bias for a specific volume is determined by the detach rules for the volume, the detach rules for the consistency group (if the volume is a member of a consistency group) and VPLEX Witness (if VPLEX Witness is deployed).

bit A unit of information that has a binary digit value of either 0 or 1.

block The smallest amount of data that can be transferred following SCSI standards, which is traditionally 512 bytes. Virtual volumes are presented to users as contiguous lists of blocks.

block size The actual size of a block on a device.

byte Memory space used to store eight bits of data.

C

cache Temporary storage for recent writes and recently accessed data. Disk data is read through the cache so that subsequent read references are found in the cache.

cache coherency Managing the cache so data is not lost, corrupted, or overwritten. With multiple processors, data blocks may have several copies, one in the main memory and one in each of the cache memories. Cache coherency propagates the blocks of multiple users throughout the system in a timely fashion, ensuring the data blocks do not have inconsistent versions in the different processors' caches.

cluster Two or more VPLEX directors forming a single fault-tolerant cluster, deployed as one to four engines.

cluster ID The identifier for each cluster in a multi-cluster deployment. The ID is assigned during installation.

cluster deployment ID A numerical cluster identifier, unique within a VPLEX cluster. By default, VPLEX clusters have a cluster deployment ID of 1. For multi-cluster deployments, all but one cluster must be reconfigured to have different cluster deployment IDs.

cluster IP seed The VPLEX IP seed is used to generate the IP addresses used by the internal components of the VPLEX. For more information about components and their IP addresses, refer to EMC VPLEX Installation and Setup Guide. Cluster ID is used by the virtualization software (inter director messaging, cluster identification).

clustering Using two or more computers to function together as a single entity. Benefits include fault tolerance and load balancing, which increases reliability and up time.

COM The intra-cluster communication (Fibre Channel). The communication used for cache coherency and replication traffic.

command line interface (CLI) A method of operating system or application software by typing commands to perform specific tasks.

consistency group A VPLEX structure that groups together virtual volumes and applies the same detach and failover rules to all member volumes. Consistency groups ensure the common application of a set of properties to the entire group. Create consistency groups for sets of volumes that require the same I/O behavior in the event of a link failure. There are two types of consistency groups:


◆ Synchronous Consistency Groups - Use write-through (synchronous) cache mode to write data to the underlying storage before an acknowledgement is sent to the host. Performance is dependent on the latency between clusters and the application's tolerance of that latency.

◆ Asynchronous Consistency Groups - Use write-back (asynchronous) cache mode to protect data by mirroring it to the memory of another director in the cluster. Data is destaged asynchronously to the back-end storage arrays. Writes are acknowledged once the data has been committed to disk in write order.

continuity of operations (COOP) The goal of establishing policies and procedures to be used during an emergency, including the ability to process, store, and transmit data before and after.

controller A device that controls the transfer of data to and from a computer and a peripheral device.

D

data sharing The ability to share access to the same data with multiple servers regardless of time and location.

detach rule Predefined rules that determine which cluster continues I/O when connectivity between clusters is lost. A cluster loses connectivity to its peer cluster due to cluster partition or cluster failure.

Detach rules are applied at two levels: to individual volumes, and to consistency groups. If a volume is a member of a consistency group, the group detach rule overrides the rule set for the individual volumes. Note that all detach rules may be overridden by VPLEX Witness, if VPLEX Witness is deployed.

device A combination of one or more extents to which you add specific RAID properties. Local devices use storage from only one cluster. In VPLEX Metro and Geo configurations, distributed devices use storage from both clusters.

director A CPU module that runs GeoSynchrony, the core VPLEX software. There are two directors (A and B) in each engine, and each has dedicated resources and is capable of functioning independently.

dirty data The write-specific data stored in the cache memory that has yet to be written to disk.

disaster recovery (DR) The ability to restart system operations after an error, preventing data loss.

discovered array An array that is connected to the SAN and discovered by VPLEX.

disk cache A section of RAM that provides a cache between the disk and the CPU. RAM's access time is significantly faster than disk access time; therefore, a disk-caching program enables the computer to operate faster by placing recently accessed data in the disk cache.

distributed device A RAID 1 device whose mirrors are in different VPLEX clusters.

distributed file system (DFS) Supports the sharing of files and resources in the form of persistent storage over a network.

distributed RAID 1 device (DR1) A distributed device with physical volumes at both clusters in a VPLEX Metro or VPLEX Geo configuration, for simultaneous active/active read/write access using AccessAnywhere™.


E

engine Consists of two directors, management modules, and redundant power. The unit of scale for VPLEX configurations: single = 1 engine, dual = 2 engines, quad = 4 engines per cluster.

Ethernet A Local Area Network (LAN) protocol. Ethernet uses a bus topology, meaning all devices are connected to a central cable, and supports data transfer rates between 10 megabits per second and 10 gigabits per second. For example, 100 Base-T supports data transfer rates of 100 Mb/s.

event A log message that results from a significant action initiated by a user or the system.

extent All or a portion (range of blocks) of a storage volume.

F

failover Automatically switching to a redundant or standby device, system, or data path upon the failure or abnormal termination of the currently active device, system, or data path.

fault domain A set of components that share a single point of failure. For VPLEX, the concept that every component of a Highly Available system is separated, so that if a fault occurs in one domain, it will not result in failure in other domains to which it is connected.

fault tolerance Ability of a system to keep working in the event of hardware or software failure, usually achieved by duplicating key system components.

Fibre Channel (FC) A protocol for transmitting data between computer devices. Longer distances require the use of optical fiber; however, FC also works using coaxial cable and ordinary telephone twisted pair media. Fibre Channel offers point-to-point, switched, and loop interfaces. Used within a SAN to carry SCSI traffic.

Fibre Channel over IP (FCIP) Combines Fibre Channel and Internet protocol features to connect SANs in geographically distributed systems.

field replaceable unit (FRU) A unit or component of a system that can be replaced on site as opposed to returning the system to the manufacturer for repair.

firmware Software that is loaded on and runs from the flash ROM on the VPLEX directors.

front end port VPLEX director port connected to host initiators (acts as a target).

G

geographically distributed system A system physically distributed across two or more geographically separated sites. The degree of distribution can vary widely, from different locations on a campus or in a city to different continents.

gigabit (Gb or Gbit) 1,073,741,824 (2^30) bits. Often rounded to 10^9.

gigabit Ethernet The version of Ethernet that supports data transfer rates of 1 Gigabit per second.

gigabyte (GB) 1,073,741,824 (2^30) bytes. Often rounded to 10^9.


global file system (GFS) A shared-storage cluster or distributed file system.

H

hold provisioning An attribute of a registered array that allows you to set the array as unavailable for further provisioning of new storage.

host bus adapter (HBA) An I/O adapter that manages the transfer of information between the host computer's bus and memory system. The adapter performs many low-level interface functions automatically or with minimal processor involvement to minimize the impact on the host processor's performance.

I

input/output (I/O) Any operation, program, or device that transfers data to or from a computer.

internet Fibre Channel protocol (iFCP) Connects Fibre Channel storage devices to SANs or the Internet in geographically distributed systems using TCP.

intranet A network operating like the World Wide Web but with access restricted to a limited group of authorized users.

internet small computer system interface (iSCSI) A protocol that allows commands to travel through IP networks, which carries data from storage units to servers anywhere in a computer network.

I/O (input/output) The transfer of data to or from a computer.

K

kilobit (Kb) 1,024 (2^10) bits. Often rounded to 10^3.

kilobyte (K or KB) 1,024 (2^10) bytes. Often rounded to 10^3.

L

latency The amount of time required to fulfill an I/O request.

LDAP Lightweight Directory Access Protocol, an application protocol that accesses and maintains distributed directory information services over an IP network.

load balancing Distributing the processing and communications activity evenly across a system or network so no single device is overwhelmed. Load balancing is especially important when the number of I/O requests issued is unpredictable.

local area network (LAN) A group of computers and associated devices that share a common communications line and typically share the resources of a single processor or server within a small geographic area.

local device A combination of one or more extents to which you add specific RAID properties. Local devices use storage from only one cluster.

logical unit number (LUN) Virtual storage to which a given server with a physical connection to the underlying storage device may be granted or denied access. LUNs are used to identify SCSI devices, such as external hard drives, connected to a computer. Each device is assigned a LUN number, which serves as the device's unique address.


M

megabit (Mb) 1,048,576 (2^20) bits. Often rounded to 10^6.

megabyte (MB) 1,048,576 (2^20) bytes. Often rounded to 10^6.

metadata Information about data, such as data quality, content, and condition.

metavolume A storage volume used by the system that contains the metadata for all the virtual volumes managed by the system. There is one metadata storage volume per cluster.

Metro-Plex Two VPLEX Metro clusters connected within metro (synchronous) distances, approximately 60 miles or 100 kilometers.

mirroring The writing of data to two or more disks simultaneously. If one of the disk drives fails, the system can instantly switch to one of the other disks without losing data or service. RAID 1 provides mirroring.

mirroring services Mirroring features provided through a storage service profile.

miss An operation where the cache is searched but does not contain the data, so the data instead must be accessed from disk.

move data across profiles This is a new menu option under "Mobility Central".

N

namespace A set of names recognized by a file system in which all names are unique.

network System of computers, terminals, and databases connected by communication lines.

network architecture Design of a network, including hardware, software, method of connection, and the protocol used.

network-attached storage (NAS) Storage elements connected directly to a network.

network partition When one site loses contact or communication with another site.

O

Open LDAP Open source implementation of the Lightweight Directory Access Protocol (LDAP).

P

parity checking Checking for errors in binary data. Depending on whether the byte has an even or odd number of bits, an extra 0 or 1 bit, called a parity bit, is added to each byte in a transmission. The sender and receiver agree on odd parity, even parity, or no parity. If they agree on even parity, a parity bit is added that makes each byte even. If they agree on odd parity, a parity bit is added that makes each byte odd. If the data is transmitted incorrectly, the change in parity will reveal the error.
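As a small worked illustration of the even-parity scheme described above (an illustrative snippet only, not part of any VPLEX interface):

def even_parity_bit(byte_value):
    """Return the parity bit that makes the total number of 1 bits even."""
    ones = bin(byte_value & 0xFF).count("1")
    return ones % 2   # 1 only when the data byte already holds an odd number of 1 bits

# 0b11010010 contains four 1 bits (even), so the parity bit is 0. Flipping any
# single bit in transit makes the count odd, which reveals the error.
assert even_parity_bit(0b11010010) == 0
assert even_parity_bit(0b11010011) == 1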

partition A subdivision of a physical or virtual disk, which is a logical entity only visible to the end user, not any of the devices.


R

RAID The use of two or more storage volumes to provide better performance, error recovery, and fault tolerance.

RAID 0 A performance-oriented striped or dispersed data mapping technique. Uniformly sized blocks of storage are assigned in regular sequence to all of the array's disks. Provides high I/O performance at low inherent cost. No additional disks are required. The advantages of RAID 0 are a very simple design and an ease of implementation.

RAID 1 Also called mirroring, this has been used longer than any other form of RAID. It remains popular because of simplicity and a high level of data availability. A mirrored array consists of two or more disks. Each disk in a mirrored array holds an identical image of the user data. RAID 1 has no striping. Read performance is improved since either disk can be read at the same time. Write performance is lower than single disk storage. Writes must be performed on all disks, or mirrors, in the RAID 1. RAID 1 provides very good data reliability for read-intensive applications.

RAID leg A copy of data, called a mirror, that is located at a user's current location.

rebuild The process of reconstructing data onto a spare or replacement drive after a drive failure. Data is reconstructed from the data on the surviving disks, assuming mirroring has been employed.
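A toy sketch of the rebuild idea for a mirrored (RAID 1) device: copy every block from a surviving leg onto the replacement. This is purely conceptual and does not reflect how GeoSynchrony performs rebuilds.

def rebuild_mirror_leg(surviving_leg, replacement_leg):
    """Reconstruct a failed RAID 1 leg by copying each block from a surviving leg.
    Both legs are modeled here as simple lists of blocks."""
    for block_number, data in enumerate(surviving_leg):
        replacement_leg[block_number] = data
    return replacement_leg

# After a drive failure, the replacement starts empty and ends as an
# identical image of the surviving mirror leg.
surviving = ["blk0", "blk1", "blk2"]
replacement = [None] * len(surviving)
assert rebuild_mirror_leg(surviving, replacement) == surviving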

RecoverPoint Appliance (RPA) Hardware that manages all aspects of data protection for a storage group, including capturing changes, maintaining the images in the journal volumes, and performing image recovery.

RecoverPoint cluster All connected RecoverPoint Appliances on both sides of the replication.

RecoverPoint site All RecoverPoint entities on one side of the replication.

Recovery Point Objective See “RPO” on page 144.

Recovery Time Objective See “RTO” on page 144.

redundancy The duplication of hardware and software components. In a redundant system, if a component fails then a redundant component takes over, allowing operations to continue without interruption.

registered array An array that is registered with VPLEX. Registration is required to make the array available for services-based provisioning. Registration includes connecting to and creating awareness of the array’s intelligent features. Only VMAX and VNX arrays can be registered.

reliability The ability of a system to recover lost data.

remote direct memory access (RDMA) Allows computers within a network to exchange data using their main memories and without using the processor, cache, or operating system of either computer.

Replication set When RecoverPoint is deployed, a production source volume and one or more replica volume(s) to which it replicates.


restore source This operation restores the source consistency group from data on the copy target.

RPO Recovery Point Objective. The time interval between the point of failure of a storage system and the expected point in the past, to which the storage system is capable of recovering customer data. Informally, RPO is a maximum amount of data loss that can be tolerated by the application after a failure. The value of the RPO is highly dependent upon the recovery technique used. For example, RPO for backups is typically days; for asynchronous replication minutes; and for mirroring or synchronous replication seconds or instantaneous.

RTO Recovery Time Objective. Not to be confused with RPO, RTO is the time duration within which a storage solution is expected to recover from failure and begin servicing application requests. Informally, RTO is the longest tolerable application outage due to a failure of a storage system. RTO is a function of the storage technology. It may be measured in hours for backup systems, minutes for remote replication, and seconds (or less) for mirroring.

S

scalability Ability to easily change a system in size or configuration to suit changing conditions, to grow with your needs.

services-based provisioning

simple network management protocol (SNMP) Monitors systems and devices in a network.

site ID The identifier for each cluster in a multi-cluster plex. By default, in a non-geographically distributed system the ID is 0. In a geographically distributed system, one cluster's ID is 1, the next is 2, and so on, each number identifying a physically separate cluster. These identifiers are assigned during installation.

SLES SUSE Linux Enterprise Server is a Linux distribution supplied by SUSE and targeted at the business market.

small computer system interface (SCSI) A set of evolving ANSI standard electronic interfaces that allow personal computers to communicate faster and more flexibly than previous interfaces with peripheral hardware such as disk drives, tape drives, CD-ROM drives, printers, and scanners.

splitter EMC RecoverPoint write-splitting technology built into GeoSynchrony starting in 5.1.

storage area network (SAN) A high-speed special purpose network or subnetwork that interconnects different kinds of data storage devices with associated data servers on behalf of a larger network of users.

storage view A combination of registered initiators (hosts), front-end ports, and virtual volumes, used to control a host's access to storage.

storage volume A Logical Unit Number (LUN) or unit of storage presented by the back end array.

stripe depth The number of blocks of data stored contiguously on each storage volume in a RAID 0 device.


striping A technique for spreading data over multiple disk drives. Disk striping can speed up operations that retrieve data from disk storage. Data is divided into units and distributed across the available disks. RAID 0 provides disk striping.
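To make the stripe depth and striping definitions concrete, the sketch below maps a logical block number to a disk and an offset for a simple RAID 0 layout. The layout shown is a common textbook convention, not a statement about how GeoSynchrony places extents.

def raid0_locate(block, stripe_depth, num_disks):
    """Map a logical block number to (disk index, block offset on that disk)
    for a RAID 0 layout with the given stripe depth, in blocks."""
    stripe = block // stripe_depth                 # which stripe the block falls in
    disk = stripe % num_disks                      # stripes rotate across the disks
    offset = (stripe // num_disks) * stripe_depth + block % stripe_depth
    return disk, offset

# With a stripe depth of 4 blocks across 3 disks, logical blocks 0-3 land on
# disk 0, blocks 4-7 on disk 1, blocks 8-11 on disk 2, and 12-15 wrap to disk 0.
assert raid0_locate(5, 4, 3) == (1, 1)
assert raid0_locate(13, 4, 3) == (0, 5)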

synchronous Describes objects or events that are coordinated in time. A process is initiated and must be completed before another task is allowed to begin.

For example, in banking two withdrawals from a checking account that are started at the same time must not overlap; therefore, they are processed synchronously. See also “asynchronous.”

T

throughput 1. The number of bits, characters, or blocks passing through a data communication system or portion of that system.

2. The maximum capacity of a communications channel or system.

3. A measure of the amount of work performed by a system over a period of time. For example, the number of I/Os per day.

tool command language (TCL) A scripting language often used for rapid prototypes and scripted applications.

transfer size The size of the region in cache used to service data migration. The area is globally locked, read at the source, and written at the target. Transfer-size can be as small as 40 K, as large as 128 M, and must be a multiple of 4 K. The default value is 128 K.

A larger transfer-size results in higher performance for the migration, but may negatively impact front-end I/O. This is especially true for VPLEX Metro migrations. Set a large transfer-size for migrations when the priority is data protection or migration performance.

A smaller transfer-size results in lower performance for the migration, but creates less impact on front-end I/O and response times for hosts. Set a smaller transfer-size for migrations when the priority is front-end storage response time.
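Given the constraints stated above (at least 40 K, at most 128 M, a multiple of 4 K, default 128 K), a quick validity check could look like the following sketch. The function name is illustrative only, and K and M are assumed here to mean kibibytes and mebibytes.

KB = 1024
MB = 1024 * KB

def is_valid_transfer_size(size_bytes):
    """Check a migration transfer-size against the documented constraints."""
    return 40 * KB <= size_bytes <= 128 * MB and size_bytes % (4 * KB) == 0

assert is_valid_transfer_size(128 * KB)        # the default value
assert not is_valid_transfer_size(30 * KB)     # below the 40 K minimum
assert not is_valid_transfer_size(42 * KB)     # not a multiple of 4 K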

transmission control protocol/Internet protocol (TCP/IP) The basic communication language or protocol used for traffic on a private network and the Internet.

U

uninterruptible power supply (UPS) A power supply that includes a battery to maintain power in the event of a power failure.

universal unique identifier (UUID) A 64-bit number used to uniquely identify each VPLEX director. This number is based on the hardware serial number assigned to each director.

V

virtualization A layer of abstraction implemented in software that servers use to divide available physical storage into storage volumes or virtual volumes.

virtual volume Unit of storage presented by the VPLEX front end ports to hosts. A virtual volume looks like a contiguous volume, but can be distributed over two or more storage volumes.


W

wide area network (WAN) A geographically dispersed telecommunications network. This term distinguishes a broader telecommunication structure from a local area network (LAN).

world wide name (WWN) A specific Fibre Channel Name Identifier that is unique worldwide and represented by a 64-bit unsigned binary value.

write-back mode A caching technique where the completion of a write request is communicated as soon as the data is in cache, with writes to disk occurring at different times. Write-back is faster than write-through, but risks a system failure before the data is safely written to disk.

write-through mode A caching technique in which the completion of a write request is communicated only after data is written to disk. This is almost equivalent to non-cached systems, but with data protection.
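The practical difference between the two cache modes above is when the host sees the acknowledgement. The sketch below contrasts them with a toy backing store; it is a conceptual illustration only, not VPLEX internals.

class Disk:
    """Toy backing store used only for this illustration."""
    def __init__(self):
        self.blocks = {}
    def write(self, block, data):
        self.blocks[block] = data

def write_through(cache, disk, block, data):
    """Write-through: acknowledge only after the data is safely on disk."""
    cache[block] = data
    disk.write(block, data)          # completes before the host sees success
    return "ack"

def write_back(cache, dirty, block, data):
    """Write-back: acknowledge once the data is in cache; the block is now
    'dirty data' and is destaged to disk later, asynchronously."""
    cache[block] = data
    dirty.add(block)
    return "ack"

def destage(cache, dirty, disk):
    """Background step that flushes dirty blocks to the backing store."""
    for block in sorted(dirty):
        disk.write(block, cache[block])
    dirty.clear()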


Index

A
addresses of hardware components 122

C
cabling, internal VPLEX
VS1 dual-engine configuration 130
VS1 quad-engine configuration 126
VS1 single-engine configuration 134
consistency groups
asynchronous 63
synchronous 61
global visibility 62
local visibility 61

I
IP addresses 122

R
RecoverPoint
configurations 107
terminology and concepts 105
VPLEX Local and RecoverPoint CDP 108
VPLEX Local and RecoverPoint CRR/CLR 109
VPLEX Metro and RecoverPoint CDP 110
VPLEX Metro and RecoverPoint CRR/CLR 111

V
VPLEX Witness deployment 25
