SteelFusion™ Design Guide

Version 5.0

April 2017

© 2017 Riverbed Technology, Inc. All rights reserved.

Riverbed and any Riverbed product or service name or logo used herein are trademarks of Riverbed. All other trademarks used herein belong to their respective owners. The trademarks and logos displayed herein cannot be used without the prior written consent of Riverbed or their respective owners.

Akamai® and the Akamai wave logo are registered trademarks of Akamai Technologies, Inc. SureRoute is a service mark of Akamai. Apple and Mac are registered trademarks of Apple, Incorporated in the United States and in other countries. Cisco is a registered trademark of Cisco Systems, Inc. and its affiliates in the United States and in other countries. EMC, Symmetrix, and SRDF are registered trademarks of EMC Corporation and its affiliates in the United States and in other countries. IBM, iSeries, and AS/400 are registered trademarks of IBM Corporation and its affiliates in the United States and in other countries. Juniper Networks and Junos are registered trademarks of Juniper Networks, Incorporated in the United States and other countries. Linux is a trademark of Linus Torvalds in the United States and in other countries. Microsoft, Windows, Vista, Outlook, and Internet Explorer are trademarks or registered trademarks of Microsoft Corporation in the United States and in other countries. Oracle and JInitiator are trademarks or registered trademarks of Oracle Corporation in the United States and in other countries. UNIX is a registered trademark in the United States and in other countries, exclusively licensed through X/Open Company, Ltd. VMware, ESX, ESXi are trademarks or registered trademarks of VMware, Inc. in the United States and in other countries.

This product includes Windows Azure Linux Agent developed by the Microsoft Corporation (http://www.microsoft.com/). Copyright 2016 Microsoft Corporation.

This product includes software developed by the University of California, Berkeley (and its contributors), EMC, and Comtech AHA Corporation. This product is derived from the RSA Data Security, Inc. MD5 Message-Digest Algorithm.

The SteelHead Mobile Controller (virtual edition) includes VMware Tools. Portions Copyright © 1998-2016 VMware, Inc. All Rights Reserved.

This product includes the NetApp Manageability Software Development Kit (NM SDK), including any third-party software available for review with such SDK, which can be found at http://communities.netapp.com/docs/DOC-1152 and is included in a NOTICES file within the downloaded files.

For a list of open source software (including libraries) used in the development of this software along with associated copyright and license agreements, see the Riverbed Support site at https://support.riverbed.com.

This documentation is furnished “AS IS” and is subject to change without notice and should not be construed as a commitment by Riverbed. This documentation may not be copied, modified or distributed without the express authorization of Riverbed and may be used only in connection with Riverbed products and services. Use, duplication, reproduction, release, modification, disclosure or transfer of this documentation is restricted in accordance with the Federal Acquisition Regulations as applied to civilian agencies and the Defense Federal Acquisition Regulation Supplement as applied to military agencies. This documentation qualifies as “commercial computer software documentation” and any use by the government shall be governed solely by these terms. All other use is prohibited. Riverbed assumes no responsibility or liability for any errors or inaccuracies that may appear in this documentation.

Riverbed Technology
680 Folsom Street
San Francisco, CA 94107
www.riverbed.com

Part Number 712-00079-11

Contents

Welcome ...... 11
    About this guide ...... 11
    Audience ...... 11
    Document conventions ...... 12
    Documentation and release notes ...... 12
    Contacting Riverbed ...... 12
    What is new ...... 13

1 - Overview of Core and Edge as a System ...... 15
    Introducing branch converged infrastructure ...... 15
    How the SteelFusion product family works ...... 16
    System components and their roles ...... 18
    Blockstore prefetch and prepopulation ...... 20
    Related information ...... 21

2 - Deploying Core and Edge as a System ...... 23
    The SteelFusion family deployment process ...... 23
    Provisioning LUNs on the storage array ...... 24
    Installing the SteelFusion appliances ...... 24
    LUN pinning and prepopulation in the Core ...... 25
    Configuring snapshot and data protection functionality ...... 26
    Managing vSphere datastores on LUNs presented by Core ...... 26
    Single-appliance versus high-availability deployments ...... 26
    Single-appliance deployment ...... 27
    High-availability deployment ...... 27
    Connecting Core with Edge ...... 28
    Prerequisites ...... 28
    Connecting the SteelFusion product family components ...... 28
    Adding Edges to the Core configuration ...... 30
    Configuring Edge ...... 30
    Mapping LUNs to Edges ...... 30
    Riverbed Turbo Boot ...... 33
    Related information ...... 33

3 - Deploying the Core ...... 35
    Core Dashboard overview ...... 35
    Core deployment process overview ...... 35
    Interface and port configuration ...... 36
    Core ports ...... 36
    Configuring interface routing ...... 38
    Configuring Core for jumbo frames ...... 41
    Configuring the iSCSI initiator ...... 42
    Configuring LUNs ...... 42
    Exposing LUNs ...... 43
    Resizing LUNs ...... 43
    Configuring Fibre Channel LUNs ...... 43
    Removing a LUN from a Core configuration ...... 44
    Configuring redundant connectivity with MPIO ...... 44
    MPIO in Core ...... 45
    Configuring Core MPIO interfaces ...... 45
    Core pool management ...... 45
    Overview of Core pool management ...... 46
    Pool management architecture ...... 46
    Configuring pool management ...... 47
    Changing pool management structure ...... 50
    High availability in pool management ...... 51
    Cloud storage gateway support ...... 51
    Related information ...... 52

4 - SteelFusion and Fibre Channel ...... 55
    Overview of Fibre Channel ...... 55
    Fibre Channel LUN considerations ...... 57
    How VMware ESXi virtualizes Fibre Channel LUNs ...... 57
    How Core-v connects to RDM Fibre Channel LUNs ...... 58
    Requirements for Core-v and Fibre Channel SANs ...... 59
    Specifics about Fibre Channel LUNs versus iSCSI LUNs ...... 60
    Deploying Fibre Channel LUNs on Core-v appliances ...... 60
    Deployment prerequisites ...... 60
    Configuring Fibre Channel LUNs ...... 61
    Configuring Fibre Channel LUNs in a Core-v HA scenario ...... 64
    When ESXi servers hosting the Core-v appliances are managed by vCenter ...... 64
    When ESXi servers hosting the Core-v appliances are not managed by vCenter ...... 66
    Populating Fibre Channel LUNs ...... 66
    Best practices for deploying Fibre Channel on Core-v ...... 68
    Best practices ...... 68
    Recommendations ...... 69
    Troubleshooting ...... 69
    Related information ...... 70

5 - Configuring the Edge in iSCSI Mode ...... 73
    SteelFusion Edge appliance architecture ...... 73
    Interface and port configurations ...... 74
    Edge appliances ports ...... 75
    Moving the Edge to a new location ...... 76
    Configuring Edge for jumbo frames ...... 76
    Configuring iSCSI initiator timeouts ...... 77
    Configuring disk management on the Edge appliance ...... 77
    Configuring SteelFusion storage ...... 78
    MPIO in Edge ...... 79
    Related information ...... 79

6 - SteelFusion Appliance High-Availability Deployment ...... 81
    Overview of storage availability ...... 81
    Core high availability ...... 82
    Core with MPIO ...... 83
    Core HA concepts ...... 83
    Configuring HA for Core ...... 84
    Edge high availability ...... 94
    Recovering from split-brain scenarios involving Edge HA ...... 102
    Testing HA failover deployments ...... 103
    Configuring WAN redundancy ...... 103
    Configuring WAN redundancy with no Core HA ...... 103
    Configuring WAN redundancy in an HA environment ...... 105
    Related information ...... 105

7 - SteelFusion Replication (FusionSync) ...... 107
    Overview of SteelFusion replication ...... 107
    Architecture of SteelFusion replication ...... 107
    SteelFusion replication components ...... 108
    SteelFusion replication design overview ...... 108
    Failover scenarios ...... 111
    Secondary site is down ...... 111
    Replication is suspended at the secondary site ...... 112
    Primary site is down (suspended) ...... 113
    FusionSync high-availability considerations ...... 113
    FusionSync and Core HA ...... 114
    Replication HA failover scenarios ...... 114
    SteelFusion replication metrics ...... 116
    Journal LUN metrics ...... 117
    Related information ...... 118

8 - Data Protection and Snapshots ...... 119
    Introduction ...... 120
    Data protection ...... 120
    Setting up application-consistent snapshots ...... 121
    Configuring snapshots for LUNs ...... 122
    Volume Snapshot Service support ...... 123
    Implementing Riverbed Host Tools for snapshot support ...... 123
    Overview of RHSP and VSS ...... 123
    Riverbed Host Tools installation and configuration ...... 124
    Configuring the proxy host for backup ...... 124
    Configuring the storage array for proxy backup ...... 125
    Server-level backups ...... 126
    Migrating from LUN protection to server-level backup ...... 129
    Protection groups and policy customization ...... 129
    Data recovery ...... 131
    Branch recovery ...... 132
    Overview of branch recovery ...... 132
    How branch recovery works ...... 133
    Branch recovery configuration ...... 133
    Branch recovery CLI configuration example ...... 134
    Related information ...... 136

9 - Data Resilience and Security ...... 137
    Recovering a single Core ...... 137
    Recovering a single physical Core ...... 138
    Recovering a single Core-v ...... 138
    Edge replacement ...... 139
    Disaster recovery scenarios ...... 140
    SteelFusion appliance failure—failover ...... 140
    SteelFusion appliance failure—failback ...... 142
    Best practice for LUN snapshot rollback ...... 144
    Using CHAP to secure iSCSI connectivity ...... 144
    One-way CHAP ...... 145
    Mutual CHAP ...... 145
    At-rest and in-flight data security ...... 146
    Enable data at-rest blockstore encryption ...... 147
    Enable data in-flight secure peering encryption ...... 149
    Clearing the blockstore contents ...... 149
    Edge network communication ...... 150
    Additional security best practices ...... 150
    Related information ...... 151

10 - SteelFusion Appliance Upgrade ...... 153
    Planning software upgrades ...... 153
    Upgrade sequence ...... 154
    Minimize risk during upgrading ...... 154
    Performing the upgrade ...... 155
    Edge upgrade ...... 155
    Core upgrade ...... 156
    Related information ...... 156

11 - Network Quality of Service ...... 157
    Rdisk protocol overview ...... 157
    QoS for SteelFusion replication traffic ...... 159
    QoS for LUNs ...... 159
    QoS for unpinned LUNs ...... 159
    QoS for pinned LUNs ...... 159
    QoS for branch offices ...... 159
    QoS for branch offices that mainly read data from the data center ...... 159
    QoS for branch offices booting virtual machines from the data center ...... 160
    Time-based QoS rules example ...... 160
    Related information ...... 160

12 - Deployment Best Practices ...... 161
    Edge best practices ...... 161
    Segregate traffic ...... 162
    Pin the LUN and prepopulate the blockstore ...... 162
    Segregate data onto multiple LUNs ...... 162
    Ports and type of traffic ...... 163
    iSCSI port bindings ...... 163
    Changing IP addresses on the Edge, ESXi host, and servers ...... 163
    Disk management ...... 164
    Rdisk traffic routing options ...... 165
    Deploying SteelFusion with third-party traffic optimization ...... 165
    Windows and ESX server storage layout—SteelFusion-protected LUNs vs. local LUNs ...... 166
    VMFS datastores deployment on SteelFusion LUNs ...... 168
    Enable Windows persistent bindings for mounted iSCSI LUNs ...... 169
    Set up memory reservation for VMs running on VMware ESXi in the VSP ...... 170
    Boot from an unpinned iSCSI LUN ...... 171
    Running antivirus software ...... 171
    Running disk defragmentation software ...... 171
    Running backup software ...... 171
    Configure jumbo frames ...... 172
    Removing Core from Edge and re-adding Core ...... 172
    Core best practices ...... 172
    Deploy on gigabit Ethernet networks ...... 173
    Use CHAP ...... 173
    Configure initiators and storage groups or LUN masking ...... 173
    Core hostname and IP address ...... 173
    Segregate storage traffic from management traffic ...... 174
    When to pin and prepopulate the LUN ...... 174
    Core configuration export ...... 175
    Core in HA configuration replacement ...... 175
    LUN-based data protection limits ...... 175
    WAN usage consumption for a Core to Edge VMDK data migration ...... 175
    Reserve memory and CPU resources when deploying Core-v ...... 176
    iSCSI initiators timeouts ...... 176
    Microsoft iSCSI initiator timeouts ...... 176
    ESX iSCSI initiator timeouts ...... 176
    Patching ...... 176
    Patching at the branch office for virtual servers installed on iSCSI LUNs ...... 177
    Patching at the data center for virtual servers installed on iSCSI LUNs ...... 177
    Related information ...... 177

13 - SteelFusion and NFS ...... 179
    Introduction to SteelFusion with NFS ...... 179
    Existing SteelFusion features available with NFS/file deployments ...... 181
    Unsupported SteelFusion features with NFS/file deployments ...... 181
    Basic design scenarios ...... 182
    Basic configuration deployment principles ...... 182
    SteelFusion Core with NFS/file deployment process overview ...... 183
    SteelFusion Core interface and port configuration ...... 183
    SteelFusion Edge with NFS - appliance architecture ...... 185
    Virtual Services Platform installation with NFS/file mode ...... 185
    SteelFusion Edge interface and port configuration ...... 186
    Overview of high availability with NFS ...... 190
    SteelFusion Core high availability with NFS ...... 191
    SteelFusion Edge high availability with NFS ...... 192
    Snapshots and backup ...... 194
    Best practices ...... 196
    Core with NFS - best practices ...... 197
    Edge with NFS - best practices ...... 199

14 - SteelFusion Appliance Sizing ...... 201
    General sizing considerations ...... 201
    Core sizing guidelines ...... 201
    Edge sizing guidelines ...... 203

A - Edge Network Reference Architecture ...... 205
    Edge network card interfaces ...... 205


Welcome

About this guide

Welcome to the SteelFusion Design Guide. This guide provides an overview of, and best practices for, the SteelFusion Core and Edge appliances, and it discusses how to configure them as a system.

Audience

This guide is written for storage and network administrators familiar with administering and managing storage arrays, snapshots, backups, virtual machines (VMs), Fibre Channel, and iSCSI. This guide includes information relevant to the following products and product features:

 Riverbed SteelFusion Core (Core)

 Riverbed SteelFusion Core Virtual Edition (Core-v)

 Riverbed SteelFusion Edge (Edge)

 Riverbed Optimization System (RiOS)

 Riverbed SteelHead (SteelHead)

 Riverbed SteelCentral Controller for SteelHead (SCC or Controller)

 Riverbed Virtual Services Platform (VSP)

 Virtualization technology

Note: All SteelHead EX information has been removed. For information on deploying SteelHead EX with SteelFusion, see the SteelFusion Deployment Guide, April 2015 or earlier.

This guide is intended to be used together with the documentation and technical notes available at https://support.riverbed.com. For example:

 the SteelFusion Core Management Console User’s Guide

 the SteelFusion Edge Management Console User’s Guide

 the SteelFusion Edge Hardware and Maintenance Guide

 the Riverbed Command-Line Interface Reference Manual

 the SteelFusion Command-Line Interface Reference Manual

 the SteelFusion Edge Installation and Configuration Guide


 the SteelFusion Core Installation and Configuration Guide

Document conventions

This guide uses the following standard set of typographical conventions.

Convention Meaning

italics - Within text, new terms and emphasized words appear in italic typeface.

boldface - Within text, CLI commands, CLI parameters, and REST API properties appear in bold typeface.

Courier - Code examples appear in Courier font:
    amnesiac > enable
    amnesiac # configure terminal

< > - Values that you specify appear in angle brackets: interface <ip-address>

[ ] - Optional keywords or variables appear in brackets: ntp peer <address> [version <number>]

{ } - Elements that are part of a required choice appear in braces: {<interface-name> | ascii | hex}

| - The pipe symbol separates alternative, mutually exclusive elements of a choice. The pipe symbol is used in conjunction with braces or brackets; the braces or brackets group the choices and identify them as required or optional: {delete <filename> | upload <filename>}

Documentation and release notes

The most current version of all Riverbed documentation can be found on the Riverbed Support site at https://support.riverbed.com.

See the Riverbed Knowledge Base for any known issues, how-to documents, system requirements, and common error messages. You can browse titles or search for keywords and strings. To access the Riverbed Knowledge Base, log in to the Riverbed Support site at https://support.riverbed.com.

Each software release includes release notes. The release notes list new features, known issues, and fixed problems. To obtain the most current version of the release notes, go to the Software and Documentation section of the Riverbed Support site at http://www.riverbed.com/services/index.html. Examine the release notes before you begin the installation and configuration process.

Contacting Riverbed

This section describes how to contact departments within Riverbed.

 Technical support - Problems installing, using, or replacing Riverbed products? Contact Riverbed Support or your channel partner who provides support. To contact Riverbed Support, open a trouble ticket by calling 1-888-RVBD-TAC (1-888-782-3822) in the United States and Canada or +1 415-247-7381 outside the United States. You can also go to https://support.riverbed.com.

 Professional services - Need help with planning a migration or implementing a custom design solution? Contact Riverbed Professional Services. Email [email protected] or go to http://www.riverbed.com/services/index.html.

12 SteelFusion Design Guide What is new Welcome

 Documentation - Have suggestions about Riverbed’s online documentation or printed materials? Send comments to [email protected].

What is new

Since the SteelFusion Design Guide, December 2016, the following information has been added or updated:

 Updated with information on the new NFS feature:

 Chapter 1, “Overview of Core and Edge as a System”

 Chapter 5, “Configuring the Edge in iSCSI Mode”

 Upgrade sequence - Update to the recommended sequence when upgrading a Core and an Edge as part of the same procedure. For details, see Chapter 10, “SteelFusion Appliance Upgrade.”

 Updated deployment best practices - See Chapter 12, “Deployment Best Practices.”

 “Removing Core from Edge and re-adding Core” on page 172

 “WAN usage consumption for a Core to Edge VMDK data migration” on page 175

 “LUN-based data protection limits” on page 175

 New chapter on SteelFusion and NFS mode - Version 5.0 augments SteelFusion storage protocol support with NFS. End-to-end native NFS support enables optimization of network file systems across SteelFusion. When teamed with SteelFusion Core, Edge can now present NFS version 3 storage exports projected from the data center as local exports to applications and servers at the local branch. For details, see Chapter 13, “SteelFusion and NFS.”

 Core specification enforcement - Core version 5.0 and later include support for enforcement of Core specifications. For details, see “Core sizing guidelines” on page 201.


1 - Overview of Core and Edge as a System

This chapter describes the Core and Edge components as a virtual storage system. It includes the following sections:

 “Introducing branch converged infrastructure” on page 15

 “How the SteelFusion product family works” on page 16

 “System components and their roles” on page 18

 “Blockstore prefetch and prepopulation” on page 20

 “Related information” on page 21

Introducing branch converged infrastructure

SteelFusion is a converged infrastructure solution, encompassing all branch services such as server, storage, networking, and WAN optimization. It is a dual-ended system that comprises two logical components: SteelFusion Edge and SteelFusion Core. In a basic deployment scenario, the two components work together in a topology where a single Core appliance can support multiple Edge devices. As a dual-ended system, they can be deployed in one of two storage delivery modes, either operating with block storage as part of a Storage Area Network (SAN) or with Network Attached Storage (NAS) by using the Network File System (NFS) protocol. It is not possible to have the same Edge or Core operate with both storage modes.

Note: When configured in an NFS/file deployment, SteelFusion is designed to support VMware vSphere datastores only. SteelFusion does not provide generic NFS file server access or operate as a global fileshare. Not all Core and Edge models support NFS/file mode. See the release notes for the latest information regarding supported models.

Core is a physical or virtual appliance in the data center that mounts all storage (LUNs within a SAN or exported NAS fileshares via NFS) from the backend storage array or file server that needs to be made available to applications and servers at a remote location. In the remote location, Edge provides a virtualized environment that hosts the branch application servers. Core appliances communicate across the WAN with the Edge appliances at the branch.

SteelFusion delivers local user performance while enabling data centralization, instant recovery, and lower total operating costs. Unlike traditional converged infrastructures, SteelFusion enables stateless branch services. You can access applications that run locally in your branch while the primary data is centralized in your data center. Decoupling computation from its underlying storage allows your applications to run in a stateless mode, which reduces your branch footprint and centralizes management of your branch services.


With the SteelFusion product family, data center administrators can extend data center storage to a remote location, even over a low-bandwidth link. SteelFusion delivers business agility, enabling you to effectively deliver global storage infrastructure anywhere you need it. SteelFusion provides the following functionality:

 Innovative block storage optimization ensures that you can centrally manage data storage while keeping that data available to business operations in the branch, even in the event of a WAN outage.

 A local authoritative cache ensures LAN-speed reads and fast cold writes at the branch.

 Integration with Microsoft Volume Shadow Copy Service enables consistent point-in-time data snapshots and seamless integration with backup applications.

 Integration with the snapshot capabilities of the storage array enables you to configure application-consistent snapshots through the Core Management Console.

 Integration with industry-standard Challenge-Handshake Authentication Protocol (CHAP) authenticates users and hosts (iSCSI/block mode only).

 A secure vault protects sensitive information using AES 256-bit encryption.

 Solid-state disks (SSDs) that guarantee data durability and performance.

 An active-active high-availability (HA) deployment option for SteelFusion ensures the availability of storage from SAN or NAS for remote sites.

 Customizable reports provide visibility to key utilization, performance, and diagnostic information.

By consolidating all storage at the data center and creating diskless branches, SteelFusion eliminates data sprawl, costly data replication, and the risk of data loss at the branch office.

How the SteelFusion product family works

The SteelFusion product family is designed to simplify infrastructure in remote offices and branch offices and manage it centrally from a data center. SteelFusion can operate in either of two storage delivery modes. One mode uses iSCSI or Fibre Channel protocols to interface with block storage arrays in the data center and iSCSI to host servers at the branch office. The other mode uses the NFS protocol to interface with file servers at the data center and host servers at the branch. For details about how SteelFusion works within a deployment using NFS, see Chapter 13, “SteelFusion and NFS.”

The SteelFusion product family is typically deployed in conjunction with SteelHeads and includes the following components:

 Core - Core is a physical or virtual appliance deployed in the data center alongside SteelHeads and the centralized storage array. Core mounts iSCSI LUNs (in iSCSI/block mode) or NFS exports (in NFS/file mode) provisioned for the branch offices. Additionally, Core-v can mount LUNs through Fibre Channel. When deployed in an NFS configuration, Core mounts fileshares that are exported from the centralized file server.

 Edge - Edge refers to the branch component of the SteelFusion solution. The Edge hosts two distinct functions, or nodes:
– WAN optimization
– Hypervisor platform with VMware vSphere


When deployed in an iSCSI configuration, Edge presents itself to application servers in the branch as an iSCSI storage portal. From the portal, the application server uses iSCSI to mount the iSCSI LUNs that are projected across the WAN from the data center.

Edge can also host local LUNs or local exports for use as temporary storage that are not projected from the data center: for example, temporary or local copies of software repositories.

When deployed in an NFS configuration, Edge presents itself as an NFS file server, enabling a VMware vSphere server in the branch to mount exported fileshares to be used as datastores.
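The following console examples are provided only as an illustration of how a branch server might consume storage presented by the Edge; the IP addresses, IQNs, and export paths are hypothetical, and the exact procedure depends on your operating system and vSphere version. The first two commands show a Linux server using the standard open-iscsi tools to discover and log in to the Edge iSCSI portal; the third shows an ESXi host mounting an NFS export projected by the Edge as a datastore.

    # Linux branch server (open-iscsi): discover and log in to the Edge iSCSI portal
    iscsiadm -m discovery -t sendtargets -p 192.168.10.5:3260
    iscsiadm -m node -T iqn.2003-10.com.example:edge1-target -p 192.168.10.5:3260 --login

    # ESXi host in the branch (NFS/file mode): mount an export presented by the Edge
    esxcli storage nfs add --host=192.168.10.5 --share=/exports/branch-ds1 --volume-name=branch-ds1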

Note: SteelFusion version 5.0 and later also supports a Virtual Edge. However, Virtual Edge does not include a hypervisor platform itself. Virtual Edge is a software-defined edge solution that extends SteelFusion to third-party commodity hardware. For more information about Virtual Edge, see the Virtual SteelFusion Edge System Integrator’s Guide.

For a list of qualified software and storage systems, we strongly recommend that you read the SteelFusion Interoperability Matrix at https://splash.riverbed.com/docs/DOC-4204. For information on compatibility between SteelHead software (RiOS), Edge, Core, and vSphere releases, see the Riverbed Knowledge Base article RiOS, SteelFusion Edge, SteelFusion Core and vSphere Release Matrix at https://supportkb.riverbed.com/support/index?page=content&id=S27472.

The Edge connects to the blockstore, a persistent local cache of storage blocks located inside the Edge itself. SteelFusion initially populates the blockstore using the following methods:

 First request - Data is added to the blockstore when first requested. Because the first request is cold, it is subject to standard WAN latency. Subsequent traffic is optimized.

 On-demand prefetch - The system observes block requests, applies heuristics based on these observations to intelligently predict the data most likely to be requested in the near future, and then requests that data from the data center LUN/export in advance.

 Policy-based prefetch - Configured policies identify the blocks that are likely to be requested at a given branch office site in advance; the Edge then requests that data from the data center LUN/export in advance.

For details on blockstore population, see “Blockstore prefetch and prepopulation” on page 20.


System components and their roles

Figure 1-1 shows the various components in a basic SteelFusion iSCSI/block/Fibre Channel deployment. The generic SteelFusion NFS/file deployment is shown in Figure 1-2.

Figure 1-1. Generic SteelFusion block storage deployment

Figure 1-2. Generic SteelFusion NFS/file deployment

 Branch server - One or more branch-side servers that access data from the SteelFusion system instead of a local storage device. These servers can also run as virtual machines (VMs) within the Edge hypervisor node.

 Blockstore - A persistent local cache of storage blocks. Because each Edge is linked to one or more dedicated LUNs at the data center, the blockstore is authoritative for both reads and writes. In Figure 1-1, the blockstore on the branch side synchronizes with its LUN(s) at the data center. In Figure 1-2, the blockstore on the branch side synchronizes with its exported fileshare(s) at the data center.

 iSCSI initiator (iSCSI/block mode only) - The branch-side server that sends SCSI commands to its iSCSI target, which is the Edge in the branch. At the data center, the Core is an iSCSI initiator that sends SCSI commands to access LUNs through an iSCSI target in the storage array.

Note: Although not shown in Figure 1-1, Core can also use Fibre Channel to the storage array.

 NFS client - One or more branch-side VMware hypervisors that use NFS to mount exported datastores from the NFS file server (the Edge in the branch). The VMware hypervisor located within the Edge hypervisor node can also act as an NFS client to mount its ESXi datastore. At the data center, the Core is an NFS client that uses NFS to access fileshares exported by the NFS file server in the data center.


 Edge - The branch-side component of the SteelFusion system that links the blockstore through the Core to the storage (LUN or fileshare, depending on the deployment scenario) at the data center. The Edge also includes SteelHead functionality to provide WAN optimization services.

 Data center SteelHead - The data center-side SteelHead peer for WAN optimization.

 Core - The data center component of the SteelFusion product family. The Core manages block transfers between the LUN and the Edge when deployed as part of a block storage implementation. When deployed in an NFS/file scenario, the Core manages NFS transfers between the exported fileshare and the Edge.

 iSCSI target (iSCSI/block mode only) - In the branch office, it’s the Edge that communicates with the branch-side iSCSI initiator in the branch server. In the data center, it’s the storage array that communicates with the Core iSCSI initiator.

 LUN (iSCSI/block mode only) - A unit of block storage deployed from the storage array and projected through the Core to the Edge. More than one LUN can be projected to an Edge.

 NFS file server - In the branch office, it’s the Edge that communicates with the branch-side NFS client in the branch VMware hypervisor. In the data center, it’s the file server that communicates with the Core NFS client.

 Fileshare - A unit of NFS storage that is exported from the NFS file server and projected through the Core to the Edge. More than one fileshare export can be projected to an Edge.

At the data center, Core integrates with existing storage systems, virtual infrastructure, and SteelHead deployments. Depending on the deployment (block storage or NFS), Core connects dedicated LUNs or fileshare exports with each Edge appliance at the branch office.

The branch office server connects to Edge, which implements handlers for the iSCSI protocol in a block storage deployment, or requests for the NFS protocol in an NFS/file deployment. The Edge also connects to the blockstore, a persistent local cache of storage blocks within the Edge itself. The blockstore is the Edge authoritative persistent cache of storage blocks. The blockstore is local from a branch perspective and holds data from all the LUNs or exported fileshares (depending on the deployment mode) available through a specific Edge. The blockstore is authoritative because it includes the latest-written data before it is sent through the Core to a storage array or file server at the data center.

When a server at the branch office requests data, that data is served locally from the blockstore if the data is currently present. If it is not present, the Edge retrieves it through the Core from the data center LUN or exported fileshare. Similarly, newly written data is spooled to the local blockstore, acknowledged by the Edge to the branch office server, and then asynchronously propagated to the data center. Because each Edge implementation is linked to one or more dedicated LUNs or exported fileshares at the data center, the blockstore is authoritative for both reads and writes and can tolerate WAN outages without affecting cache coherency.

Blocks are transferred between the Edge and the Core through an internal protocol. The Core then writes the updates to the data center LUNs through the iSCSI or Fibre Channel Protocol (FCP). The same internal protocol is used to transfer NFS data between Edge and Core, where the Core then writes the updates to the fileshares exported from the data center fileserver using the NFS protocol.

SteelFusion is designed to be coupled with the SteelHead WAN optimization. You can further optimize traffic between the branch offices and the data center by implementing SteelHeads.

For more information about Fibre Channel, see Chapter 4, “SteelFusion and Fibre Channel.” For more information about NFS/file deployments, see Chapter 13, “SteelFusion and NFS.”


The data cache in the blockstore is stored as-is, and it is not deduplicated. Edge appliances include the SteelHead, and in the data center, the Cores are coupled with SteelHead products, which assist with data reduction and streamlining between the Edge and the Core.

You can encrypt the blockstore cache at rest using AES 128/192/256-bit encryption. This encryption eliminates the risk of data exposure if your appliances are stolen. Similarly, because SteelFusion enables the removal of physical tape media and backup devices traditionally used for data protection from the remote offices, this encryption also eliminates the risk of data theft.

As a result, the blockstore eliminates the need for separate block storage facilities at the branch office and all the associated maintenance, tools, backup services, hardware, service resources, and so on. For more information about blockstore encryption, see “At-rest and in-flight data security” on page 146.

Blockstore prefetch and prepopulation

One of the drawbacks of block storage protocols like iSCSI or Fibre Channel communicating across a wide area network (WAN) is that subsequent requests for further data blocks may be nonsequential to the point that they seem random. This randomness is by design and a facet of the backend storage, but it makes rapid data delivery across a WAN using a block storage protocol difficult because the high latency adds a long turnaround time between request and response. One way to mitigate this effect is for the sending side to predict in some way what the subsequent requests will be so that data can be sent without waiting for the request. However, this is not possible with traditional storage protocols.

The SteelFusion architecture is appropriate for this type of approach because data on the Edge is held locally in the blockstore. The blockstore is the persistent local cache of storage blocks linked to one or more dedicated LUNs at the data center. As long as the data blocks that the Edge wants to read are already in the blockstore, no request needs to be transmitted across the WAN through the Core to backend storage. Instead, the Edge responds to the read request, serving data at local disk speed. Because the Edge blockstore benefits from a three-tier architecture of memory, solid-state disk (SSD), and hard disk drive (HDD), the read response is faster than could be achieved by storage in traditional branch servers.

If the required data is not in the blockstore (called a blockstore miss), the Edge requests the data from the backend storage by asking the Core. The Core understands specific blockstore formats and can send the requested data and also continue to proactively send additional blocks of data to the blockstore that it predicts the Edge may need. If the prediction is successful, the Edge then finds the subsequent data it needs locally in the blockstore and therefore does not need to submit further requests to the Core.

Depending on the scenario, populating the Edge blockstore can be performed by one or more of these methods:

 Prefetch - The act of pushing data out from the Core in response to a blockstore miss at the Edge. Not only does the Core push out the requested blocks, it also sends across additional blocks that it predicts may be required by the Edge on subsequent reads. Note that prefetch is reactive (or on-demand) in that it operates in response to the Edge. Intelligence within the Core means that prefetch works well with LUNs that are either VMFS or NTFS format, or Windows VMs on NFS.

Note: For prefetch of NFS exports with Windows VMs, you must install the Riverbed Turbo Boot Plugin. For details, see “Riverbed Turbo Boot” on page 33.


 Policy-based prefetch - Not a specific prefetch technique, but a term used to describe a selection of different “proactive” methods designed to populate the Edge blockstore. These methods are all types of prepopulation.

 Prepopulation - Sometimes referred to as full prepopulation. Designed to be used for pinned LUNs or exports. Prepopulation is the act of proactively sending data without a request from the Edge. This method is used for any type of pinned LUN or export, and is very beneficial for LUNs that are not VMFS or NTFS format (for example, FAT32, ext3, ext4) and NFS exports with non-Windows VMs.

Note: Prepopulation does not work with unpinned LUNs or exports. Unpinned LUNs or exports are populated dynamically (incorporating prefetch) as the Edge begins using them.

 Branch recovery (iSCSI/block mode only) - A type of prepopulation that enables the Core to proactively send across the working set of data blocks. This feature only applies to NTFS or VMFS LUNs and uses a service called the Branch Recovery Agent. For more information about this feature, see “Branch recovery” on page 132.

 Smart prepopulation (iSCSI/block mode only) - Sometimes referred to as intelligent prepopulation. A type of prepopulation where the Core proactively sends all used blocks in a volume with no regard to the working set blocks. Smart prepopulation is only required if your pinned LUN is VMFS or NTFS.

For more information about how to enable or configure the different types of prepopulation, see the SteelFusion Command-Line Interface Reference Manual.

Related information

 SteelFusion Core Management Console User’s Guide

 SteelFusion Edge Management Console User’s Guide

 Riverbed Splash at https://splash.riverbed.com/community/product-lines/steelfusion

 SteelFusion Interoperability Matrix at https://splash.riverbed.com/docs/DOC-4204

 Riverbed Knowledge Base article RiOS, SteelFusion Edge, SteelFusion Core and vSphere Release Matrix at https://supportkb.riverbed.com/support/index?page=content&id=S:S27472


2 - Deploying Core and Edge as a System

This chapter describes the process and procedures for deploying the SteelFusion product family at both the branch office and the data center. It is a general introduction to one possible scenario that forms a basic, but typical, SteelFusion deployment. Further details on specific stages of deployment, such as Core and Edge configuration, high availability, and configuration scenarios for snapshots, are covered in the following chapters of this guide.

Note: This chapter focuses on the deployment of SteelFusion block storage mode with iSCSI or Fibre Channel- connected storage. For deployments using SteelFusion in NFS/file mode, see Chapter 13, “SteelFusion and NFS.”

This chapter includes the following sections:

 “The SteelFusion family deployment process” on page 23

 “Single-appliance versus high-availability deployments” on page 26

 “Connecting Core with Edge” on page 28

 “Riverbed Turbo Boot” on page 33

 “Related information” on page 33

The SteelFusion family deployment process

This section provides a broad outline of the process for deploying the SteelFusion product family. Depending on the type of deployment and products involved (for example, with or without redundancy, iSCSI or Fibre Channel connected storage, and so on), the details of certain steps can vary. Use the outline below to create a deployment plan that is specific to your requirements. The tasks are listed in approximate order; dependencies are listed when required. The tasks are as follows:

 “Provisioning LUNs on the storage array” on page 24

 “Installing the SteelFusion appliances” on page 24

 “LUN pinning and prepopulation in the Core” on page 25

 “Configuring snapshot and data protection functionality” on page 26

 “Managing vSphere datastores on LUNs presented by Core” on page 26


Provisioning LUNs on the storage array

This section describes how to provision LUNs on the storage array. For more information, see the documentation for your storage array.

To provision LUNs on the storage array

1. Enable the connections for the type of LUNs you intend to expose to the branch: for example, iSCSI and Fibre Channel.

2. Determine the LUNs you want to dedicate to specific branches.

Note: Step 3 and Step 4 are optional. The LUNs to be exposed to the branch can be empty and populated later. For example, if you require the LUNs to be preloaded with virtual machine images as part of the ESX datastore, you only need to perform Step 3 and Step 4 if you want to preload the LUNs with data.

3. Connect to a temporary ESX server to deploy virtual machines (VMs) for branch services (including the branch Windows server) to the LUNs. We recommend that you install the optional Windows Server plug-ins at this point. This installation is useful if you use the Boot over WAN functionality available for Windows 2008 and Windows 2012. For details, see “Implementing Riverbed Host Tools for snapshot support” on page 123.

4. After you deploy the VMs, disconnect from the temporary ESX server.

5. Create the necessary initiator or target groups.
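The exact provisioning commands depend entirely on your storage array; see the vendor documentation. Purely as a hypothetical sketch, on a NetApp array running Data ONTAP 7-mode the LUN provisioning and initiator-group steps in this procedure might look similar to the following, where the volume path, LUN size, group name, and IQN are invented for illustration.

    # Create a LUN to dedicate to the branch
    lun create -s 500g -t vmware /vol/branch01/lun0

    # Create an iSCSI initiator group containing the Core IQN
    igroup create -i -t vmware ig_sfcore iqn.2003-10.com.example:sfcore-dc1

    # Map the LUN to the initiator group so that only the Core can access it
    lun map /vol/branch01/lun0 ig_sfcore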

Installing the SteelFusion appliances

This section describes at a high level how to install and configure Core and Edge appliances. For complete installation procedures, see the SteelFusion Core Installation and Configuration Guide and the SteelFusion Edge Installation and Configuration Guide.

To install and configure Core

1. Install the Core or Core-v in the data center network.

2. Connect the Core appliance to the storage array.

3. Through the Core, discover and configure the desired LUNs on the storage array.

4. (Recommended) Enable and configure HA.

5. (Recommended) Enable and configure multipath I/O (MPIO) for iSCSI connected storage. If you have decided to use MPIO, you must configure it at two separate and independent points:

 iSCSI initiator

 iSCSI target

If you are using Fibre Channel connected LUNs, make sure you enable multiple paths on the vSphere host on which the Core-v is deployed.
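As a generic ESXi illustration (not a SteelFusion-specific procedure), you can review and, if your storage vendor recommends it, adjust the path selection policy for the Fibre Channel devices backing a Core-v from the ESXi host shell; the device identifier below is hypothetical.

    # List storage devices and their current path selection policy (PSP)
    esxcli storage nmp device list

    # Example only: set round-robin path selection for one device
    esxcli storage nmp device set --device naa.60012340000abcdef000000000000001 --psp VMW_PSP_RR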


Additional steps are required on the Edge to complete a typical installation. A high-level series of steps is shown in the following procedure.

To install and configure the Edge appliance

1. Install the Edge in the branch office network.

2. On the appliance, configure disk management to enable SteelFusion storage mode.

3. Preconfigure the Edge for connection to Core.

4. Connect the Edge and the Core.

LUN pinning and prepopulation in the Core

LUN pinning and prepopulation are two separate features configured through the Core that together determine how block data is kept in the blockstore on the Edge. When you pin a LUN in the Core configuration, you reserve space in the Edge blockstore that is equal in size to the LUN at the storage array. Furthermore, when blocks are fetched by the Edge, they remain in the blockstore in their entirety; by contrast, blocks in unpinned LUNs might be cleared on a first-in, first-out basis.

Pinning only reserves blockstore space; it does not populate that space with blocks. The blockstore is populated as the application server in the branch requests data not yet in the blockstore (causing the Edge to issue a read through the Core), or through prepopulation. The prepopulation functionality enables you to prefetch blocks to the blockstore. You can prepopulate a pinned LUN on the blockstore in one step; however, if the number of blocks is very large, you can configure a prepopulation schedule that prepopulates the blockstore only during specific intervals of your choice: for example, not during business hours. After the prepopulation process is completed, the schedule stops automatically.

Note: Prefetch does not optimize access for VMs that contain any SE SPARSE (Space Efficient Sparse) format snapshots.

For more information about pinning and prepopulation, see the SteelFusion Core Management Console User’s Guide.

To configure pinning and prepopulation

1. Choose Configure > Manage: LUNs to display the LUNs page.

2. Click the LUN configuration to display the configuration settings.

3. Select the Pin/Prepop tab.

4. To pin the LUN, select Pinned from the drop-down list and click Update. When the LUN is pinned, the prepopulation settings are activated for configuration.


Configuring snapshot and data protection functionality

Core integrates with the snapshot capabilities of the storage array to enable you to configure application-consistent snapshots through the Core Management Console. For details, see “Data Protection and Snapshots” on page 119.

Understanding crash consistency and application consistency

In the context of snapshots, backups, and data protection in general, two types or states of data consistency are distinguished:

 Crash consistency - A backup or snapshot is crash consistent if all of the interrelated data components are as they were (write-order consistent) at the instant of the crash. This type of consistency is similar to the status of the data on your PC’s hard drive after a power outage or similar event. A crash-consistent backup is usually sufficient for nondatabase operating systems and applications like file servers, DHCP servers, print servers, and so on.

 Application consistency - A backup or snapshot is application consistent if, in addition to being write-order consistent, running applications have completed all their operations and flushed their buffers to disk (application quiescing). Application-consistent backups are recommended for operating systems and applications such as SQL, Oracle, and Exchange.

The SteelFusion product family ensures continuous crash consistency at the branch and at the data center by using journaling and by preserving the order of WRITEs across all the exposed LUNs. For application-consistent backups, administrators can directly configure and assign hourly, daily, or weekly snapshot policies on the Core. Edge interacts directly with both VMware ESXi and Microsoft Windows servers, through VMware Tools and Volume Snapshot Service (VSS), to quiesce the applications and generate application-consistent snapshots of both Virtual Machine File System (VMFS) and New Technology File System (NTFS) data drives.
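On a branch Windows server, a quick way to check that the VSS writers for applications such as SQL Server or Exchange are healthy before relying on application-consistent snapshots is the standard Windows vssadmin utility; this is a generic Windows check, not a SteelFusion-specific command.

    C:\> vssadmin list writers
    C:\> vssadmin list providers

Writers that report a stable state and no last error are ready to be quiesced by VSS-based snapshot requests.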

Managing vSphere datastores on LUNs presented by Core

Through the vSphere client, you can view inside the LUN to see the VMs previously loaded in the data center storage array. You can add a LUN that contains vSphere VMs as a datastore to the ESXi server in the branch. This server can be either a regular hardware platform hosting ESXi or the hypervisor node in the Edge appliance.
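After a SteelFusion LUN is presented to the branch ESXi host, you can rescan the storage adapters and confirm that the VMFS datastore is visible. These are standard ESXi commands shown for illustration; the same steps can be performed from the vSphere client.

    # Rescan all storage adapters, then list the mounted datastores
    esxcli storage core adapter rescan --all
    esxcli storage filesystem list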

Single-appliance versus high-availability deployments This section describes types of SteelFusion appliance deployments. It includes the following topics:

 “Single-appliance deployment” on page 27

 “High-availability deployment” on page 27 This section assumes that you understand the basics of how the SteelFusion product family works together, and you are ready to deploy your appliances.


Single-appliance deployment

In a single-appliance deployment (basic deployment), SteelFusion Core connects to the storage array through a data interface. Depending on the model of the Core, the data interface is named ethX_Y, in which X and Y are numerical values: for example, eth0_0, eth0_1, and so on. The primary (PRI) interface is dedicated to the traffic VLAN, and the auxiliary (AUX) interface is dedicated to the management VLAN. More complex designs generally use the additional network interfaces. For more information about Core interface names and their possible uses, see “Interface and port configuration” on page 36.

Figure 2-1. Single-appliance deployment

High-availability deployment

In a high-availability (HA) deployment, two Cores operate as failover peers. Both appliances operate independently with their respective and distinct Edges until one fails; then the remaining operational Core handles the traffic for both appliances. For more information about HA, see “SteelFusion Appliance High-Availability Deployment” on page 81.

Figure 2-2. HA deployment


Connecting Core with Edge

This section describes the prerequisites for configuring the data center (Core) and branch office (Edge) components of the SteelFusion product family, and it provides an overview of the procedures required. It includes the following topics:

 “Prerequisites” on page 28

 “Connecting the SteelFusion product family components” on page 28

 “Adding Edges to the Core configuration” on page 30

 “Configuring Edge” on page 30

 “Mapping LUNs to Edges” on page 30

Prerequisites

Before you configure Core with Edge, ensure that the following tasks have been completed:

 Assign an IP address or hostname to the Core.

 Determine the iSCSI Qualified Name (IQN) to be used for Core. When you configure Core, you set this value in the initiator configuration. (An example of the IQN format follows this list.)

 Set up your storage array:
– Register the Core IQN.
– Configure iSCSI portal, targets, and LUNs, with the LUNs assigned to the Core IQN.

 Assign an IP address or hostname to the Edge.
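IQNs follow the format defined in RFC 3720: the literal prefix iqn, a year and month, a reversed domain name owned by the naming authority, and an optional colon-separated identifier chosen by the administrator. The values below are hypothetical examples of the format only, not names generated by the appliances.

    iqn.<yyyy-mm>.<reversed-domain>:<optional-identifier>
    iqn.2003-10.com.example:sfcore-dc1           (example initiator name for a Core)
    iqn.2003-10.com.example.array:controller01   (example target name on a storage array)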

Connecting the SteelFusion product family components

The following table summarizes the process for connecting and configuring Core and Edge as a system. You can perform some of the steps in the table in a different order, or even in parallel with each other in some cases. The sequence shown is intended to illustrate a method that enables you to complete one task so that the resources and settings are ready for the next task in the sequence.

Component: Core
Procedure: Determine the network settings for Core.
Description: Prior to deployment:
 Assign an IP address or hostname to Core.
 Determine the IQN to be used for Core. When you configure Core, you set this value in the initiator configuration.

Component: iSCSI-compliant storage array
Procedure: Register the Core IQN.
Description: SteelFusion uses the IQN name format for iSCSI initiators. For details about IQN, see http://tools.ietf.org/html/rfc3720.

Component: iSCSI-compliant storage array
Procedure: Prepare the iSCSI portals, targets, and LUNs, with the LUNs assigned to the Core IQN.
Description: Prior to deploying Core, you must prepare these components.

Component: Fibre Channel-compliant storage array
Procedure: Enable Fibre Channel connections.
Description: For details, see “SteelFusion and Fibre Channel” on page 55.

Component: Core
Procedure: Install Core.
Description: For details, see the SteelFusion Core Installation and Configuration Guide.

Component: Edge
Procedure: Install the Edge.
Description: For details, see the SteelFusion Edge Installation and Configuration Guide.

Component: Edge
Procedure: Configure disk management.
Description: You can configure the disk layout mode to allow space for the SteelFusion blockstore in the Disk Management page. Free disk space is divided between the Virtual Services Platform (VSP) and the SteelFusion blockstore. For details, see “Configuring disk management on the Edge appliance” on page 77.

Component: Edge
Procedure: Configure SteelFusion storage settings.
Description: The SteelFusion storage settings are used by the Core to recognize and connect to the Edge. For details, see “Configuring SteelFusion storage” on page 78.

Component: Core
Procedure: Run the Setup Wizard to perform initial configuration.
Description: The Setup Wizard performs the initial, minimal configuration of the Core, including:
 Network settings
 iSCSI initiator configuration
 Mapping LUNs to the Edges
For details, see the SteelFusion Core Installation and Configuration Guide.

Component: Core
Procedure: Configure iSCSI initiators and LUNs.
Description: Configure the iSCSI initiator and specify an iSCSI portal. This portal discovers all the targets within that portal. Add and configure the discovered targets to the iSCSI initiator configuration.

Component: Core
Procedure: Configure targets.
Description: After a target is added, all the LUNs on that target can be discovered, and you can add them to the running configuration.

Component: Core
Procedure: Map LUNs to the Edges.
Description: Using the previously defined Edge self-identifier, connect LUNs to the appropriate Edges. For details about the above procedures, see the SteelFusion Core Management Console User’s Guide.

Component: Core
Procedure: Configure CHAP users and storage array snapshots.
Description: Optionally, you can configure CHAP users and storage array snapshots. For details, see the SteelFusion Core Management Console User’s Guide.

Component: Edge
Procedure: Confirm the connection with Core.
Description: After completing the Core configuration, confirm that the Edge is connected to and communicating with the Core. For details, see “Mapping LUNs to Edges” on page 30.
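After LUNs have been mapped to the Edge (see “Mapping LUNs to Edges” on page 30), you can also verify from a branch Windows server that the Edge iSCSI portal is reachable and that a session is established. The portal address and target name below are hypothetical; iscsicli is part of the standard Microsoft iSCSI initiator.

    C:\> iscsicli QAddTargetPortal 192.168.10.5
    C:\> iscsicli ListTargets
    C:\> iscsicli QLoginTarget iqn.2003-10.com.example:edge1-target
    C:\> iscsicli SessionList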

Adding Edges to the Core configuration

You can add and modify connectivity with Edges in the Configure > Manage: SteelFusion Edges page in the Core Management Console. This procedure requires you to provide the Edge Identifier for the Edge. Choose this value in the SteelFusion Edge Management Console Storage > Storage Edge page. For more information, see the SteelFusion Core Management Console User’s Guide, the SteelFusion Command-Line Interface Reference Manual, and the Riverbed Command-Line Interface Reference Manual.

Configuring Edge

For information about Edge configuration for deployment, see “Configuring the Edge in iSCSI Mode” on page 73.

Mapping LUNs to Edges

This section describes how to configure LUNs and map them to Edges. It includes the following topics:

 “Configuring iSCSI settings” on page 30

 “Configuring LUNs” on page 31

 “Configuring Edges for specific LUNs” on page 31

Configuring iSCSI settings

You can view and configure the iSCSI initiator, portals, and targets in the iSCSI Configuration page. The iSCSI Initiator settings configure how the Core communicates with one or more storage arrays through the specified portal configuration. After configuring the iSCSI portal, you can open the portal configuration to configure targets.

For more information and procedures, see the SteelFusion Core Management Console User’s Guide, the SteelFusion Command-Line Interface Reference Manual, and the Riverbed Command-Line Interface Reference Manual.


Configuring LUNs

You configure block disk (Fibre Channel), Edge local, and iSCSI LUNs in the LUNs page.

Typically, block disk and iSCSI LUNs are used to store production data. They share the space in the blockstore cache of the associated Edges, and the data is continuously replicated and kept synchronized with the associated LUN in the data center. The Edge blockstore caches only the working set of data blocks for these LUNs; additional data is retrieved from the data center when needed.

Block-disk LUN configuration pertains to Fibre Channel support. Fibre Channel is supported only in Core-v deployments. For more information, see “Configuring Fibre Channel LUNs” on page 43.

Edge local LUNs are used to store transient and temporary data or local copies of software distribution repositories. Local LUNs also use dedicated space in the blockstore cache of the associated Edges, but the data is not replicated back to the data center LUNs.

Configuring Edges for specific LUNs

After you configure the LUNs and Edges for the Core, you can map the LUNs to the Edges. You complete this mapping through the Edge configuration in the Core Management Console Configure > Manage: SteelFusion Edges page. When you select a specific Edge, the following controls for additional configuration are displayed.

Control Description

Status - This panel displays the following information about the selected Edge:

 IP Address - The IP address of the selected Edge.

 Connection Status - Connection status to the selected Edge.

 Connection Duration - Duration of the current connection.

 Total LUN Capacity - Total storage capacity of the LUN dedicated to the selected Edge.

 Blockstore Encryption - Type of encryption selected, if any.

The panel also displays a small-scale version of the Edge Data I/O report.

Target Settings - This panel displays the following controls for configuring the target settings:

 Target Name - Displays the system name of the selected Edge.

 Require Secured Initiator Authentication - Requires CHAP authorization when the selected Edge is connecting to initiators. If the Require Secured Initiator Authentication setting is selected, you must set authentication to CHAP in the adjacent Initiator tab.

 Enable Header Digest - Includes the header digest data from the iSCSI protocol data unit (PDU).

 Enable Data Digest - Includes the data digest data from the iSCSI PDU.

 Update Target - Applies any changes you make to the settings in this panel.



Initiators - This panel displays controls for adding and managing initiator configurations:

 Initiator Name - Specify the name of the initiator you are configuring.

 Add to Initiator Group - Select an initiator group from the drop-down list.

 Authentication - Select the authentication method from the drop-down list:

None - No authentication is required.

CHAP - Only the target authenticates the initiator. The secret is set just for the target; all initiators that want to access that target must use the same secret to begin a session with the target.

Mutual CHAP - The target and the initiator authenticate each other. A separate secret is set for each target and for each initiator in the storage array.

If the Require Secured Initiator Authentication setting is selected for the Edge in the Target Settings tab, authentication must be configured for a CHAP option.

 Add Initiator - Adds the new initiator to the running configuration.

Initiator Groups - This panel displays controls for adding and managing initiator group configurations:

 Group Name - Specifies a name for the group.

 Add Group - Adds the new group. The group name displays in the Initiator Group list. After this initial configuration, click the new group name in the list to display additional controls:

 Click Add or Remove to control the initiators included in the group.

Servers (Version 4.6 and later) - This panel displays information about ESXi and/or Windows servers connected to the Edge, including alias hostname/IP address, type (VMware or Windows), connection status, backup policy name, and last status (Triggered, Edge Processing, Core Processing, Proxy Mounting, Proxy Mounted, Not Protected, Protect Failed).

Note: If you have configured server-level backups, a manual backup for any server that is added to a backup policy must be triggered from this tab (as opposed to the LUNs tab for LUN-level snapshots).

LUNs - This panel displays controls for mapping available LUNs to the selected Edge. After mapping, the LUN displays in the list in this panel. To manage group and initiator access, click the name of the LUN to access additional controls.

Prepopulation - This panel displays controls for configuring prepopulation tasks:

 Schedule Name - Specify a task name.

 Start Time - Select the start day and time from the respective drop-down list.

 Stop Time - Select the stop day and time from the respective drop-down list.

 Add Prepopulation Schedule - Adds the task to the Task list. This prepopulation schedule is applied to all virtual LUNs mapped to this appliance if you do not configure any LUN-specific schedules. To delete an existing task, click the trash icon in the Task list. The LUN must be pinned to enable prepopulation. For more information, see “LUN pinning and prepopulation in the Core” on page 25.


Riverbed Turbo Boot

Riverbed Turbo Boot is a prefetch technique. Turbo Boot uses the Windows Performance Toolkit to generate information that enables faster boot times for Windows VMs in the branch office on either external ESXi hosts or VSP. Turbo Boot can improve boot times by two to ten times, depending on the customer scenario. Turbo Boot is a plugin that records the disk I/O while the host operating system it is installed on boots. The disk I/O activity is logged to a file. During any subsequent boots of the host system, the Core uses the Turbo Boot log file to perform more accurate prefetch of data. At the end of each boot of the host, the log file is updated with changes and new information. This update ensures an enhanced prefetch on each successive boot.

Note: Turbo Boot only applies to Windows VMs using NTFS.

If you are booting a Windows server or client VM from an unpinned LUN, we recommend that you install the Riverbed Turbo Boot software on the Windows VM. These operating systems support Riverbed Turbo Boot software:

 Windows Vista

 Windows 7

 Windows Server 2008

 Windows Server 2008 R2

 Windows Server 2012

 Windows Server 2012 R2

 Windows Server 2016 (as of Core version 5.0)

For installation information, see the SteelFusion Core Installation and Configuration Guide.

Note: The SteelFusion Turbo Boot plugin is not compatible with the branch recovery agent. For more information about the branch recovery agent, see the SteelFusion Core Management Console User’s Guide.

Related information

 SteelFusion Core Installation and Configuration Guide

 SteelFusion Edge Installation and Configuration Guide

 SteelFusion Core Management Console User’s Guide

 SteelFusion Edge Management Console User’s Guide

 Riverbed Splash at https://splash.riverbed.com/community/product-lines/steelfusion



Deploying the Core

This chapter describes the deployment processes specific to the Core. It includes the following sections:

 “Core Dashboard overview” on page 35

 “Core deployment process overview” on page 35

 “Interface and port configuration” on page 36

 “Configuring the iSCSI initiator” on page 42

 “Configuring LUNs” on page 42

 “Configuring redundant connectivity with MPIO” on page 44

 “Core pool management” on page 45

 “Cloud storage gateway support” on page 51

 “Related information” on page 52

Note: Features and settings described in this chapter may apply to SteelFusion deployments in both iSCSI/block mode and NFS/file mode. Where indicated, content is applicable to block storage mode only. For deployments that are using SteelFusion in NFS/file mode, also see Chapter 13, “SteelFusion and NFS.”

Core Dashboard overview

Although the Dashboard is not itself a deployment process, a SteelFusion Core running software version 4.3 or later provides a Dashboard that, when the Core is initially deployed, displays panes that direct the administrator to key configuration tasks such as the ones covered in the following sections. This makes it easy to see at a glance which deployment tasks have not yet been completed. Panes are provided for adding an Edge, adding a failover peer, adding a storage array, adding LUNs, and adding NFS servers and exported fileshares when using NFS/file mode. Once configuration is complete, the Dashboard provides a graphical view of SteelFusion-related activity (cache efficiency, Edge read/write performance), status reports (Core alarms, HA health), and storage (LUN or NFS/file export capacity, backend storage performance). For details on the Core Dashboard, see the SteelFusion Core Management Console User’s Guide.

Core deployment process overview

Note: Block storage mode only.


Complete the following tasks:

1. Install and connect the Core in the data center network. Include both Cores if you are deploying a high-availability solution. For more information on installation, see the SteelFusion Core Installation and Configuration Guide.

2. Configure the iSCSI initiators in the Core using the iSCSI Qualified Name (IQN) format. Fibre Channel connections to the Core-v are also supported. For more information, see “Configuring Fibre Channel LUNs” on page 43.

3. Enable and provision LUNs on the storage array. Make sure to include registering the Core IQN and configuring any required LUN masks. For details, see “Provisioning LUNs on the storage array” on page 24.

4. Define the Edge identifiers so you can later establish connections between the Core and the corresponding Edges. For details, see “Managing vSphere datastores on LUNs presented by Core” on page 26.

Interface and port configuration

Note: Block storage mode only.

This section describes a typical port configuration. You might require additional routing configuration depending on your deployment scenario. This section includes the following topics:

 “Core ports” on page 36

 “Configuring interface routing” on page 38

 “Configuring Core for jumbo frames” on page 41

Core ports

The following table summarizes the ports that connect the Core appliance to your network. Unless noted, the port and descriptions are for all Core models: 2000, 3000, and 3500.

Port Description

Console - Connects the serial cable to a terminal device. You establish a serial connection to a terminal emulation program for console access to the Setup Wizard and the Core CLI.

Primary (PRI) - Connects Core to a VLAN switch through which you can connect to the Management Console and the Core CLI. You typically use this port for communication with Edges.

Auxiliary (AUX) - Connects the Core to the management VLAN. You can connect a computer directly to the appliance with a crossover cable, enabling you to access the CLI or Management Console.



eth0_0 to eth0_3 (applies to SFCR 2000 and 3000) - Connects the eth0_0, eth0_1, eth0_2, and eth0_3 ports of Core to a LAN switch using a straight-through cable. You can use the ports either for iSCSI SAN connectivity or failover interfaces when you configure Core for high availability (HA) with another Core. In an HA deployment, failover interfaces are usually connected directly between Core peers using crossover cables. If you deploy the Core between two switches, all ports must be connected with straight-through cables.

eth1_0 onwards (applies to SFCR 2000 and 3000) - Cores have four gigabit Ethernet ports (eth0_0 to eth0_3) by default. For additional connectivity, you can install optional NICs in PCIe slots within the Core. These slots are numbered 1 to 5. Supported NICs can be either 1 Gb or 10 Gb depending on connectivity requirements. The NIC ports are automatically recognized by the Core following a reboot. The ports are identified by the system as ethX_Y, where X corresponds to the PCIe slot number and Y corresponds to the port on the NIC. For example, a two-port NIC in PCIe slot 1 is displayed as having ports eth1_0 and eth1_1. Connect the ports to LAN switches or other devices using the same principles as the other SteelFusion network ports. For more details about installing optional NICs, see the Network and Storage Card Installation Guide. For more information about the configuration of network ports, see the SteelFusion Core Management Console User’s Guide.

eth1_0 to eth1_3 (applies to SFCR 3500) - Connects the eth1_0, eth1_1, eth1_2, and eth1_3 ports of Core to a LAN switch using a straight-through cable. You can use the ports either for iSCSI SAN connectivity or failover interfaces when you configure Core for high availability (HA) with another Core. In an HA deployment, failover interfaces are usually connected directly between Core peers using crossover cables. If you deploy the Core between two switches, all ports must be connected with straight-through cables.

eth2_0 onwards (applies to SFCR 3500) - Cores have four gigabit Ethernet ports (eth1_0 to eth1_3) by default. For additional connectivity, you can install optional NICs in PCIe slots within the Core. These slots are numbered 2 to 6. Supported NICs can be either 1 Gb or 10 Gb depending on connectivity requirements. The NIC ports are automatically recognized by the Core following a reboot. The ports are identified by the system as ethX_Y, where X corresponds to the PCIe slot number and Y corresponds to the port on the NIC. For example, a two-port NIC in PCIe slot 2 is displayed as having ports eth2_0 and eth2_1. Connect the ports to LAN switches or other devices using the same principles as the other SteelFusion network ports. For more details about installing optional NICs, see the Network and Storage Card Installation Guide. For more information about the configuration of network ports, see the SteelFusion Core Management Console User’s Guide.

Figure 3-1 shows a basic HA deployment indicating some of the SFCR 2000 and 3000 ports and use of straight-through or crossover cables. You can use the same deployment and interface connections for the 3500, but the interface names are different. For more information about HA deployments, see “SteelFusion Appliance High-Availability Deployment” on page 81.


Figure 3-1. Core ports for Core models 2000 and 3000

Configuring interface routing

You configure interface routing by choosing Configure > Networking: Management Interfaces from the Core Management Console.

Note: If all the interfaces have different IP addresses, you do not need additional routes.

This section describes the following scenarios:

 “All interfaces have separate subnet IP addresses” on page 38

 “All interfaces are on the same subnets” on page 39

 “Some interfaces, except primary, share the same subnets” on page 40

 “Some interfaces, including primary, share the same subnets” on page 40

All interfaces have separate subnet IP addresses

In this scenario, you do not need additional routes. The following table shows a sample configuration in which each interface has an IP address on a separate subnet.

Interface Sample configuration Description

Auxiliary - 192.168.10.2/24 - Management (and default) interface.
Primary - 192.168.20.2/24 - Interface to WAN traffic.
eth0_0 - 10.12.5.12/16 - Interface for storage array traffic.
eth0_1 - Optional, additional interface for storage array traffic.
eth0_2 - 192.168.30.2/24 - HA failover peer interface, number 1.
eth0_3 - 192.168.40.2/24 - HA failover peer interface, number 2.


All interfaces are on the same subnets

If all interfaces are in the same subnet, only the primary interface has a route added by default. You must configure routing for the additional interfaces. The following table shows a sample configuration.

Interface Sample configuration Description

Auxiliary - 192.168.10.1/24 - Management (and default) interface.
Primary - 192.168.10.2/24 - Interface to WAN traffic.
eth0_0 - 192.168.10.3/24 - Interface for storage array traffic.

To configure additional routes

1. In the Core Management Console, choose Configure > Networking: Management Interfaces.

Figure 3-2. Routing Table on the Management Interfaces page

2. Under Main IPv4 Routing Table, use the following controls to configure routing as necessary.

Control Description

Add a New Route - Displays the controls for adding a new route.
Destination IPv4 Address - Specify the destination IP address for the out-of-path appliance or network management device.
IPv4 Subnet Mask - Specify the subnet mask. For example, 255.255.255.0.
Gateway IPv4 Address - Optionally, specify the IP address for the gateway.
Interface - From the drop-down list, select the interface.
Add - Adds the route to the table list.

3. Repeat for each interface that requires routing.

4. Click Save to save your changes permanently. You can also perform this configuration using the ip route CLI command. For details, see the SteelFusion Command-Line Interface Reference Manual.
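For example, a minimal CLI sketch of adding one such route for the eth0_0 sample configuration above might look like the following. The ip route command name comes from this section, but the destination, netmask, and gateway values are placeholders, and the exact argument order should be verified in the SteelFusion Command-Line Interface Reference Manual. Lines beginning with # are annotations, not CLI input.

    enable
    configure terminal
    # add a route toward the storage subnet (addresses are placeholders)
    ip route 10.50.5.0 255.255.255.0 192.168.10.254
    # save the change permanently
    write memory

Routes added this way appear in the Main IPv4 Routing Table shown in Figure 3-2.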


Some interfaces, except primary, share the same subnets

If a subset of interfaces, excluding primary, are in the same subnet, you must configure additional routes for those interfaces. The following table shows a sample configuration.

Interface Sample configuration Description

Auxiliary - 10.10.10.1/24 - Management (and default) interface.
Primary - 10.10.20.2/24 - Interface to WAN traffic.
eth0_0 - 192.168.10.3/24 - Interface for storage array traffic.
eth0_1 - 192.168.10.4/24 - Additional interface for storage array traffic.

To configure additional routes

1. In the Core Management Console, choose Configure > Networking: Management Interfaces.

2. Under Main IPv4 Routing Table, use the following controls to configure routing as necessary.

Control Description

Add a New Route - Displays the controls for adding a new route.
Destination IPv4 Address - Specify the destination IP address for the out-of-path appliance or network management device.
IPv4 Subnet Mask - Specify the subnet mask. For example, 255.255.255.0.
Gateway IPv4 Address - Optionally, specify the IP address for the gateway.
Interface - From the drop-down list, select the interface.
Add - Adds the route to the table list.

3. Repeat for each interface that requires routing.

4. Click Save to save your changes permanently. You can also perform this configuration using the ip route CLI command. For details, see the SteelFusion Command-Line Interface Reference Manual.

Some interfaces, including primary, share the same subnets

If some but not all interfaces, including primary, are in the same subnet, you must configure additional routes for those interfaces. The following table shows a sample configuration.

Interface Sample configuration Description

Aux - 10.10.10.2/24 - Management (and default) interface.
Primary - 192.168.10.2/24 - Interface to WAN traffic.



eth0_0 - 192.168.10.3/24 - Interface for storage array traffic.
eth0_1 - 192.168.10.4/24 - Additional interface for storage array traffic.
eth0_2 - 20.20.20.2/24 - HA failover peer interface, number 1.
eth0_3 - 30.30.30.2/24 - HA failover peer interface, number 2.

To configure additional routes

1. In the Core Management Console, choose Configure > Networking: Management Interfaces.

2. Under Main IPv4 Routing Table, use the following controls to configure routing as necessary.

Control Description

Add a New Route - Displays the controls for adding a new route.
Destination IPv4 Address - Specify the destination IP address for the out-of-path appliance or network management device.
IPv4 Subnet Mask - Specify the subnet mask. For example, 255.255.255.0.
Gateway IPv4 Address - Optionally, specify the IP address for the gateway.
Interface - From the drop-down list, select the interface.
Add - Adds the route to the table list.

3. Repeat for each interface that requires routing.

4. Click Save to save your changes permanently. You can also perform this configuration using the ip route CLI command. For details, see the SteelFusion Command-Line Interface Reference Manual.

Configuring Core for jumbo frames

If your network infrastructure supports jumbo frames, configure the connection between the Core and the storage system as described in this section. Depending on how you configure Core, you might configure the primary interface or one or more data interfaces. In addition to configuring Core for jumbo frames, you must configure the storage system and any switches, routers, or other network devices between Core and the storage system.

To configure Core for jumbo frames

1. In the Core Management Console, choose Configure > Networking and open the relevant page (Management Interfaces or Data Interfaces) for the interface used by the Core to connect to the storage network. For example, eth0_0.

2. On the interface on which you want to enable jumbo frames:

– Enable the interface.


– Select the Specify IPv4 Address Manually option and enter the correct value for your implementation.

– Specify 9000 bytes for the MTU setting.

3. Click Apply to apply the settings to the current configuration.

4. Click Save to save your changes permanently.

To configure jumbo frames on your storage array, see the documentation from your storage array vendor. A quick end-to-end check follows.
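As a sanity check (not a SteelFusion command), you can confirm that 9000-byte frames traverse the path by sending a full-size, don’t-fragment ping from a Linux host on the storage VLAN toward the iSCSI portal; the address below is a placeholder.

    # 8972 bytes of ICMP payload + 28 bytes of IP/ICMP headers = 9000-byte packet
    ping -M do -s 8972 -c 3 10.12.5.50

If any device in the path is still using a 1500-byte MTU, the ping fails with a fragmentation-needed error instead of completing.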

Configuring the iSCSI initiator

Note: Block storage mode only.

The iSCSI initiator settings dictate how the Core communicates with one or more storage arrays through the specified portal configuration. iSCSI configuration includes:

 Initiator name

 Enabling header or data digests (optional)

 Enabling CHAP authorization (optional)

 Enabling MPIO and standard routing for MPIO (optional)

CHAP functionality and MPIO functionality are described separately in this document. For more information, see “Using CHAP to secure iSCSI connectivity” on page 144 and “Use CHAP” on page 173. In the Core Management Console, you can view and configure the iSCSI initiator, local interfaces for MPIO, portals, and targets by choosing Configure > Storage: iSCSI, Initiators, MPIO. For more information, see the SteelFusion Core Management Console User’s Guide.

In the Core CLI, use the following commands to access and manage iSCSI initiator settings (a combined sketch follows the list):

 storage lun modify auth-initiator to add or remove an authorized iSCSI initiator to or from the LUN

 storage iscsi data-digest to include or exclude the data digest in the iSCSI protocol data unit (PDU)

 storage iscsi header-digest to include or exclude the header digest in the iSCSI PDU

 storage iscsi initiator to access numerous iSCSI configuration settings
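A minimal sketch that strings these commands together is shown below. Only the command names are taken from this section; the keywords and values (the example IQNs, the LUN alias, and the enable arguments) are assumptions for illustration, so confirm the full syntax in the SteelFusion Command-Line Interface Reference Manual. Lines beginning with # are annotations, not CLI input.

    enable
    configure terminal
    # set the Core's initiator name (example IQN)
    storage iscsi initiator name iqn.2017-04.com.example:sfcore-dc01
    # optionally include header and data digests in each iSCSI PDU
    storage iscsi header-digest enable
    storage iscsi data-digest enable
    # allow a specific branch initiator to access a LUN (alias and IQN are examples)
    storage lun modify alias lun_branch01 auth-initiator add iqn.1991-05.com.microsoft:branch-fs01
    write memory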

Configuring LUNs

Note: Block storage mode only.

This section includes the following topics:

 “Exposing LUNs” on page 43

 “Resizing LUNs” on page 43

 “Configuring Fibre Channel LUNs” on page 43

 “Removing a LUN from a Core configuration” on page 44


Before you can configure LUNs in Core, you must provision the LUNs on the storage array and configure the iSCSI initiator. For more information, see “Provisioning LUNs on the storage array” on page 24 and “Configuring the iSCSI initiator” on page 42.

Exposing LUNs

You expose LUNs by scanning for LUNs on the storage array, and then mapping them to the Edges. After exposing LUNs, you can further configure them for failover, MPIO, snapshots, and pinning and prepopulation. In the Core Management Console, you can expose and configure LUNs by choosing Configure > Manage: LUNs. For more information, see the SteelFusion Core Management Console User’s Guide. In the Core CLI, you can expose and configure LUNs with the following commands:

 storage iscsi portal host rescan-luns to discover available LUNs on the storage array

 storage lun add to add a specific LUN

 storage lun modify to modify an existing LUN configuration

For more information, see the SteelFusion Command-Line Interface Reference Manual. A brief sketch of this workflow follows.
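The following sketch shows the shape of that CLI workflow. The command names come from the list above, but the portal address, LUN serial, and the keywords used to reference and map the LUN are assumptions, so verify the exact parameters in the SteelFusion Command-Line Interface Reference Manual. Lines beginning with # are annotations, not CLI input.

    enable
    configure terminal
    # rediscover LUNs behind an already configured iSCSI portal (address is a placeholder)
    storage iscsi portal host 10.12.5.50 rescan-luns
    # add one of the discovered LUNs by its serial number (value is a placeholder)
    storage lun add serial 60a98000646f725f6578616d706c65
    # adjust the new LUN's configuration, for example to map it to an Edge (keywords assumed)
    storage lun modify serial 60a98000646f725f6578616d706c65 edge-mapping edge-branch01
    write memory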

Resizing LUNs

Granite 2.6 introduced the LUN expansion feature. Prior to Granite 2.6, to resize a LUN you needed to unmap the LUN from an Edge, remove the LUN from Core, change the size on the storage array, add it back to Core, and map it to the Edge. The LUN expansion feature generally automatically detects LUN size increases made on a data center storage array if there are active read and write operations, and then propagates the change to the Edge. However, if there are no active read and write operations, you must perform a LUN rescan on the Core Configure > Manage: LUNs page for the Core to detect the new LUN size. If the LUN is pinned, you need to make sure the blockstore on its Edge can accommodate the new size of the LUN.

Note: If you have configured SteelFusion Replication, the new LUN size on the primary Core is updated only when the replica LUN size is the same or greater.

Configuring Fibre Channel LUNs

The process of configuring Fibre Channel LUNs for Core requires configuration in both the ESXi server and the Core. For more information, see “SteelFusion and Fibre Channel” on page 55 and the Fibre Channel on SteelFusion Core Virtual Edition Solution Guide.


Removing a LUN from a Core configuration

This section describes the process to remove a LUN from a Core configuration. This process requires actions on both the Core and the server running at the branch.

Note: In the following example procedure, the branch server is assumed to be a Windows server; however, similar steps are required for other types of servers.

To remove a LUN

1. At the branch where the LUN is exposed:

 Power down the local Windows server.

 If the Windows server runs on ESXi, you must also unmount and detach the LUN from ESXi.

2. At the data center, take the LUN offline in the Core configuration. When you take a LUN offline, outstanding data is flushed to the storage array LUN and the blockstore cache is cleared. Depending on the WAN bandwidth, latency, and utilization, and on the amount of data in the Edge blockstore that has not yet been synchronized back to the data center, this operation can take anywhere from seconds to many minutes or even hours. Use the reports on the Edge to help understand how much data is left to be written back. Until all the data is safely synchronized back to the LUN in the data center, the Core keeps the LUN in an offlining state. Only when the data is safe does the LUN status change to offline.

To take a LUN offline, use one of the following methods:

 CLI - Use the storage lun modify offline command.

 Management Console - Choose Configure > Manage: LUNs to open the LUNs page, select the LUN configuration in the list, and select the Details tab.

3. Remove the LUN configuration using one of the following methods:

 CLI - Use the storage lun remove command.

 Management Console - Choose Configure > Manage: LUNs to open the LUNs page, locate the LUN configuration in the list, and click the trash icon.

For details about CLI commands, see the SteelFusion Command-Line Interface Reference Manual. For details about using the Core Management Console, see the SteelFusion Core Management Console User’s Guide. A CLI sketch of steps 2 and 3 follows.
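As a sketch, the data center half of this procedure (steps 2 and 3) might look like the following on the Core CLI. The command names are the ones cited above; the alias keyword and LUN name are assumptions, and you should wait for the LUN status to change from offlining to offline before removing it. Lines beginning with # are annotations, not CLI input.

    enable
    configure terminal
    # flush outstanding Edge data back to the storage array and take the LUN offline
    storage lun modify alias lun_branch01 offline
    # after the LUN status shows offline (not offlining), remove it from the Core configuration
    storage lun remove alias lun_branch01
    write memory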

Configuring redundant connectivity with MPIO

The MPIO feature enables you to configure multiple physical I/O paths (interfaces) for redundant connectivity with the local network, storage system, and iSCSI initiator. Both Core and Edge offer MPIO functionality. However, these features are independent of each other and do not affect each other.


MPIO in Core

Note: Block storage mode only.

The MPIO feature enables you to connect Core to the network and to the storage system through multiple physical I/O paths. Redundant connections help prevent loss of connectivity in the event of an interface, switch, cable, or other physical failure. You can configure MPIO at the following separate and independent points:

 iSCSI initiator - This configuration allows you to enable and configure multiple I/O paths between the Core and the storage system. Optionally, you can enable standard routing if the iSCSI portal is not in the same subnet as the MPIO interfaces.

 iSCSI target - This configuration allows you to configure multiple portals on the Edge. Using these portals, an initiator can establish multiple I/O paths to the Edge.

Configuring Core MPIO interfaces

You can configure MPIO interfaces through the Core Management Console or the Core CLI. In the Core Management Console, choose Configure > Storage Array: iSCSI, Initiator, MPIO. Configure MPIO using the following controls:

 Enable MPIO.

 Enable standard routing for MPIO. This control is required if the backend iSCSI portal is not in the same subnet as at least two of the MPIO interfaces.

 Add (or remove) local interfaces for the MPIO connections.

For details about configuring MPIO interfaces in the Core Management Console, see the SteelFusion Core Management Console User’s Guide.

In the Core CLI, open the configuration terminal mode and run the following commands:

 storage iscsi session mpio enable to enable the MPIO feature.

 storage iscsi session mpio standard-routes enable to enable standard routing for MPIO. This command is required if the backend iSCSI portal is not in the same subnet as at least two of the MPIO interfaces.

 storage lun modify mpio path to specify a path. These commands require additional parameters to identify the LUN.

For details about configuring MPIO interfaces in the Core CLI, see the SteelFusion Command-Line Interface Reference Manual. A combined sketch follows.
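Put together, a minimal configuration sketch might look like this. The three command names are taken from the list above; the LUN alias, interface, and portal values, and the keywords used to describe the path, are assumptions to be checked against the SteelFusion Command-Line Interface Reference Manual. Lines beginning with # are annotations, not CLI input.

    enable
    configure terminal
    # enable MPIO for iSCSI sessions between the Core and the storage array
    storage iscsi session mpio enable
    # only needed when the backend portal is not in the same subnet as the MPIO interfaces
    storage iscsi session mpio standard-routes enable
    # add a path for a LUN (alias, interface, and portal are placeholders)
    storage lun modify alias lun_branch01 mpio path interface eth0_1 portal 10.12.5.50
    write memory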

Core pool management

Note: Block storage mode only.

This section describes Core pool management. It includes the following topics:

 “Overview of Core pool management” on page 46

 “Pool management architecture” on page 46

 “Configuring pool management” on page 47


 “Changing pool management structure” on page 50

 “High availability in pool management” on page 51

Overview of Core pool management

Core Pool Management simplifies the administration of large installations in which you need to deploy several Cores. Pool management enables you to manage storage configuration and check storage-related reports on all the Cores from a single Management Console. Pool management is especially relevant to Core-v deployments when LUNs are provided over Fibre Channel. VMware ESX has a limitation for raw device mapping (RDM) LUNs, which limits Core-v to 60 LUNs. In releases prior to SteelFusion 3.0, to manage 300 LUNs, you needed to deploy five separate Core-vs. To ease Core management in SteelFusion 3.0 and later, you can combine Cores into management pools. In SteelFusion 3.0 and later, you can enable access to the SteelHead REST API framework. This access enables you to generate a REST API access code for use in SteelFusion Core pool management. You can access the REST API by choosing Configure > Pool Management: REST API Access. For more information about pool management, see the SteelFusion Core Management Console User’s Guide.

Pool management architecture

Pool management is a two-tier architecture that allows each Core to become either a manager or a member of a pool. A Core can be part of only one pool. The pool is a single-level hierarchy with a flat structure, in which all members of the pool except the manager have equal priority and cannot themselves be managers of pools. The pool has a loose membership, in which pool members are not aware of one another, except for the manager. Any Core can be the manager of the pool, but the pool manager cannot be a member of any other pool. You can have up to 32 Cores in one pool, not including the manager. The pool is dissolved when the manager is no longer available (unless the manager has an HA peer). Management of a pool can be taken over by a failover peer. However, a member's failover peer cannot be managed by the pool manager through that member, even if the failover peer is down. For details about HA, see “High availability in pool management” on page 51.


From a performance perspective, it does not matter which Core you choose as the manager. The resources required by the pool manager differ little, if at all, from those of regular Core operations.

Figure 3-3. Core two-tier pool management

Configuring pool management

This section describes how to configure pool management. These are the high-level steps: 1. “To create a pool” on page 47

2. “To generate a REST access code for a member” on page 48

3. “To add a member to the pool” on page 49 You can configure pool management only through the Management Console.

To create a pool

1. Decide which Core you want to become the pool manager.

2. In the Management Console of the pool manager, choose Configure > Pool Management: Edit Pool.

3. Specify a name for the pool in the Pool Name field.

4. Click Create Pool.


To generate a REST access code for a member

1. In the Management Console of the pool member, choose Configure > Pool Management: REST API Access.

Figure 3-4. REST API access page

2. Select Enable REST API Access and click Apply.

3. Select Add Access Code.

4. Specify a useful description, such as For Pool Management from <pool manager hostname>, in the Description of Use field.

5. Select Generate New Access Code and click Add. A new code is generated.

6. Expand the new entry and copy the access code.


Continue to “To add a member to the pool” on page 49 to finish the process.

Figure 3-5. REST API Access code

Note: You can revoke access of a pool manager by removing the access code or disabling REST API access on the member.

Before you begin the next procedure, you need the hostnames or the IP addresses of the Cores you want to add as members.

To add a member to the pool

1. In the Management Console of the pool manager, choose Configure > Pool Management: Edit Pool.

2. Select Add a Pool Member.

3. Add the member by specifying the hostname or the IP address of the member.

4. Paste the REST API access code that you generated on the Management Console of the pool member into the API Access Code field.


When a member is successfully added to the pool, the pool manager Pool Management page displays statistics about the members, such as health, number of LUNs, model, failover status, and so on.

Figure 3-6. Successful pool management configuration

Changing pool management structure

A pool manager can remove individual pool members or dissolve the whole pool. A pool member can release itself from the pool.

To remove a pool relationship for a single member or to dissolve the pool completely

1. In the Management Console of the pool manager, choose Configure > Pool Management: Edit Pool.

2. To remove an individual pool member, click the trash icon in the Remove column for that member. To dissolve the entire pool, click Dissolve Pool.

We recommend that you release a member from a pool from the Management Console of the manager. Use the following procedure to release a member from the pool only if the manager is gone or cannot contact the member.

To release a member from a pool

1. In the Management Console of the pool member, choose Configure > Pool Management: Edit Pool. You see a message similar to “This appliance is currently a part of pool <pool name> and is being managed by <manager hostname or IP address>.”

2. Click Release me from the Pool.


This action releases the member from the pool, but you continue to see the member in the pool table on the manager.

Figure 3-7. Releasing a pool member from the member Management Console

3. Manually delete the released member from the manager pool table.

High availability in pool management

When you use pool management in conjunction with an HA environment, configure both peers as members of the same pool. If you choose one of the peers to be a pool manager, its failover peer should join the pool as a member. Without pool management, Core cannot manage its failover peer storage configuration unless failover is active (the failover peer is down). With pool management, the manager can manage the failover peer storage configuration even while the failover peer is up. The manager failover peer can manage the manager storage configuration only when the manager is down. The following scenarios show how you can use HA in pool management:

 The manager is down and its failover peer is active. In this scenario, when the manager is down the failover peer can take over the management of a pool. The manager failover peer can manage storage configuration for the members of the pool using the same configuration as the manager.

 The member is down and its failover peer is active. When a member of a pool is down and it has a failover peer configured (and the peer is not the manager of the member), the failover peer takes over servicing the LUNs of the member. The failover peer can access the storage configuration of the member when it is down. However, the pool manager cannot access the storage configuration of the failed member. To manage storage configuration of the down member, you need to log in to the Management Console of its failover peer directly.

Note: The pool is dissolved when the manager is no longer available, unless the manager has an HA peer.

For more details about HA deployments, see “SteelFusion Appliance High-Availability Deployment” on page 81.

Cloud storage gateway support

Note: Block storage mode only.


Cloud storage gateway technology enables organizations to store data in the public cloud and access it using standard storage protocols, like iSCSI, via an appliance on the customer premises. In simple terms, a cloud storage gateway device provides on-premises access for local initiators using iSCSI and connects through to storage hosted in the public cloud using a RESTful API via HTTPS. This approach enables companies to adopt a tiered storage methodology for their data by retaining a “working set” within the data center while moving older data out to cloud hosting providers. The individual features and specifications of storage gateway products vary by manufacturer; such details are beyond the scope of this document.

With Core 4.3 and later, SteelFusion Core supports cloud storage gateway technology from Amazon and Microsoft. Depending on the cloud gateway product configured with SteelFusion Core, the cloud storage is either Amazon Web Services S3 (Simple Storage Service) or Microsoft Azure Blob Storage. The Amazon storage gateway product is called AWS Storage Gateway and the Microsoft storage gateway is called StorSimple. These cloud gateway products provide an iSCSI target to SteelFusion Core in exactly the same way as regular on-premises iSCSI storage arrays would. SteelFusion can now extend the benefits of public cloud storage all the way out to the branch office while continuing to provide the benefits of SteelFusion.

The following SteelFusion Core features are supported with a cloud storage gateway deployment:

 Core and Edge in HA deployment

 Protection against data center failures

 Instant branch provisioning and recovery

 Data security

 Data encrypted at rest on the SteelFusion appliance

 Data encrypted in flight from branch to data center

 Amazon and Microsoft security best practices to encrypt data from data center to cloud

The following SteelFusion Core features are not currently supported:

 Snapshot or data protection of SteelFusion LUNs provided by the cloud gateways

Note: You can always use the respective cloud vendor’s built-in data protection tools to provide the snapshot capability.

Because SteelFusion sees no real difference between on-premises storage arrays and cloud storage gateway, the configuration of SteelFusion Core remains the same with respect to LUN provisioning, access, and use of these LUNs by SteelFusion Edge.

Related information

 SteelFusion Core Management Console User’s Guide

 SteelFusion Edge Management Console User’s Guide

 SteelFusion Core Installation and Configuration Guide

 SteelFusion Command-Line Interface Reference Manual

 Network and Storage Card Installation Guide


 Fibre Channel on SteelFusion Core Virtual Edition Solution Guide

 Riverbed Splash at https://splash.riverbed.com/community/product-lines/steelfusion



SteelFusion and Fibre Channel

This chapter includes general information about Fibre Channel LUNs and how they interact with SteelFusion. It includes the following sections:

 “Overview of Fibre Channel” on page 55

 “Deploying Fibre Channel LUNs on Core-v appliances” on page 60

 “Configuring Fibre Channel LUNs in a Core-v HA scenario” on page 64

 “Populating Fibre Channel LUNs” on page 66

 “Best practices for deploying Fibre Channel on Core-v” on page 68

 “Troubleshooting” on page 69

 “Related information” on page 70

Overview of Fibre Channel

Core-v can connect to Fibre Channel LUNs at the data center and export them to the branch office as iSCSI LUNs. The iSCSI LUNs can then be mounted by the VMware ESX or ESXi hypervisor running internally on VSP or on external ESX or ESXi servers, or directly by Microsoft Windows virtual servers through the Microsoft iSCSI Initiator. A virtual Windows file server running on VSP (Figure 4-1) can then share the mounted drive with branch office client PCs through the CIFS protocol. This section includes the following topics:

 “Fibre Channel LUN considerations” on page 57

 “How VMware ESXi virtualizes Fibre Channel LUNs” on page 57

 “How Core-v connects to RDM Fibre Channel LUNs” on page 58

 “Requirements for Core-v and Fibre Channel SANs” on page 59

 “Specifics about Fibre Channel LUNs versus iSCSI LUNs” on page 60


Figure 4-1. SteelFusion solution with Fibre Channel

Fibre Channel is the predominant storage networking technology for enterprise business. Fibre Channel connectivity is estimated to be at 78 percent versus iSCSI at 22 percent. IT administrators still rely on the known, trusted, and robust Fibre Channel technology. Fibre Channel is a set of integrated standards developed to provide a mechanism for transporting data at the fastest rate possible with the least delay. In storage networking, Fibre Channel is used to interconnect host and application servers with storage systems. Typically, servers and storage systems communicate using the SCSI protocol. In a storage area network (SAN), the SCSI protocol is encapsulated and transported through Fibre Channel frames. The Fibre Channel (FC) protocol processing on the host servers and the storage systems is mostly carried out in hardware. Figure 4-2 shows the various layers in the FC protocol stack and the portions implemented in hardware and software for an FC host bus adapter (HBA). FC HBA vendors are Qlogic, Emulex, and LSI.

Figure 4-2. HBA FC protocol stack


Special switches are also required to transport Fibre Channel traffic. Vendors in this market are Cisco and Brocade. Switches implement many of the FC protocol services, such as name server, domain server, zoning, and so on. Zoning is particularly important because, in collaboration with LUN masking on the storage systems, it implements storage access control by limiting access to LUNs to specific initiators and servers through specific targets and LUNs. An initiator and a target are visible to each other only if they belong to the same zone. LUN masking is an access control mechanism implemented on the storage systems. NetApp implements LUN masking through initiator groups, which enable you to define a list of worldwide names (WWNs) that are allowed to access a specific LUN. EMC implements LUN masking using masking views that contain storage groups, initiator groups, and port groups. LUN masking is important because Windows-based servers, for example, attempt to write volume labels to all available LUNs. This attempt can make the LUNs unusable by other operating systems and can result in data loss.

Fibre Channel LUN considerations

Fibre Channel LUNs are distinct from iSCSI LUNs in several important ways:

 No MPIO configuration - Multipathing support is performed by the ESXi system.

 SCSI reservations - SCSI reservations are not taken on Fibre Channel LUNs.

 Additional HA configuration required - Configuring HA for Core-v failover peers requires that each appliance be deployed on a separate ESXi system.

 Maximum of 60 Fibre Channel LUNs per ESXi system - ESXi allows a maximum of 60 RDMs into a VM. Within a VM an RDM is represented by a virtual SCSI device. A VM can only have four virtual SCSI controllers with 15 virtual SCSI devices each.

How VMware ESXi virtualizes Fibre Channel LUNs

The VMware ESXi hypervisor provides not only CPU and memory virtualization but also host-level storage virtualization, which logically abstracts the physical storage layer from virtual machines. Virtual machines do not access the physical storage or LUNs directly, but instead use virtual disks. To access virtual disks, a virtual machine uses virtual SCSI controllers. Each virtual disk that a virtual machine can access through one of the virtual SCSI controllers resides on a VMware Virtual Machine File System (VMFS) datastore or a raw disk. From the standpoint of the virtual machine, each virtual disk appears as if it were a SCSI drive connected to a SCSI controller. Whether the actual physical disk device is being accessed through parallel SCSI, iSCSI, network, or Fibre Channel adapters on the host is transparent to the guest operating system.

Virtual machine file system

In a simple configuration, the disks of virtual machines are stored as files on a Virtual Machine File System (VMFS). When guest operating systems issue SCSI commands to their virtual disks, the virtualization layer translates these commands to VMFS file operations.


Raw device mapping

A raw device mapping (RDM) is a special file in a VMFS volume that acts as a proxy for a raw device, such as a Fibre Channel LUN. With the RDM, an entire Fibre Channel LUN can be directly allocated to a virtual machine.

Figure 4-3. ESXi storage virtualization

How Core-v connects to RDM Fibre Channel LUNs

Core-v uses RDM to mount Fibre Channel LUNs and export them to the Edge at the branch office. The Edge exposes those LUNs as iSCSI LUNs to the branch office clients.

Figure 4-4. Core-VM FC LUN to RDM Mapping


When Core-v interacts with an RDM Fibre Channel LUN, the following process takes place: 1. Core-v issues SCSI commands to the RDM disk.

2. The device driver in the Core-v operating system communicates with the virtual SCSI controller.

3. The virtual SCSI controller forwards the command to the ESXi virtualization layer or VMkernel.

4. The VMkernel performs the following tasks:

 Locates the RDM file in the VMFS.

 Maps the SCSI requests for the blocks on the RDM virtual disk to blocks on the appropriate Fibre Channel LUN.

 Sends the modified I/O request from the device driver in the VMkernel to the HBA.

5. The HBA performs the following tasks:

 Packages the I/O request according to the rules of the FC protocol.

 Transmits the request to the storage system.

 A Fibre Channel switch receives the request and forwards it to the storage system that the host wants to access.

Requirements for Core-v and Fibre Channel SANs

The following table describes the hardware and software requirements for deploying Core-v with Fibre Channel SANs.

Requirements Notes

SteelFusion Edge 4.0 and later

Core-v with SteelFusion 2.5 or later

VMware ESX/ESXi 4.1 or later

Storage system, HBA, and firmware combination supported in conjunction with ESX/ESXi systems - For details, see the VMware Compatibility Guide.

Reserve CPU(s) and RAM on the ESX/ESXi system - Core model V1000U: 2 GB RAM, 2 CPU. Core model V1000L: 4 GB RAM, 4 CPU. Core model V1000H: 8 GB RAM, 8 CPU. Core model V1500L: 32 GB RAM, 8 CPU. Core model V1500H: 48 GB RAM, 12 CPU.

Fibre Channel license on the storage system - In some storage systems, Fibre Channel is a licensed feature.


Specifics about Fibre Channel LUNs versus iSCSI LUNs

Using Fibre Channel LUNs on Core-v in conjunction with VMware ESX/ESXi differs from using iSCSI LUN directly on the Core in a number of ways, as listed in the following table.

Feature Fibre Channel LUNs Versus iSCSI LUNs

Multipathing - The ESX/ESXi system (not the Core) performs multipathing for the Fibre Channel LUNs.

VSS snapshots - Snapshots created using the Microsoft Windows diskshadow command are not supported on Fibre Channel LUNs.

SCSI reservations - SCSI reservations are not taken on Fibre Channel LUNs.

Core HA deployment - Active and failover Core-vs must be deployed on separate ESX/ESXi systems.

Max 60 Fibre Channel LUNs per ESX/ESXi system - ESX/ESXi systems enable a maximum of four SCSI controllers. Each controller supports a maximum of 15 SCSI devices. Hence, a maximum of 60 Fibre Channel LUNs are supported per ESX/ESXi system.

VMware vMotion not supported - Core-vs cannot be moved to a different ESXi server using VMware vMotion.

VMware HA not supported - A Core-v cannot be moved to another ESXi server through the VMware HA mechanism. To ensure that the Core-v stays on the specific ESXi server, create an affinity rule as described in this VMware Knowledge Base article: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1005508

Deploying Fibre Channel LUNs on Core-v appliances

This section describes the process and procedures for deploying Fibre Channel LUNs on Core-v appliances. It includes the following sections:

 “Deployment prerequisites” on page 60

 “Configuring Fibre Channel LUNs” on page 61

Deployment prerequisites

Before you can deploy Fibre Channel LUNs on Core-v appliances, the following conditions must be met:

 The active Core-v must be deployed and powered up on the ESX/ESXi system.

 The failover Core-v must be deployed and powered up on the second ESX/ESXi system.

 A Fibre Channel LUN must be available on the storage system.

 Preconfigured initiator and storage groups for LUN mapping to the ESX/ESXi systems must be available.

 Preconfigured zoning on the Fibre Channel switch for LUN visibility to the ESX/ESXi systems across the SAN fabric must be available.

 You must have administrator access to the storage system, the ESX/ESXi system, and SteelFusion appliances.


For more information about how to set up Fibre Channel LUNs with the ESX/ESXi system, see the relevant edition of the VMware Fibre Channel SAN Configuration Guide and the VMware vSphere ESXi vCenter Server Storage Guide.

Configuring Fibre Channel LUNs

Perform the procedures in the following sections to configure the Fibre Channel LUNs: 1. “To discover and configure Fibre Channel LUNs as Core RDM disks on an ESX/ESXi system” on page 61

2. “To discover and configure exposed Fibre Channel LUNs through an ESX/ESXi system on the Core-v” on page 62

To discover and configure Fibre Channel LUNs as Core RDM disks on an ESX/ESXi system

1. Navigate to the ESX system Configuration tab, click Storage Adapters, select the FC HBA, and click Rescan All to discover the Fibre Channel LUNs.

Figure 4-5. FC Disk Discovery

2. Right-click the name of the Core-v and select Edit Settings. The virtual machine properties dialog box opens.

3. Click Add and select Hard Disk for device type.


4. Click Next and select Raw Device Mappings for type of disk to use.

Figure 4-6. Select Raw Device Mappings

5. Select the LUNs to expose to the Core-v. If you do not see the LUN, follow the steps described in “Troubleshooting” on page 69.

6. Select the datastore on which you want store the LUN mapping.

7. Select Store with Virtual Machine.

Figure 4-7. Store mappings with VM

8. For compatibility mode, select Physical.

9. For advanced options, use the default virtual device node setting.

10. Review the final options and click Finish. The Fibre Channel LUN is now set up as an RDM and ready to be used by the Core-v.

To discover and configure exposed Fibre Channel LUNs through an ESX/ESXi system on the Core-v

1. In the Core Management Console, choose Configure > Manage: LUNs and select Add a LUN.

2. Select Block Disk.


3. From the drop-down menu, select Rescan for new LUNs to discover the newly added RDM LUNs (Figure 4-8).

Figure 4-8. Rescan for new block disks

4. Select the LUN Serial Number.

5. Select Add Block Disk LUN to add it to the Core-v. Map the LUN to the desired Edge and configure the access lists of the initiators.

Figure 4-9. Add new block disk


Configuring Fibre Channel LUNs in a Core-v HA scenario

This section describes how to deploy Core-vs in HA environments. It includes the following topics:

 “When ESXi servers hosting the Core-v appliances are managed by vCenter” on page 64

 “When ESXi servers hosting the Core-v appliances are not managed by vCenter” on page 66

When you deploy Core-v appliances in an HA environment, install the two appliances on separate ESX servers so that there is no single point of failure. You can deploy the Core-v appliances differently depending on whether the ESX servers hosting the Core-v appliances are managed by a vCenter or not. The methods described in this section are only relevant when Core-v appliances manage FC LUNs (also called block disk LUNs). For both deployment methods, modify the storage system Storage Group to expose the LUN to both ESXi systems. Figure 4-10 shows that LUN 0 is assigned to both worldwide names of the HBAs or the ESXi HBAs.

Figure 4-10. Core-v HA deployment

When ESXi servers hosting the Core-v appliances are managed by vCenter

This is a scenario where two Core-v appliances (Core-v1 and Core-v2) are deployed in HA. They are hosted on ESX and managed by vCenter. After adding a LUN as an RDM to Core-v1, vCenter does not present the LUN in the list of LUNs available to add as an RDM to Core-v2. It is not available because the LUN filtering mechanism is turned on in vCenter by default to help prevent LUN corruption.


One way to solve the problem is to add the LUNs to the two Core-v appliances in HA, with the ESX servers in a vCenter, without turning off LUN filtering, by using the following procedures. You must also have a shared datastore on a SAN, accessible to the ESXi hosts, that can be used to store the RDM files.

To add LUNs to the first Core-v

1. In the vSphere Client inventory, select the first Core-v and select Edit Settings. The Virtual Machine Properties dialog box opens.

2. Click Add, select Hard Disk, and click Next.

3. Select Raw Device Mappings and click Next.

4. Select the LUN to be added and click Next.

5. Select a datastore and click Next. This datastore must be on a SAN because you need a single shared RDM file for each shared LUN on the SAN.

6. Select Physical as the compatibility mode and click Next. A SCSI controller is created when the virtual hard disk is created.

7. Select a new virtual device node. For example, select SCSI (1:0), and click Next. This node must be a new SCSI controller. You cannot use SCSI 0.

8. Click Finish to complete creating the disk.

9. In the Virtual Machine Properties dialog box, select the new SCSI controller, set SCSI Bus Sharing to Physical, and click OK.

To add LUNs to the second Core-v

1. In the vSphere Client inventory, select the HA Core-v and select Edit Settings. The Virtual Machine Properties dialog box appears.

2. Click Add, select Hard Disk, and click Next.

3. Select Use an existing virtual disk and click Next.

4. In Disk File Path, browse to the location of the disk specified for the first node. Select Physical as the compatibility mode and click Next. A SCSI controller is created when the virtual hard disk is created.

5. Select the same virtual device node you chose for the first Core-v's LUN (for example, SCSI [1:0]), and click Next. The location of the virtual device node for this LUN must match the corresponding virtual device node for the first Core-v.


6. Click Finish.

7. In the Virtual Machine Properties dialog box, select the new SCSI controller, set SCSI Bus Sharing to Physical, and click OK.

Keep in mind the following caveats:

 You cannot use SCSI controller 0, so the number of RDM LUNs supported on a Core-v running on ESXi 5.x is reduced from 60 to 48.

 You can change the SCSI controller bus sharing setting only when the Core-v is powered down, so you need to power down the Core-v each time you want to add a new controller. Each controller supports 16 disks.

 vMotion is not supported with Core-v.

Another solution is to turn off LUN filtering (RDM filtering) on the vCenter. You cannot disable LUN filtering per data center or per LUN; the setting applies to the entire vCenter. If you are willing to turn off LUN filtering temporarily, complete the following steps:

1. Turn off RDM filtering on vCenter. With the filtering mechanism off, add the LUNs as RDM to both Core-vs.

2. Turn RDM filtering back on.

You must repeat these steps every time new LUNs are added to the Core-v appliances. However, VMware does not recommend turning LUN filtering off unless you have other methods in place to prevent LUN corruption. Use this method with caution.

When ESXi servers hosting the Core-v appliances are not managed by vCenter

When the ESXi servers hosting the Core-v appliances in HA are not managed by the same vCenter, or are not managed by vCenter at all, you can add the LUNs as RDM to both Core-vs without any issues or special configuration requirements.

Populating Fibre Channel LUNs

This section provides the basic steps you need to populate Fibre Channel LUNs prior to deploying them into the Core.

To populate a Fibre Channel LUN

1. Create a LUN (Volume) in the storage array and allow the ESXi host where the Core is installed to access it.

2. Go to the ESXi host, choose Configuration > Advanced Settings > RdmFilter, and clear RdmFilter to disable it. You must complete this step if you intend to deploy the Core in an HA configuration. (A command-line sketch for this setting and the adapter rescan follows this procedure.)


3. Navigate to the ESX system Configuration tab, click Storage Adapters, select the FC HBA, and click Rescan All… to discover the Fibre Channel LUNs (Figure 4-5 on page 61).

4. On the ESXi server, select Storage and click Add.

5. Select Disk/LUN for the storage type and click Next. You might need to wait a few moments before the new Fibre Channel LUN appears in the list.

6. Select the Fibre Channel drive and click Next.

7. Select VMFS-5 for the file system version and click Next.

8. Click Next, enter a name for the datastore, and click Next.

9. For Capacity, use the default setting of Maximum available space and click Next.

10. Click Finish.

11. Copy files from an existing datastore to the datastore you just added.

12. Select the new datastore and unmount it. You must unmount and detach the device, rescan it, and then reattach it before you can proceed.
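Steps 2 and 3 of this procedure can also be performed from the ESXi command line. The following sketch is an assumption-based alternative: the RdmFilter.HbaIsShared option name and the esxcli syntax reflect ESXi 5.x and might differ in your release, so treat it as illustrative rather than definitive.

# Check and disable the host RDM filter (option name is an assumption; needed only for HA deployments)
esxcli system settings advanced list -o /RdmFilter/HbaIsShared
esxcli system settings advanced set -o /RdmFilter/HbaIsShared -i 0
# Rescan all storage adapters to discover the new Fibre Channel LUNs
esxcli storage core adapter rescan --all
# List the detected devices and note the naa.* identifier of the new LUN
esxcli storage core device list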

To unmount and detach a datastore

1. Right-click the device in the Devices list and choose Unmount.

2. Right-click the device in the Devices list and choose Detach.

3. Rescan the device twice.

4. Reattach the device by right-clicking the device in the Devices list and choosing Attach. Do not rescan the device.
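The same unmount, detach, rescan, and reattach sequence can be run from the ESXi shell. This is a sketch only; the datastore label and the naa identifier are placeholders for your environment.

# Unmount the VMFS datastore by label (placeholder)
esxcli storage filesystem unmount -l "FC_Datastore01"
# Detach the underlying device (placeholder identifier)
esxcli storage core device set -d naa.60060160a1b2c3d4 --state=off
# Rescan the adapters twice
esxcli storage core adapter rescan --all
esxcli storage core adapter rescan --all
# Reattach the device; do not rescan afterward
esxcli storage core device set -d naa.60060160a1b2c3d4 --state=on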

To add the LUN to the Core-v

1. Right-click the Core-v and select Edit Settings.

2. Click Add and select Hard Disk.

3. Click Next, and when prompted to select a disk to use, select Raw Device Mappings.

4. Select the target LUN to use.

5. Select the datastore on which to store the LUN mapping and select Store with Virtual Machine.

6. Select Physical for compatibility mode.

7. For advanced options, use the default setting.


8. Review the final options and click Finish.

The Fibre Channel LUN is now set up as RDM and ready to be used by the Core-v. When the LUN is projected to the branch site and attached to the branch ESXi server (VSP or other device), you are prompted to select VMFS mount options. Select Keep the existing signature.
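If you attach the projected LUN to the branch ESXi server from the command line instead of the vSphere Client, the equivalent of selecting Keep the existing signature is to mount the VMFS volume as a snapshot without resignaturing. A short sketch, assuming the datastore label used earlier as a placeholder:

# List VMFS volumes that ESXi detects as snapshot or replica copies
esxcli storage vmfs snapshot list
# Mount the volume while keeping its existing signature (label is a placeholder)
esxcli storage vmfs snapshot mount -l "FC_Datastore01"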

Best practices for deploying Fibre Channel on Core-v

This section describes the best practices for deploying Fibre Channel on Core-v. Follow these suggestions because they lead to designs that are easier to configure and troubleshoot. This section includes the following topics:

 “Best practices” on page 68

 “Recommendations” on page 69

Best practices

The following table shows the Riverbed best practices for deploying Fibre Channel on Core-v.

Best practice - Description

Keep iSCSI and Fibre Channel LUNs on separate Cores - Do not mix iSCSI and Fibre Channel LUNs in the same Core-v.

Use ESX/ESXi 4.1 or later - Make sure that the ESX/ESXi system is running version 4.1 or later.

Use gigabit links - Make sure that you map the Core-v interfaces to gigabit links that are not shared with other traffic.

Dedicate physical NICs - Use one-to-one mapping between physical and virtual NICs for the Core data interfaces.

Reserve CPU(s) and RAM - Reserve CPU(s) and RAM for the virtual Core appliance, following the guidelines listed in the following table.

The following table shows the CPU and RAM guidelines for deployment.

Model - Memory reservation - Disk space - Recommended CPU reservation - Maximum data set size - Maximum number of branches

VGC-1000-U - 2 GB - 25 GB - 2 @ 2.2 GHz - 2 TB - 5
VGC-1000-L - 4 GB - 25 GB - 4 @ 2.2 GHz - 5 TB - 10
VGC-1000-M - 8 GB - 25 GB - 8 @ 2.2 GHz - 10 TB - 20
VGC-1500-L - 32 GB - 350 GB - 8 @ 2.2 GHz - 20 TB - 30
VGC-1500-M - 48 GB - 350 GB - 12 @ 2.2 GHz - 35 TB - 30
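You can apply the reservations from this table in the vSphere Client or, as sketched below, with PowerCLI. The example uses the VGC-1000-M row (8 GB of RAM and 8 vCPUs at 2.2 GHz); the VM name is a placeholder and the cmdlet parameters are assumptions to verify against your PowerCLI version.

# Reserve CPU and memory for a VGC-1000-M class Core-v (values taken from the table above)
$vm = Get-VM -Name "Core-v1"
Get-VMResourceConfiguration -VM $vm |
    Set-VMResourceConfiguration -CpuReservationMhz (8 * 2200) -MemReservationMB (8 * 1024)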


Recommendations

The following table shows the Riverbed recommendations for deploying Fibre Channel on the Core-v.

Recommendation - Description

Deploy a dual-redundant FC HBA - The FC HBA connects the ESXi system to the SAN. Dual-redundant HBAs help to keep an active path always available. ESXi multipath software is used for controlling and monitoring HBA failure. In case of path or HBA failure, the workload fails over to the working path.

Use recommended practices for removing/deleting FC LUNs - Before deleting, offlining, or unmapping the LUNs from the storage system or removing the LUNs from the zoning configuration, remove the LUNs/block disks from the Core and unmount the LUNs from the ESXi system. ESXi might become unresponsive and sometimes might need to be rebooted if all paths to a LUN are lost.

Do not use the block disks on the Core - Fibre Channel LUNs (also known as block disks) are not supported on the physical Core.

Troubleshooting

This section describes common deployment issues and solutions. If the FC LUN is not detected on the ESXi system on which the Core-v is running, try performing these debugging steps:

1. Rescan the ESXi system storage adapters.

2. Make sure that you are looking at the right HBA on the ESXi system.

3. Make sure that the ESXi system has been allowed to access the FC LUN on the storage system, and check initiator and storage groups.

4. Make sure that the zoning configuration on the FC switch is correct.

5. Refer to VMware documentation and support for further assistance with troubleshooting FC connectivity issues.

If you previously deployed a VM on the LUN using the same ESXi host or cluster that you are using to deploy the Core-v, and the datastore is still mounted, the FC LUN might be detected on the ESXi system but not appear in the list of LUNs that can be presented as RDM to the Core-v. If this is the case, perform the following procedure to unmount the datastore from the ESXi system.


To unmount the datastore from the ESXi system

1. To unmount the FC VMFS datastore, select the Configuration tab, select View: Datastores, right-click a datastore, and select Unmount.

Figure 4-11. Unmounting a datastore

2. To detach the corresponding device from ESXi, select the Devices view, right-click a device, and select Detach.

3. Rescan twice.

Figure 4-12. Rescanning a device

4. To reattach the device, select the Devices view, right-click the device, and select Attach.

5. Do not rescan. Verify that the datastore has been removed from the datastore list.

6. Readd the device as the RDM disk to the Core-v.

If the FC RDM LUN is not visible on the Core-v, try the following debugging actions:

 Select the Rescan for new LUNs process on the Core-v several times.

 Check the Core-v logs for failures.

Related information

 SteelFusion Core Management Console User’s Guide

 SteelFusion Edge Management Console User’s Guide

 SteelFusion Core Installation and Configuration Guide


 SteelFusion Command-Line Interface Reference Manual

 Fibre Channel on SteelFusion Core Virtual Edition Solution Guide

 Riverbed Splash at https://splash.riverbed.com/community/product-lines/steelfusion



Configuring the Edge in iSCSI Mode

This chapter describes the process for configuring Edge at the branch office. It includes the following topics:

 “SteelFusion Edge appliance architecture” on page 73

 “Interface and port configurations” on page 74

 “Configuring iSCSI initiator timeouts” on page 77

 “Configuring disk management on the Edge appliance” on page 77

 “Configuring SteelFusion storage” on page 78

 “MPIO in Edge” on page 79

 “Related information” on page 79

Note: This chapter focuses on the deployment of SteelFusion block storage mode with iSCSI or Fibre Channel connected storage. For deployments that are using SteelFusion in NFS/file mode, see Chapter 13, “SteelFusion and NFS.”

SteelFusion Edge appliance architecture

The Edge contains two distinct nodes within the same hardware chassis (Figure 5-1).

Figure 5-1. SteelFusion Edge appliance architecture - block storage mode

The two nodes are as follows:


 The RiOS node provides networking, WAN optimization, direct attached storage available for SteelFusion use, and VSP functionality.

 The hypervisor node provides hypervisor-based hardware resources and software virtualization.

The two-node design provides hardware resource separation and isolation.

Note: In an NFS/file deployment the internal architecture remains virtually unchanged, with the two-node design continuing to provide the same basic functionality. But there are some minor alterations and additions in order to provide the NFS services required. For more details, see Chapter 13, “SteelFusion and NFS.”

For details on the RiOS and hypervisor nodes, see the SteelFusion Edge Installation and Configuration Guide and the SteelFusion Edge Hardware and Maintenance Guide.

Interface and port configurations

This section describes a typical port configuration for the Edge. You might require additional routing configuration depending on your deployment scenario. This section includes the following topics:

 “Edge appliances ports” on page 75

 “Moving the Edge to a new location” on page 76

 “Configuring Edge for jumbo frames” on page 76

 “Configuring iSCSI initiator timeouts” on page 77


Edge appliances ports

The following table summarizes the ports that connect the SteelFusion Edge appliance to your network. For more information about the Edge appliances, see the SteelFusion Edge Hardware and Maintenance Guide.

Port - Description

Primary (PRI) - The Primary port connects the Edge to a LAN switch. It provides access to the Management Console and Edge CLI. This interface is also used to connect to the Core through the Edge RiOS node in-path interface.

Auxiliary (AUX) - When Storage Edge is enabled on the SteelHead EX, use the Auxiliary port to connect the SteelHead EX to the management VLAN. You can connect a computer directly to the appliance with a crossover cable, enabling you to access the CLI or Management Console of the Edge RiOS node.

lan1_0 - The Edge RiOS node uses one or more in-path interfaces to provide Ethernet network connectivity for optimized traffic. Each in-path interface comprises two physical ports: the LAN port and the WAN port. Use the LAN port to connect the Edge RiOS node to the internal network of the branch office. You can also use this port for a connection to the Primary port, which enables the blockstore traffic sent between Edge and Core to transmit across the WAN link.

wan1_0 - The WAN port is the second of the two ports that comprise the Edge RiOS node in-path interface. The WAN port is used to connect the Edge RiOS node toward WAN-facing devices such as a router, firewall, or other equipment located at the WAN boundary. If you need additional in-path interfaces, or different connectivity for in-path (for example, 10 GigE or fiber), you can install a bypass NIC in an Edge RiOS node expansion slot.

eth0_0 to eth0_1 - These ports are available as standard on the SteelFusion Edge appliance. When configured for use by the Edge RiOS node, the ports can provide additional iSCSI interfaces for storage traffic to external servers. These ports also provide redundancy in the form of MPIO or SteelFusion Edge high availability (HA). In an HA design, we recommend that you use these ports for the heartbeat and BlockStream synchronization between the SteelFusion Edge HA peers. If additional iSCSI connectivity is required in an HA design, install a nonbypass data NIC in the Edge RiOS node expansion slot.

gbe0_0 to gbe0_3 - These ports are available as standard on the SteelFusion Edge appliance. When configured for use by the SteelFusion Edge hypervisor node, these 1-Gbps ports provide LAN connectivity to external clients. The ports are connected to a LAN switch using a straight-through cable. If additional connectivity is required for the hypervisor node, you can install a nonbypass data NIC in a hypervisor node expansion slot. There are no expansion slots available for the hypervisor node on the SFED 2100 and 2200 models. There are two expansion slots on the SFED 3100, 3200, and 5100 models.

Note: All the above interfaces are gigabit capable. Where it is practical, use gigabit speeds on interface ports that are used for iSCSI traffic.


Moving the Edge to a new location

If you began your SteelFusion deployment by initially configuring and loading the Edge appliance in the data center, you might have to change the IP addresses of various network ports on the Edge after you move it to its final location in the remote office. The Edge configuration includes the IP address of the Core and it initiates the connection to the Core when it is active. Because Core does not track the Edge by IP address, it is safe to change the IP addresses of the network ports on the Edge when you move it to its final location. The iSCSI adapter within the VSP of the Edge needs to be reconfigured with the new IP address of the Edge.

Configuring Edge for jumbo frames

You can have one or more external application servers in the branch office that use the LUNs accessible from the Edge iSCSI portal. If your network infrastructure supports jumbo frames, we recommend that you configure the connection between the Edge and application servers as described below. If you are using VSP for hosting all your branch application servers, then you can ignore the following two procedures because the iSCSI traffic is internal to the Edge.

Note: VSP VMs do not support jumbo frames.

In addition to configuring Edge for jumbo frames, you must configure the external application servers and any switches, routers, or other network devices between Edge and the application server for jumbo frame support.

To configure Edge primary interface for jumbo frames

1. In the Edge Management Console, choose Networking > Networking: Base Interfaces.

2. In the Primary Interface box:

 Select Enable Primary Interface.

 Select Specify IPv4 Address Manually option, and specify the correct values for your implementation.

 For the MTU setting, specify 9000 bytes.

3. Click Apply to apply the settings to the current configuration.

4. Click Save to save your changes permanently.

For more details about interface settings, see the SteelFusion Edge Management Console User’s Guide.

To configure Edge Ethernet interfaces for jumbo frames

1. In the Edge Management Console, choose Networking > Networking: Data Interfaces.

2. In the Data Interface Settings box:

 Select the required data interface (for example: eth1_0).

 Select Enable Interface.


 Select the Specify IPv4 Address Manually option and specify the correct values for your implementation.

 For the MTU setting, specify 9000 bytes.

3. Click Apply to apply the settings to the current configuration.

4. Click Save to save your changes permanently.

For more details about interface settings, see the SteelFusion Edge Management Console User’s Guide. For more information about jumbo frames, see “Configure jumbo frames” on page 172.
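After you enable an MTU of 9000 bytes end to end, it is worth confirming that jumbo frames actually pass between an external application server and the Edge. A quick, nonauthoritative check from the application server, assuming 192.168.1.10 stands in for the Edge iSCSI portal address: a payload of 8972 bytes plus 28 bytes of ICMP and IP headers produces a 9000-byte packet, and the do-not-fragment flag makes an undersized path MTU fail visibly instead of being silently fragmented.

# Linux application server: send non-fragmentable 9000-byte packets to the Edge portal (address is a placeholder)
ping -M do -s 8972 -c 4 192.168.1.10
# Windows application server equivalent
ping -f -l 8972 192.168.1.10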

Configuring iSCSI initiator timeouts

The Edge acts as the iSCSI portal for any internal (VSP) hosted application servers, but also for any external application servers. For external servers, consider adjusting the iSCSI initiator timeout settings on the server. This adjustment can improve the ability of the initiator to survive minor outages involving MPIO or other HA configurations. For more details and guidance, see “Microsoft iSCSI initiator timeouts” on page 176 and the documentation provided by the iSCSI initiator supplier.

Configuring disk management on the Edge appliance

On the SteelFusion Edge appliance, you can specify the size of the local LUN during the hypervisor installation, before the appliance is connected to the Core. During the installation, choose Direct Attached Storage. You are prompted to choose the percentage of the available blockstore you want to use as local storage. A single LUN is created, formatted as VMFS5, and mounted as a datastore to ESXi as rvbd_vsp_datastore.

If the fixed percentages for direct attached storage are not appropriate for the desired LUN size, click Advanced Storage Settings in the installer to specify the exact size of the local LUN, change the file system type to the appropriate VMFS version, and choose a different name for the datastore.

Figure 5-2. Hypervisor Installer page


To install multiple local LUNs

1. Connect the Edge to the Core.

2. Create the LUNs on the backend storage and map them to Edge.

3. Pin the LUNs and finish synchronization to the Edge.

4. Offline the LUNs on the Core.

5. Remove the Core. Select Preserve local iSCSI Storage Configuration (Figure 5-3).

Figure 5-3. Removing the Core

SteelFusion 4.0 and later can preserve the SteelFusion Edge configuration for local LUNs, initiators, and initiator groups after unpairing from the Core. When you then connect the SteelFusion Edge to a Core, the preserved LUNs remain as local LUNs, and the rest of the local space is used for the blockstore.

Configuring SteelFusion storage

Complete the connection to the Core by choosing Storage > Storage Edge Configuration on the Edge Management Console, specifying the Core IP address, and defining the Edge Identifier (among other settings). You need the following information to configure Edge storage:

 Hostname/IP address of the Core.

 Edge Identifier, the value of which is used in the Core-side configuration for mapping LUNs to specific Edge appliances. This identifier is case sensitive.

 Self Identifier. If you configure failover, both appliances must use the same self-identifier. In this case, you can use a value that represents the group of appliances.

 Port number of the Core. The default port is 7970.

 The interface for the current Edge to use when connecting with the Core.


For details about this procedure, see the SteelHead Management Console User’s Guide and the SteelFusion Edge Management Console User’s Guide.

MPIO in Edge

When you configure multipath I/O (MPIO) for the Edge, you enable multiple local interfaces through which the iSCSI initiator can connect to iSCSI targets in the Edge. Redundant connections help prevent loss of connectivity in the event of an interface, switch, cable, or other physical failure.

In the Core Management Console, choose Configure > Storage Arrays: iSCSI, Initiators, MPIO to access controls to add or remove MPIO interfaces. Once specified, the interfaces are available for the iSCSI initiator to connect with the Edge. For details, see the SteelFusion Edge Management Console User’s Guide.
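For an external Windows application server in the branch, the initiator side of an MPIO configuration can also be scripted. The following PowerShell sketch is illustrative only: the two portal addresses stand in for the Edge MPIO interfaces, the IQN filter string is a placeholder, and it assumes the Windows MPIO feature is already installed and claiming iSCSI devices.

# Register both Edge iSCSI portals (addresses are placeholders)
New-IscsiTargetPortal -TargetPortalAddress 192.168.1.10
New-IscsiTargetPortal -TargetPortalAddress 192.168.2.10
# Connect one session through each portal with multipath enabled and persistence across reboots
Get-IscsiTarget | Where-Object { $_.NodeAddress -like "*steelfusion*" } | ForEach-Object {
    Connect-IscsiTarget -NodeAddress $_.NodeAddress -TargetPortalAddress 192.168.1.10 -IsMultipathEnabled $true -IsPersistent $true
    Connect-IscsiTarget -NodeAddress $_.NodeAddress -TargetPortalAddress 192.168.2.10 -IsMultipathEnabled $true -IsPersistent $true
}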

Related information

 SteelFusion Core Management Console User’s Guide

 SteelFusion Edge Management Console User’s Guide

 SteelFusion Core Installation and Configuration Guide

 SteelFusion Command-Line Interface Reference Manual

 Riverbed Splash at https://splash.riverbed.com/community/product-lines/steelfusion



SteelFusion Appliance High-Availability Deployment

This chapter describes high-availability (HA) deployments for the Core and the Edge. It includes the following sections:

 “Overview of storage availability” on page 81

 “Core high availability” on page 82

 “Edge high availability” on page 94

 “Recovering from split-brain scenarios involving Edge HA” on page 102

 “Testing HA failover deployments” on page 103

 “Configuring WAN redundancy” on page 103

 “Related information” on page 105

For information about setting up Core-v HA, see “Configuring Fibre Channel LUNs in a Core-v HA scenario” on page 64.

Note: Features, settings, and guidance described in this chapter may apply to SteelFusion deployments in both block storage (iSCSI) mode and NFS/file mode. Where indicated, content is applicable to block storage mode only. For deployments that are using SteelFusion in NFS/file mode, also see Chapter 13, “SteelFusion and NFS.”

Overview of storage availability

Applications of any type that read and write data to and from storage can suffer from three fundamental types of availability loss:

 Loss of storage

 Loss of access to the storage

 Loss of the data residing on the storage

As with a typical storage deployment, you might consider data HA and redundancy as a mandatory requirement rather than an option. Applications accessing data always expect the data, and the storage on which the data resides, to be available at all times. If for some reason the storage is not available, then the application ceases to function.


Storage availability is the requirement to protect against loss of access to stored data or loss of the storage in which the data resides. Storage availability is subtly different from data loss. In the case of data loss, whether due to accidental deletion, corruption, theft, or another event, you can recover the data from a snapshot, backup, or some other form of archive. If you can recover the lost data, it means that you previously had a process to copy data, either through snapshot, backup, replication, or another data management operation.

In general, the net effect of data loss or lack of storage availability is the same: loss of productivity. But the two types of data loss are distinct and addressed in different ways.

The subject of data availability in conjunction with the SteelFusion product family is documented in a number of white papers and other documents that describe the use of snapshot technology and data replication as well as backup and recovery tools. To read the white papers, go to https://support.riverbed.com.

The following sections discuss how to make sure you have storage availability in both the Core and Edge deployments.

Note: Core HA and Edge HA are independent from each other. You can have Core HA with no Edge HA, and vice versa.

Core high availability

This section describes HA deployments for the Core. It contains the following topics:

 “Core with MPIO” on page 83

 “Core HA concepts” on page 83

 “Configuring HA for Core” on page 84

You can deploy a Core as a single, stand-alone implementation; however, we strongly recommend that you always deploy Cores as pairs in an HA cluster configuration. The storage arrays and the storage area network (SAN) that the Core attaches to are generally deployed in a redundant manner. For more information about Core HA clusters, see “Core HA concepts” on page 83. For more information about single-appliance implementation, see “Single-appliance deployment” on page 27.

In addition to the operational and hardware redundancy provided by the deployment of Core clusters, you can also provide for network redundancy. When connecting to a SAN using iSCSI, Cores support the use of multipath I/O (MPIO). MPIO uses two separate network interfaces on the Core to connect to two separate iSCSI portals on the storage array. The storage array must support MPIO. Along with network redundancy, MPIO enables scalability by load-balancing storage traffic between the Core and the storage array.

Note: MPIO is also supported on Edge deployments in which the LUNs available from Edge are connected to servers operating in the branch office.

For more information about MPIO with Core, see “Core with MPIO” on page 83. For information about MPIO with Edge, see “SteelFusion Edge with MPIO” on page 98. For information about setting up Core-v HA, see “Configuring Fibre Channel LUNs in a Core-v HA scenario” on page 64. For information about setting up Core HA with FusionSync (SteelFusion Replication), see Chapter 7, “SteelFusion Replication (FusionSync).”


Core with MPIO

Note: Block storage mode only.

MPIO ensures that a failure of any single component (such as a network interface card, switch, or cable) does not result in a communication problem between the Core and the storage array. Figure 6-1 shows an example of a basic Core deployment using MPIO. The figure shows a single Core with two network interfaces connecting to the iSCSI SAN. The SAN has a simple full mesh network design enabling each Core interface to connect to each iSCSI portal on the storage array.

Figure 6-1. Basic topology for Core MPIO

When you configure a Core for MPIO, by default it uses a round-robin policy for any read operations to the LUNs in the storage array. Write operations use a fixed-path policy, only switching to an alternative path in the event of a path or portal failure. For more details about MPIO configuration for the Core, see the SteelFusion Core Management Console User’s Guide.

Core HA concepts

Note: The general concepts described in this section apply whether the storage is LUNs or NFS exports. For more information on deployments using SteelFusion in NFS/file mode, see Chapter 13, “SteelFusion and NFS.”

A pair of Cores deployed in an HA-failover cluster configuration are active-active. Each Core is the primary to itself and secondary to its peer. Both peers in the cluster are attached to storage in the data center. But individually they each are responsible for projecting one or more LUNs to one or more Edge appliances in branch locations. Each Core is configured separately for the LUNs and the Edges it is responsible for. When you enable failover on the Core, you can choose which individual LUNs are part of the HA configuration. By default in a Core HA deployment, all LUNs are automatically configured for failover. You can selectively disable failover on an individual LUN basis in the Management Console by choosing Configure > Manage: LUNs. LUNs that are not included in the HA configuration are not available at the Edge if the Core fails.


As part of the HA deployment, you configure each Core with the details of its failover peer. This deployment comprises two IP addresses of network interfaces called failover interfaces. These interfaces are used for heartbeat and synchronization of the peer configuration. After the failover interfaces are configured, the failover peers use their heartbeat connections (failover interfaces) to share the details of their storage configuration. This information includes the LUNs they are responsible for and the Edges they are projecting the LUNs to. If either peer fails, the surviving Core can take over control of the LUNs from the failed peer and continue projecting them to the Edges. Core HA failover is triggered at the Core level. If an Edge loses connection to its primary Core, but still has connectivity to the secondary Core, no failover occurs. No failover occurs because both Cores continue to detect each other's heartbeat through the failover interfaces. The Edge enters a disconnected operations state as normal and saves write operations to the blockstore until connectivity to the primary Core is restored.

Note: Make sure that you size both failover peers correctly so that they have enough capacity to support the other Core storage in the event of a peer failure. If the surviving peer does not have enough resources (CPU and memory), then performance might degrade in a failure situation.

After a failed Core has recovered, the failback is automatic.

Configuring HA for Core

This section describes best practices and the general procedure for configuring high availability between two Cores.

Note: Core HA configuration is independent of Edge HA configuration.

This section contains the following topics:

 “Cabling and connectivity for clustered Cores” on page 85

 “Configuring failover peers” on page 86

 “Accessing a failover peer from a Core” on page 90

 “SCSI reservations between Core and storage arrays” on page 91

 “Failover states and sequences” on page 91

 “Recovering from failure of both Cores in HA configuration” on page 92

 “Removing Cores from an HA configuration” on page 92


Cabling and connectivity for clustered Cores

Figure 6-2 shows an example of a basic HA topology including details of the different network interfaces used.

Note: Use crossover cables for connecting ports in clustered Cores.

Figure 6-2. Basic topology for Core HA

In the scenario shown in Figure 6-2, both Cores (Core A and Core B) connect to the storage array through their respective eth0_0 interfaces. Notice that the eth0_1 interfaces are also used in this example for MPIO or additional SAN connectivity. The Cores communicate between each other using the failover interfaces that are configured as eth0_2 and eth0_3. Their primary interfaces are dedicated to the traffic VLAN that carries data to and from Edge appliances. The auxiliary interfaces are connected to the management VLAN and used to administer the Cores. You can administer a Core from any of its configured interfaces assuming they are reachable. Use the AUX interface as a dedicated management interface rather than using one of the other interfaces that might be in use for storage data traffic. When it is practical, use two dedicated failover interfaces for the heartbeat. Connect the interfaces through crossover cables and configure them using private IP addresses. This connection minimizes the risk of a split-brain scenario in which both Core peers consider the other to have failed. Directly connected, dedicated interfaces might not be possible for some reason. If the dedicated connections need to go through some combinations of switches and/or routers, they must use diverse paths and network equipment to avoid a single point of failure. If you cannot configure two dedicated interfaces for the heartbeat, then an alternative is to specify the primary and auxiliary interfaces. Consider this option only if the traffic interfaces of both Core peers are connecting to the same switch or are wired so that a network failure means one of the Cores loses connection to all Edge appliances. You can configure Cores with additional NICs to provide more network interfaces. These NICs are installed in PCIe slots within the Core. Depending on the type of NIC you install, the network ports could be 1-Gb Ethernet or 10-Gb Ethernet. In either case, you can use the ports for storage or heartbeat connectivity. The ports are identified as ethX_Y where X corresponds to the PCIe slot (from 1 to 5) and Y refers to the port on the NIC (from 0 to 3 for a four-port NIC and from 0 to 1 for a two-port NIC). For more information about Core ports, see “Interface and port configuration” on page 36.


You can use these additional interfaces for iSCSI traffic or heartbeat. Use the same configuration guidance as already described above for the eth0_0 to eth0_3 ports. Under normal circumstances the heartbeat interfaces need to be only 1 Gb; therefore, it is simpler to use eth0_2 and eth0_3 as already described. However, there can be a need for 10-Gb connectivity to the iSCSI SAN, in which case you can use an additional NIC with 10-Gb ports in place of eth0_0 and eth0_1. If you install the NIC in PCIe slot 1 of the Core, then the interfaces are identified as eth1_0 and eth1_1 in the Core Management Console. When using multiple interfaces for storage connectivity in an HA deployment, all interfaces should match in terms of their capabilities. Therefore, avoid mixing combinations of 1 Gb and 10 Gb for storage connectivity.

Configuring failover peers

You configure Core high availability by choosing Configure > Failover: Failover Configuration. To configure failover peers for Core, you need to provide the following information for each of the Core peers:

 The IP address of the peer appliance

 The local failover interface through which the peers exchange and monitor heartbeat messages

 An additional IP address of the peer appliance

 An additional local failover interface through which the peers exchange and monitor heartbeat messages

Figure 6-3 shows an example deployment with failover interface IP addresses. You can configure any interface as a failover interface, but to maintain some consistency we recommend that you configure and use eth0_2 and eth0_3 as dedicated failover interfaces.

Figure 6-3. Core HA failover interface design


Figure 6-4 shows the Failover Configuration page for Core A, where the peer is Core B. The failover interface IP addresses are 20.20.20.22 and 30.30.30.33 through interfaces eth0_2 and eth0_3, respectively. The page shows eth0_2 and eth0_3 selected from the Local Interface drop-down list, and the IP addresses of the Core B interfaces are completed. Notice that on the Configuration page you can select the interface you want to use for connections from the failover peer Edge appliances. This example shows that the primary interface has been chosen.

Figure 6-4. Core Failover Configuration page


After you click Enable Failover, the Core attempts to connect through the failover interfaces sending the storage configuration to the peer. If successful, you see the Device Failover Settings as shown in Figure 6-5.

Figure 6-5. Core HA Failover Configuration page 2


After the Core failover has been successfully configured, you can log in to the Management Console of the peer Core and view its Failover Configuration page. Figure 6-6 shows that the configuration page of the peer is automatically configured with the relevant failover interface settings from the other Core.

Figure 6-6. Core HA Peer Failover configuration page 3

Even though the relevant failover interfaces are automatically configured on the peer, you must configure the peer Preferred Interfaces for Edge Connections. By default, the primary interface is selected. For more information about HA configuration settings, see the SteelFusion Core Management Console User’s Guide.

In the Core CLI, you can configure failover using the device-failover peer and device-failover peerip commands. To display the failover settings, use the show device-failover command. For more information, see the SteelFusion Command-Line Interface Reference Manual. A brief CLI sketch follows the list below.

If the failover configuration is not successful, then details are available in the Core log files and a message is displayed in the user interface. The failure can be for any number of different reasons. Some examples, along with items to check, are as follows:

 Unable to contact peer - Check the failover interface configurations (IP addresses, interface states and cables).

 Peer is already configured as part of a failover pair - Check that you have selected the correct Core.

 The peer configuration includes one or more LUNs that are already assigned to the other Core in the failover pair - Check the LUN assignments and correct the configuration.


 The peer configuration includes one or more Edge appliances that are already assigned to the other Core in the failover pair - Check the Edge assignments and correct the configuration.

After the failover configuration is complete and active, the configurations of the two peers in the cluster are periodically exchanged through a TCP connection using port 7971 on the failover interfaces. If you change or save either Core configuration, the modified configuration is sent to the failover peer. In this way, each peer always has the latest configuration details of the other.

You configure any Edge that is connecting to a Core HA configuration with the primary Core details (hostname or IP). After connecting to the primary Core, the Edge is automatically updated with the peer Core information. This information ensures that during a Core failover situation in which an Edge loses its primary Core, the secondary Core can signal the Edge that it is taking over. The automatic update also minimizes the configuration activities required at the Edge regardless of whether you configure Core HA or not.
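The CLI sketch referenced above looks broadly like the following when run on Core A from Figure 6-3. Only the command names come from this guide; the argument forms and the use of the peer failover addresses (20.20.20.22 and 30.30.30.33) are illustrative assumptions, so confirm the exact syntax in the SteelFusion Command-Line Interface Reference Manual before use.

enable
configure terminal
device-failover peer 20.20.20.22
device-failover peerip 30.30.30.33
write memory
show device-failover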

Accessing a failover peer from a Core

Note: Block storage mode only.

When you configure a Core for failover with a peer Core, all storage configuration pages include an additional feature that enables you to access and modify settings for both the current appliance you are logged in to and its failover peer. You can use a drop-down list below the page title to select Self (the current appliance) or Peer. The page includes the message Device Failover is enabled, along with a link to the Failover Configuration page. Figure 6-7 shows two sample iSCSI Configuration pages: one without HA enabled and one with HA enabled, showing the drop-down list.

Figure 6-7. Failover-enabled feature on Storage Configuration pages

Note: Because you can change and save the storage configuration settings for the peer in a Core HA deployment, ensure that any configuration changes are made for the correct Core.


Additionally, the Core storage report pages include a message that indicates when device failover is enabled, along with a link to the Failover Configuration page. You must log in to the peer Core to view the storage report pages for the peer.

SCSI reservations between Core and storage arrays

Note: Block storage mode only.

When you deploy two Cores as failover peers, the deployment is an active-active configuration. Each Core is primarily responsible for the LUNs that it has been configured with. As part of the HA configuration, some, if not all, of the LUNs are enabled for failover. During a failover scenario, the surviving peer takes over the LUNs of the failed peer that have been enabled for failover. To be able to take over the LUNs in a safe and secure way, the Core makes use of SCSI reservations to the back-end storage array.

SCSI reservations are similar in concept to client file-locking on a file server. The SCSI reservation is made by the initiator and provides a way to prevent other initiators from making changes to the LUN. Prior to making a reservation, the initiator must first make a Register request for the LUN. This request is in the form of a reservation key. After the storage array acknowledges the reservation key, the reservation is made. The Core sends register requests to the storage array for each LUN it is responsible for. It then makes persistent reservations for each LUN. A persistent reservation is maintained across power failures and reboots of the initiator and target devices. It can be cleared only by the initiator releasing the reservation, or by an initiator preempting the reservation.

In a Core HA deployment, each peer knows the LUNs that are enabled for failover on the other peer. Because of this knowledge, in a failover scenario, a surviving peer can send the storage array a request to read the current reservations for each of the relevant LUNs. The storage array responds with the reservation keys of the failed Core. The surviving peer sends a preempt reservation request for each LUN that it needs to take control of from the failed peer. The preempt reservation request comprises the reservation key of the failed peer and its own registration key for each LUN.

Because of the requirement to transfer persistent reservations between peer Cores during a failover or failback scenario, your storage array might need to be explicitly configured to allow this transfer. The actual configuration steps required depend on the storage array vendor but might involve some type of setting for simultaneous access. For details, consult the relevant documentation of the storage array vendor.
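The register-and-preempt behavior described above is standard SCSI-3 persistent reservation handling, so you can observe it from any Linux host zoned to see the LUN by using the sg_persist utility from the sg3_utils package. This is purely illustrative of the mechanism; it is not something you run on the Core itself, and the device path is a placeholder.

# Show the registered reservation keys for the LUN (device path is a placeholder)
sg_persist --in --read-keys /dev/sdb
# Show which key currently holds the persistent reservation and the reservation type
sg_persist --in --read-reservation /dev/sdb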

Failover states and sequences

At the same time as performing their primary functions associated with projecting LUNs, each Core in an HA deployment is using its heartbeat interfaces to check if the peer is still active. By default, the peers check each other every 3 seconds through a heartbeat message. The heartbeat message is sent through TCP port 7972 and contains the current state of the peer that is sending the message. The state is one of the following:

 ActiveSelf - The Core is healthy, running its own configuration and serving its LUNs as normal. It has an active heartbeat with its peer.

 ActiveSolo - The Core is healthy but the peer is down. It is running its own configuration and that of the failed peer. It is serving its LUNs and also the LUNs of the failed peer.


 Inactive - The Core is healthy but has just started up and cannot automatically transition to ActiveSolo or ActiveSelf. Typically this state occurs if both Cores fail at the same time. To complete the transition, you must manually activate the correct Core. For more information, see “Recovering from failure of both Cores in HA configuration” on page 92.

 Passive - The default state when Core starts up. Depending on the status of the peer, the Core state transitions to Inactive, ActiveSolo, or ActiveSelf.

If there is no response from three consecutive heartbeats, then the secondary Core declares the primary failed and initiates a failover. Both Cores in an HA deployment are primary for their own functions and secondary for the peer. Therefore, whichever Core fails, it is the secondary that takes control of the LUNs from the failed peer. After the failover is initiated, the following sequence of events occurs:

1. The secondary Core preempts a SCSI reservation to the storage array for all of the LUNs that the failed Core is responsible for in the HA configuration.

2. The secondary Core contacts all Edges that are being served by the failed (primary) Core.

3. The secondary Core begins serving LUNs to the Edges.

The secondary Core continues to issue heartbeat messages. Failback is automatic after the failed Core comes back online and can send its own heartbeat messages again. The failback sequence is effectively a repeat of the failover sequence, with the primary Core going through the three steps described above.

Recovering from failure of both Cores in HA configuration

You can have a scenario in which both Cores in an HA configuration fail at the same time; for example, a major power outage. In this instance there is no opportunity for either Core to realize that its peer has failed. When both Core appliances reboot, each peer knows that it failed but does not have status from the other to say that it had been in an ActiveSolo state. Therefore, both Cores remain in the Inactive state. This state ensures that neither Core is projecting LUNs until you manually activate the correct Core.

To activate the correct Core, choose Configure > Failover: Failover Configuration and select Activate Config. After you activate the correct Core, it transitions to ActiveSolo. When its peer is running again, both Core appliances transition to ActiveSelf.

Removing Cores from an HA configuration

This section describes the procedure for removing two Cores from their failover configuration.

To remove Cores from an HA configuration (basic steps)

1. Force one of the Cores into a failed state by stopping its service.

2. Disable failover on the other Core.

3. Start the service on the first Core again.


4. Disable the failover on the second Core.

You can perform these steps using either the Management Console or the CLI. Figure 6-8 shows an example configuration.

Figure 6-8. Example configuration of Core HA deployment

To remove the Cores from an HA deployment using the Management Console (as shown in Figure 6-8)

1. From the Management Console of Core A, choose Settings > Maintenance: Service.

2. Stop the Core service.

3. From the Management Console of Core B, choose Configure > Failover: Failover Configuration.

4. Click Disable Failover.

5. Return to the Management Console of Core A, and choose Settings > Maintenance: Service.

6. Start the Core service.

7. From the Management Console of Core A, choose Configure > Failover: Failover Configuration.

8. Click Disable Failover.

9. Click Activate Local Configuration.

Core A and Core B are no longer operating in an HA configuration.

To remove the Cores from an HA deployment using the CLI (as shown in Figure 6-8)

1. Connect to the CLI of Core A and enter the following commands to stop the Core service:

enable
configure terminal
no service enable

2. Connect to the CLI of Core B and enter the following commands to clear the local failover configuration:


enable
configure terminal
device-failover peer clear
write memory

3. Return to the CLI of Core A and enter the following commands to start the Core service, clear the local failover configuration, and return to nonfailover mode:

enable
configure terminal
service enable
device-failover peer clear
device-failover self-config activate
write memory

Core A and Core B are no longer operating in an HA configuration.

Edge high availability

Note: For deployments using SteelFusion in NFS/file mode, see Chapter 13, “SteelFusion and NFS.”

The SteelFusion Edge appliance presents itself as an iSCSI target for storage to application servers in one or both of the following modes:

 Storage is resident in the RiOS node and is accessed through iSCSI internally to the appliance by the hypervisor node. In this mode, the hypervisor running in the hypervisor node is acting as the iSCSI initiator.

 Storage is accessed through an iSCSI initiator on a separate server or hypervisor host that is external to the SteelFusion Edge.

In either deployment mode, in the unlikely event of failure or a scheduled loss of service due to planned maintenance, you might need an alternative way to access the storage and ensure continued availability of services in the branch. Deploying two SteelFusion Edges in a high availability (HA) configuration enables this access.

This section describes HA deployments for the SteelFusion Edge appliance. It contains the following topics:

 “Using the correct Interfaces for SteelFusion Edge deployment” on page 94

 “Choosing the correct cables” on page 97

 “Overview of SteelFusion Edge HA” on page 98

 “SteelFusion Edge with MPIO” on page 98

 “SteelFusion Edge HA using blockstore synchronization” on page 99

 “SteelFusion Edge HA peer communication” on page 101

Note: This guide requires you to be familiar with the SteelFusion Edge Management Console User’s Guide.

Using the correct Interfaces for SteelFusion Edge deployment

Note: Block storage mode only.

This section reviews the network interfaces on SteelFusion Edge and how you can configure them. For more information about Edge network interface ports, see “Edge appliances ports” on page 75.


By default, all Edge appliances are equipped with the following physical interfaces:

 Primary, Auxiliary, eth0_0, eth0_1, lan1_0, wan1_0, lan1_1, wan1_1 - These interfaces are owned and used by the RiOS node in Edge.

 gbe0_0, gbe0_1, gbe0_2, gbe0_3 - These interfaces are owned and used by the hypervisor node in Edge.

Traditionally in an Edge appliance, the LAN and WAN interface pairs are used by the RiOS node as an in-path interface for WAN optimization. The primary and auxiliary interfaces are generally used for management and other services. In an Edge HA deployment, the eth0_0 and eth0_1 interfaces are used for the heartbeat interconnect between the two SteelFusion Edge HA peers. If there is only a single Edge deployed in the branch, then you can use eth0_0 and eth0_1 as data interfaces for iSCSI traffic to and from servers in the branch that are external to the Edge.

While there are many additional combinations of port usage, you can generally expect that iSCSI traffic to and from external servers in the branch uses the primary interface. Likewise, the Rdisk traffic to and from the Core uses the primary interface by default and is routed through the SteelFusion Edge in-path interface. The Rdisk traffic gains some benefit from WAN optimization. Management traffic for the Edge typically uses the auxiliary interface.

For the hypervisor node, you can use the gbe0_0 to gbe0_3 interfaces for general purpose LAN connectivity within the branch location. These interfaces enable clients to access services running in virtual machines installed on the hypervisor host.

Figure 6-9 shows a basic configuration example for Edge deployment. The SteelFusion Edge traffic flows for Rdisk and iSCSI traffic are shown.

Figure 6-9. Basic interface configuration for SteelFusion Edge with external servers


Figure 6-10 shows no visible sign of iSCSI traffic because the servers that are using the LUNs projected from the data center are hosted within the hypervisor node resident on the Edge. Therefore, all iSCSI traffic is internal to the appliance. If a SteelFusion Edge deployment has no WAN optimization requirement, then you can connect the primary interface directly to the lan0_1 interface using a crossover cable, enabling the Rdisk traffic to flow in and out of the primary interface. In this case, management of the appliance is performed through the auxiliary interface. Clients in the branch access the servers in the hypervisor node by accessing through the network interfaces gbe0_0 to gbe0_3 (not shown in Figure 6-10).

Figure 6-10. Basic interface configuration for Edge with servers hosted in hypervisor node

Figure 6-11 shows a minimal interface configuration. The iSCSI traffic is internal to the appliance in which the servers are hosted within the hypervisor node. Because you can configure Edge to use the in-path interface for Rdisk traffic, this configuration makes for a very simple and nondisruptive deployment. The primary interface is still connected and can be used for management. Client access to the servers in the hypervisor node is through the gbe0_0 to gbe0_3 network interfaces (Figure 6-11). We do not recommend this type of deployment for permanent production use, but it can be suitable for a proof of concept in lieu of a complicated design.

Figure 6-11. Alternative interface configuration for SteelFusion Edge with servers hosted in hypervisor node

We recommend that you make full use of all the connectivity options available in the appliance for production deployments of Edge. Careful planning can ensure that important traffic, such as iSCSI traffic to external servers, Rdisk to and from the Core, and blockstore synchronization for high availability, are kept apart from each other. This separation helps with ease of deployment, creates a more defined management framework, and simplifies any potential troubleshooting activity.


Depending on the model, Edge can be shipped or configured in the field with one or more additional multiple-port network interface cards (NICs). There is a selection of both copper and optical 1-GbE and 10-GbE NICs that fall into one of two categories. The two categories are bypass cards suitable for use as in-path interfaces for WAN optimization or data cards suitable for LAN connectivity. In the case of LAN connectivity, the data cards might be for any of the following examples:

 iSCSI traffic to and from servers in the branch that are external to SteelFusion Edge.

 Application traffic from clients in the branch connecting to application servers hosted in the Edge hypervisor node.

 Additional LAN connectivity for redundancy purposes (for example, MPIO, SteelFusion Edge HA, and so on).

You cannot change the mode of a NIC from data to in-path or vice versa. For a complete list of available NICs, their part numbers, and installation details, see the SteelFusion Edge Hardware and Maintenance Guide.

Choosing the correct cables

The LAN and WAN ports on the SteelFusion Edge bypass cards act like host interfaces during normal operation. During fail-to-wire mode, the LAN and WAN ports act as the ends of a crossover cable. Using the correct cable to connect these ports to other network equipment ensures proper operation during fail-to-wire mode and normal operating conditions. This cabling is especially important when you are configuring two SteelFusion Edge appliances in a serial in-path deployment for HA.

We recommend that you do not rely on automatic MDI/MDI-X to automatically sense the cable type. The installation might be successful when the SteelFusion Edge is optimizing traffic, but it might not be successful if the in-path bypass card transitions to fail-to-wire mode. One way to help ensure that you use the correct cables during an installation is to connect the LAN and WAN interfaces of the bypass card while the SteelFusion Edge is powered off. This configuration proves that the devices on either side of the appliance can communicate correctly without any errors or other problems.

In the most common in-path configuration, a SteelFusion Edge LAN port is connected to a switch and the SteelFusion Edge WAN port is connected to a router. In this configuration, a straight-through Ethernet cable can connect the LAN port to the switch, and you must use a crossover cable to connect the WAN port to the router.

When you configure SteelFusion Edge in HA, it is likely that you have one or more additional data NICs installed into the appliance to provide extra interfaces. You can use the interfaces for MPIO and blockstore synchronization. This table summarizes the correct cable usage in the SteelFusion Edge when you are connecting LAN and WAN ports or when you are connecting data ports.

Devices                                    Cable
SteelFusion Edge to SteelFusion Edge       Crossover
SteelFusion Edge to router                 Crossover
SteelFusion Edge to switch                 Straight-through
SteelFusion Edge to host                   Crossover

Overview of SteelFusion Edge HA

This section describes HA features, design, and deployment of SteelFusion Edge.

You can assign the LUNs provided by SteelFusion Edge (which are projected from the Core in the data center) in a variety of ways. Whether used as a datastore for VMware ESXi in the hypervisor node, or for other hypervisors and discrete servers hosted externally in the branch office, the LUNs are always served from the SteelFusion Edge using the iSCSI protocol. Because of the way the LUNs are served, you can achieve HA with SteelFusion Edge by using one or both of the following options:

 “SteelFusion Edge with MPIO” on page 98

 “SteelFusion Edge HA using blockstore synchronization” on page 99

These options are independent of any HA Core configuration in the data center that is projecting one or more LUNs to the SteelFusion Edge. However, because of different SteelFusion Edge deployment options and configurations, there are several scenarios for HA. For example, you can consider hardware redundancy consisting of multiple power supplies or RAID inside the SteelFusion Edge appliance as a form of HA. For more information about hardware, see the product specification documents.

Alternatively, when you deploy two SteelFusion Edge appliances in the branch, you can configure the VSP on both devices to provide an active-passive capability for any VMs that are hosted on the hypervisor node. In this context, HA is purely from the point of view of the VMs themselves, and there is a separate SteelFusion Edge providing a failover instance of the hypervisor node.

For more details about how to configure SteelFusion Edge HA, see the SteelFusion Edge Management Console User’s Guide.

SteelFusion Edge with MPIO

Note: Block storage mode only.

In a similar way to how you use MPIO with Core and data center storage arrays, you can use SteelFusion Edge with MPIO at the branch. Using SteelFusion Edge with MPIO ensures that a failure of any single component (such as a network interface card, switch, or cable) does not result in a communication problem between SteelFusion Edge and the iSCSI Initiator in the host device at the branch.


Figure 6-12 shows a basic MPIO architecture for SteelFusion Edge. In this example, the primary and eth2_0 interfaces of the SteelFusion Edge are configured as the iSCSI portals, and the server interfaces (NIC-A and NIC-B) are configured as iSCSI Initiators. Combined with the two switches in the storage network, this basic configuration allows for failure of any of the components in the data path while continuing to enable the server to access the iSCSI LUNs presented by the SteelFusion Edge.

Figure 6-12. Basic topology for SteelFusion Edge MPIO

While you can use other interfaces on the SteelFusion Edge as part of an MPIO configuration, we recommend that you use the primary interface and one other interface that you are not using for another purpose. The SteelFusion Edge can have an additional multiport NIC installed to provide extra interfaces. This additional card is especially useful in HA deployments. The eth2_0 interface in this example is provided by an optional add-on four-port NIC that has been installed in slot 2 of the appliance. For more information about multiport NICs, see “Edge appliances ports” on page 75.

When using MPIO with SteelFusion Edge, we recommend that you verify and adjust certain timeout variables of the iSCSI Initiator in the server to ensure correct failover behavior. By default, the Microsoft iSCSI Initiator LinkDownTime timeout value is 15 seconds. This timeout value determines how much time the initiator holds a request before reporting an iSCSI connection error. If you are using SteelFusion Edge in an HA configuration, and MPIO is configured in the Microsoft iSCSI Initiator of the branch server, change the LinkDownTime timeout value to 60 seconds to allow the failover to finish.
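The following is a minimal sketch, assuming a Windows branch server, of how the LinkDownTime value could be raised to 60 seconds with Python's winreg module. The script is illustrative and not part of the Riverbed tooling; the class GUID path and the 0000 instance subkey are assumptions that vary by system, so verify the correct key on your server (or use your usual Windows administration method) before applying the change.

# Illustrative sketch: raise the Microsoft iSCSI Initiator LinkDownTime to 60 seconds.
# The instance subkey ("0000") is an assumption and varies by system; verify it first.
import winreg

ISCSI_PARAMS_KEY = (r"SYSTEM\CurrentControlSet\Control\Class"
                    r"\{4D36E97B-E325-11CE-BFC1-08002BE10318}\0000\Parameters")

def set_link_down_time(seconds=60):
    # LinkDownTime is a REG_DWORD expressed in seconds (default 15).
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, ISCSI_PARAMS_KEY, 0,
                        winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "LinkDownTime", 0, winreg.REG_DWORD, seconds)

if __name__ == "__main__":
    set_link_down_time(60)   # a reboot or initiator restart may be required to apply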

Note: When you view the iSCSI MPIO configuration from the ESXi vSphere management interface, even though both iSCSI portals are configured and available, only iSCSI connections to the active SteelFusion Edge are displayed.

For more details about the specific configuration of MPIO, see the SteelFusion Edge Management Console User’s Guide.

SteelFusion Edge HA using blockstore synchronization

Note: Block storage mode only.


While MPIO can cater to HA requirements involving network redundancy in the branch office, it still relies on the SteelFusion Edge itself being available to serve LUNs. To survive a failure of the SteelFusion Edge without downtime, you must deploy a second appliance. If you configure two appliances as an HA pair, the SteelFusion Edge can continue serving LUNs without disruption to the servers in the branch, even if one of the appliances fails. The LUNs served by a SteelFusion Edge HA deployment can be used by the VSP of the second SteelFusion Edge and by external servers within the branch office.

The scenario described in this section has two SteelFusion Edges operating in an active-standby role. This scenario applies irrespective of whether the Core is configured for HA in the data center. The active SteelFusion Edge is connected to the Core in the data center and responds to the read and write requests for the LUNs it is serving in the branch. This method of operation is effectively the same as with a single SteelFusion Edge; however, there are some additional pieces that make up a complete HA deployment.

Note: If you plan to configure two SteelFusion Edge appliances into an HA configuration at a branch, we strongly recommend you do this configuration at the time of installation. Adding a second SteelFusion Edge to form an HA pair at a later date is possible but is likely to result in disruption to the branch services while the reconfiguration is performed.

The standby SteelFusion Edge does not service any of the read and write requests but is ready to take over from the active peer. As the server writes new data to LUNs through the blockstore of the active SteelFusion Edge, the data is reflected synchronously to the standby peer blockstore. When the standby peer has acknowledged to the active peer that it has written the data to its own blockstore, the active peer then acknowledges the server. In this way, the blockstores of the two SteelFusion Edges are kept in lockstep.

Figure 6-13 illustrates a basic HA configuration for SteelFusion Edge. While this figure shows a very simplistic deployment, it highlights the importance of the best practice of using two dedicated interfaces between the SteelFusion Edge peers to keep their blockstores synchronized. We strongly recommend that you configure the SteelFusion Edges to use eth0_0 and eth0_1 as their interfaces for synchronization and failover status. Using dedicated interfaces connected through crossover cables minimizes the chance of a split-brain scenario, in which both peer devices think the other has failed and start serving independently.


For more information about split-brain scenario, see “Recovering from split-brain scenarios involving Edge HA” on page 102.

Figure 6-13. Basic topology for SteelFusion Edge high availability

In software versions earlier than SteelFusion v4.2, although two interfaces are configured, only one interface actively sends blockstore synchronization traffic. In SteelFusion v4.2 and later, the Edge software includes Multipath NetDisk. With this feature, you can load balance the blockstore synchronization traffic across both interfaces. Multipath NetDisk continues to provide resiliency for the blockstore synchronization but also delivers higher performance. You do not need to do any additional configuration to enable this capability if both Edges are running SteelFusion v4.2 or later.

If the interfaces used for blockstore synchronization are of different capacities (for example, one is 1 Gbps and the other is 10 Gbps), then we recommend that you specify the higher-capacity interface first. For more configuration details, see the SteelFusion Edge Management Console User’s Guide.

SteelFusion Edge HA peer communication

When you configure two SteelFusion Edges as active-standby peers for HA, they communicate with each other at regular intervals. The communication is required to ensure that the peers have their blockstores synchronized and that they are operating correctly based on their status (active or standby).

The blockstore synchronization happens through two network interfaces that you configure for this purpose on the SteelFusion Edge. Ideally, these are dedicated interfaces, preferably connected through crossover cables. Although not the preferred method, you can send blockstore synchronization traffic through other interfaces that are already being used for another purpose. If interfaces must be shared, avoid using the same interfaces for both iSCSI traffic and blockstore synchronization traffic, because these two types of traffic are likely to be quite intensive. Instead, share an interface that is more lightly loaded: for example, one carrying management traffic.

The interfaces used for the actual blockstore synchronization traffic are also used by each peer to check the status of the other through heartbeat messages. The heartbeat messages provide each peer with the status of the other peer and can include peer configuration details.


A heartbeat message is sent by default every 3 seconds through TCP port 7972. If a peer fails to receive three successive heartbeat messages, then a failover event can be triggered. Because heartbeat messages are sent in both directions between SteelFusion Edge peers, there is a worst-case scenario in which failover can take up to 18 (3 x 3 x 2) seconds. A sketch of this failover-detection timing follows the list of states below. Failovers can also occur due to administrative intervention: for example, rebooting or powering off a SteelFusion Edge.

The blockstore synchronization traffic is sent between the peers using TCP port 7973. By default, the traffic uses the first of the two interfaces you configure. If that interface is not responding for some reason, the second interface is automatically used. If neither interface is operational, then the SteelFusion Edge peers enter a predetermined failover state based on the failure conditions. The failover state on a SteelFusion Edge peer can be one of the following:

 Discover - Attempting to establish contact with the other peer.

 Active - Actively serving client requests; the standby peer is in sync with the current state of the system.

 Standby Sync - Passively accepting updates from the active peer; in sync with the current state of the system.

 Active Degraded - Actively serving client requests; cannot contact the standby peer.

 Active Rebuild - Actively serving client requests; sending the standby peer updates that were missed during an outage.

 Standby Rebuild - Passively accepting updates from the active peer; not yet in sync with the state of the system.

For detailed information about how to configure two SteelFusion Edges as active-standby failover peers, the various failover states that each peer can assume while in an HA deployment, and the procedure required to remove an active-standby pair from that state, see the SteelFusion Edge Management Console User’s Guide.
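The worst-case detection window quoted earlier in this section can be expressed as a short calculation. This is an illustrative sketch only, not Riverbed code; the values come from the heartbeat behavior described above.

# Illustrative sketch of the Edge HA failover-detection window described above.
HEARTBEAT_INTERVAL_S = 3        # one heartbeat every 3 seconds (TCP port 7972)
MISSED_BEFORE_FAILOVER = 3      # failover can trigger after three missed heartbeats
DIRECTIONS = 2                  # heartbeats are sent in both directions between peers

def worst_case_failover_seconds():
    return HEARTBEAT_INTERVAL_S * MISSED_BEFORE_FAILOVER * DIRECTIONS

assert worst_case_failover_seconds() == 18   # the 3 x 3 x 2 = 18-second worst case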

Recovering from split-brain scenarios involving Edge HA

Even though the communication between the peers of an Edge HA deployment is designed for maximum resiliency, there is a remote possibility of a failure scenario known as split brain. Split brain can occur if the heartbeat communication between the peers fails simultaneously in all aspects; that is, both heartbeat interfaces fail at the same time. If these interfaces are directly connected through crossover cables, the possibility is extremely remote. But if the heartbeat interfaces are connected through network switches, then depending on the design and topology, split brain might occur.

In a split-brain condition, both Edge appliances act as if the peer has failed. This can result in both peers being Active Degraded. Because both peers can simultaneously be trying to serve data and also be synchronizing data back through the Core, this condition could lead to data integrity issues.


There are ways to recover from this scenario, but the best course of action is to contact Riverbed Support and open a support case. Recovery processes differ from case to case, so the actual procedure varies depending on the failure sequence.

Note: The recovery process can involve accessing a previous snapshot of the affected LUNs.

Testing HA failover deployments

There are many ways that you can test a failover configuration of Core HA or Edge HA. These tests may include power-cycling, SteelFusion HA peer device reboot, or any number of network connection failure scenarios (routers, switches, cables). Your failover test should at least satisfy the basic requirements that ensure the SteelFusion HA deployment recovers as expected. The simplest test is to perform an orderly reboot of a SteelFusion peer device (Core or Edge) that is one half of an HA configuration.

Configuring WAN redundancy

This section describes how to configure WAN redundancy. It includes the following topics:

 “Configuring WAN redundancy with no Core HA” on page 103

 “Configuring WAN redundancy in an HA environment” on page 105

You can configure the Core and Edge with multiple interfaces to use with MPIO. You can consider this configuration a form of local network redundancy. Similarly, you can configure multiple interfaces to provide a degree of redundancy across the WAN between the Core and the Edge. This redundancy, called WAN redundancy, ensures that a failure along the WAN path can be tolerated by the Core and the Edge. WAN redundancy provides multiple paths for connection in case the main Core to Edge link fails.

To configure WAN redundancy, you perform a series of steps on both the data center and branch side. You can use either the in-path interfaces (inpathX_Y) or the Ethernet interfaces (ethX_Y) for redundant WAN link configuration. In the examples below, the term intf is used to mean either an in-path or an Ethernet network interface.

Configuring WAN redundancy with no Core HA

Note: Block storage mode only.

This section describes how to configure WAN redundancy when you do not have a Core HA deployment.

To configure WAN redundancy

1. Configure local interfaces on the Edge. The interfaces are used to connect to the Core:

 From the Edge Management Console, choose Storage > Storage Edge Configuration.


 Click Add Interface.

Figure 6-14. Edge interfaces

2. Configure preferred interfaces for connecting to the Edge on the Core:

 From the Core Management Console, choose Configure > Manage: SteelFusion Edges.

 Select Show Preferred Interfaces for SteelFusion Edge Connections.

 Click Add Interface.

Figure 6-15. Adding Core interfaces

On first connection, the Core sends all the preferred interface information to the Edge. The Edge uses this information along with its configured local interfaces to attempt a connection on each link (local-interface and preferred-interface pair) until a successful connection is formed. The Edge tries each connection three times (and waits 3 seconds before the next try) before it moves on to the next; that is, if the first link fails, the next link is tried in nine seconds. After the Core and the Edge have established a successful alternative link, the Edge updates its Rdisk configuration with the change, so that the configuration is on the same link as the management channel between Core and Edge.

3. Remove the local interface for WAN redundancy on Edge:

 From the Edge Management Console, choose Storage > Storage Edge Configuration.


 Open the interface you want to remove.

 Click Remove Interface.

4. Remove preferred interfaces for WAN redundancy on the Core:

 From the Core Management Console, choose Configure > Manage: SteelFusion Edges.

 Select Show Preferred Interfaces for SteelFusion Edge Connections.

 Open the interface you want to remove.

 Delete the interface.

Any change in the preferred interfaces on the Core is communicated to the Edge and the connection is updated as needed.

Configuring WAN redundancy in an HA environment

In a Core HA environment, the preferred interface information for the failover Core is sent to the Edge by the primary Core. The connection between the Edge and the failover Core follows the same logic, in which a connection is tried on each link (local interface and preferred interface pair) until a connection is formed.
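The link-selection behavior described in this section and the previous one can be modeled with a short sketch. This is illustrative only and not Riverbed code: the function and variable names are assumptions, the text does not specify whether pairs are formed one-to-one or as all combinations (the sketch assumes all combinations), and the retry counts and waits are taken from the non-HA procedure above.

# Illustrative model of how an Edge could walk its (local, preferred) interface pairs.
import itertools
import time

RETRIES_PER_LINK = 3     # each link is tried three times...
RETRY_WAIT_S = 3         # ...with a 3-second wait, so the next link starts after ~9 s

def try_connect(local_intf, preferred_intf):
    # Placeholder for the actual connection attempt (assumption).
    raise NotImplementedError

def establish_core_connection(local_interfaces, preferred_interfaces):
    for local_intf, preferred_intf in itertools.product(local_interfaces,
                                                        preferred_interfaces):
        for _attempt in range(RETRIES_PER_LINK):
            if try_connect(local_intf, preferred_intf):
                return (local_intf, preferred_intf)   # Rdisk follows this link
            time.sleep(RETRY_WAIT_S)
    return None   # no link could be established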

Related information

 SteelFusion Core Management Console User’s Guide

 SteelFusion Edge Management Console User’s Guide

 SteelFusion Core Installation and Configuration Guide

 SteelFusion Command-Line Interface Reference Manual

 Riverbed Splash at https://splash.riverbed.com/community/product-lines/steelfusion


7

SteelFusion Replication (FusionSync)

This chapter describes FusionSync, a replication technology between Cores that enables seamless branch continuity across two data centers. It includes the following topics:

 “Overview of SteelFusion replication” on page 107

 “Architecture of SteelFusion replication” on page 107

 “Failover scenarios” on page 111

 “FusionSync high-availability considerations” on page 113

 “Journal LUN metrics” on page 117

 “Related information” on page 118

As of the publication of this document, your Cores and Edges must run SteelFusion 4.0 or later.

Note: FusionSync is not supported in SteelFusion deployments configured for NFS/file mode.

Note: FusionSync is accessible through the Replication menu in the Core Management Console and through the coredr set of commands in the CLI. References to replication in this chapter refer to the FusionSync feature.

For more information about SteelFusion replication, see SteelFusion Core Management Console User’s Guide.

Overview of SteelFusion replication

A single data center is susceptible to large-scale failures (power loss, natural disasters, hardware failures) that can bring down your network infrastructure. To mitigate such scenarios, SteelFusion replication enables you to connect branch offices to data centers across geographic boundaries and replicate data between them.

In a SteelFusion setup, typical storage area network (SAN) replication does not protect you from data loss in case of a data center failure, nor from network downtime that can affect many branch offices at the same time. FusionSync enables Cores in two geographically dispersed data centers to remain in synchronization and enables the Edges to switch to another Core in case of disaster. FusionSync can prevent data loss and downtime.

Architecture of SteelFusion replication

This section describes the architecture of SteelFusion replication. It contains the following topics:

 “SteelFusion replication components” on page 108


 “SteelFusion replication design overview” on page 108

SteelFusion replication components

SteelFusion replication includes the following components:

 Primary Core - The Core role that actively serves the LUNs to the Edges and replicates new writes to a secondary Core. During normal operations, the primary Core is located at the preferred data center. When a disaster, a failure, or maintenance affects the preferred data center, the primary Core fails over to the secondary Core at the disaster recovery site. After the failover, the secondary Core becomes primary.

 Secondary Core - The Core role that receives replicated data from the primary Core. The secondary Core does not serve the storage LUNs to the Edges. On failover, the secondary Core changes its role to primary Core.

 Replication pair - A term used to describe the primary and secondary Core located in separate data centers and configured for FusionSync.

 Replica LUNs - LUNs at the secondary data center that are mapped to the secondary Core and kept in sync with the primary Core LUNs in the primary data center by using FusionSync.

 Journal LUN - A dedicated LUN that is mapped to the Cores from the local backend storage array. Each Core has its own Journal LUN. When FusionSync replication is suspended, the Core uses the Journal LUN to log the write operations from Edges.

 Witness - A role assigned to one of the Edges. A Witness registers requests from the Cores to suspend replication and makes sure that only one Core at a time suspends its replication. This arbitration prevents a split-brain scenario and ensures that the Edges’ write operations are logged to the Journal LUN only on the Core approved by the Witness.

You must meet the following requirements to set up replication:

 You must configure the backend storage array for each Core that is included in the replication configuration.

 The primary data center can reach the secondary data center through the chosen interfaces.

 The secondary Core cannot have any active Edges or LUNs.

 Each replica LUN must be the same size (within 1 GiB) as its counterpart LUN in the primary data center.

 The secondary Core must be the same specification model as the primary Core or a larger one.

 If you have HA configured on the primary Cores of the data center, you must also configure it on the secondary Cores. The HA configuration must be the same in each case.

 The Edges can reach the secondary data center.

SteelFusion replication design overview

This section describes the communication and general data flow between Cores that are deployed as part of a SteelFusion Replication (FusionSync) design.


In a deployment with a single data center and a Core without FusionSync, the Edge acknowledges the write operations to local hosts in the branch, temporarily saves the data in its blockstore, marking it as uncommitted, and then asynchronously sends the write operations to the Core. The Core writes the data to the LUN in the data center storage array. After the Core has received an acknowledgment from the storage array, the Core then acknowledges the Edge. The Edge can then mark the relevant blockstore contents as committed. In this way, the Edge always maintains data consistency.

To maintain data consistency between the Edge and two data centers (with a Core in each data center and FusionSync configured), the data flow is somewhat different. In the steady state, the Edge acknowledges the write operations to local hosts, temporarily saves the data in its blockstore, marking it as uncommitted, and asynchronously sends the write operations to its Core. When you configure FusionSync, the primary Core applies the write operations to backend storage and replicates the write operations to the secondary Core. The data is replicated between the Cores synchronously, meaning that a write operation is acknowledged by the primary Core to the Edge only when both the local storage array and the secondary Core, along with its storage array, have acknowledged the data. The Edge marks the relevant blockstore content as committed only when the primary Core has finally acknowledged the Edge.

If the primary Core loses its connection to the secondary Core, it pauses FusionSync. When FusionSync is paused, writes from the Edge to the Core are not acknowledged by the Core. The Edge continues to acknowledge the local hosts in the branch and buffer the writes, similar to its behavior when the WAN connectivity to the Core goes down in a deployment without FusionSync. Although write operations between the Edge and the Core are not acknowledged, read operations are not affected, and read requests from the Edges continue to be serviced by the same Core as normal. When the connectivity comes back up, FusionSync continues automatically.

If, for any reason, the connectivity between the Cores takes a long time to recover, the uncommitted data in the Edges might continue to increase. Uncommitted data in the Edge can lead to a full blockstore. If the blockstore write reserve is in danger of reaching capacity, you can suspend FusionSync. When FusionSync is suspended, the primary Core accepts writes from the Edges, keeps a log of the write operations on its Journal LUN, and acknowledges the write operations to the Edges so that the blockstore data is marked as committed.

When a primary Core is down, a secondary Core can take over the primary role. You have to manually initiate the failover on the secondary Core. The Edges maintain connectivity to both Cores (primary and secondary); when the failover occurs, the surviving Core automatically contacts the Edges to move all Edge data connections to the secondary Core. At this point, the secondary Core becomes primary with its replication suspended. The new primary Core acknowledges writes from the Edges, applies them to the storage array, logs the operations to the Journal LUN, and acknowledges the write operations to the Edges.

When the connectivity between the Cores is restored, the new primary Core starts resynchronizing the writes logged in the Journal LUN through the Core in the original data center (the old primary Core) to the LUNs. In this recovery scenario, the old primary Core becomes the secondary Core and all the LUNs protected by FusionSync are brought back into synchronization with their replicas.


Whatever the failover scenario, when the failed data center and Core are brought back online and connectivity between data centers is restored, you can fail back to the original data center by initiating a failover at the active (secondary) data center. Because a failover is only possible if the primary Core is not reachable by the secondary Core and the Witness, you must manually bring down the current primary Core. You can do this by stopping the Core service on the current primary Core (in the secondary data center) and then initiating a failover on the old primary Core located in the primary data center.

As with any high-availability scenario, there is a possibility of a split-brain condition. In the case of SteelFusion replication, this is when both the primary and secondary Cores are up and visible to the Edges but cannot communicate with each other. FusionSync could become suspended on both sides, with the Edges sending writes to both Cores, or some writes to one Core and some to the other. Writing to both sides leads to a condition in which both Cores are journaling and neither Core has a consistent copy of the data. More than likely, split brain results in data loss.

To prevent this issue, you must define one of the Edges as a Witness. The Witness must approve the requests that come from the Cores to suspend replication. The Witness makes sure that the primary and secondary Cores do not both get approval for suspension at the same time. When the request is approved, the Core can start logging the writes to the Journal LUN.

Figure 7-1 shows the general design of the FusionSync feature.

Figure 7-1. Replication design overview

1. A write operation is asynchronously propagated from the Edge to the primary Core (Core X).

2. The Core applies the write operation to the LUN in the backend storage and synchronously replicates the write to the secondary Core (Core X’).

3. The backend storage in Data Center A acknowledges the write to Core X.

4. The secondary Core (Core X’) applies the write to the replica LUN in its backend storage. The storage in Data Center B acknowledges the write to Core X’. Core X’ then acknowledges the write to Core X.


5. Core X acknowledges the write to the Edge.
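A minimal sketch of the acknowledgment ordering in steps 1 through 5 follows. It is an illustrative model only; the class and method names are assumptions and the real implementation is asynchronous, but it shows why the Edge marks data as committed only after the replica LUN in Data Center B has been written.

# Illustrative model of the FusionSync write path (steps 1-5 above); names are assumed.
class ArrayLUN:
    # Stand-in for a LUN on a backend storage array (assumption).
    def __init__(self):
        self.blocks = []

    def write(self, block):
        self.blocks.append(block)   # returning is the array's acknowledgment

class SecondaryCore:
    def __init__(self, replica_lun):
        self.replica_lun = replica_lun

    def replicate(self, block):
        self.replica_lun.write(block)   # step 4: replica LUN written, then ack to primary

class PrimaryCore:
    def __init__(self, local_lun, secondary_core):
        self.local_lun = local_lun
        self.secondary_core = secondary_core

    def write_from_edge(self, block):
        self.local_lun.write(block)           # steps 2-3: apply locally, local array acks
        self.secondary_core.replicate(block)  # step 4: synchronous replication
        return "ack"                          # step 5: only now acknowledge the Edge

secondary = SecondaryCore(ArrayLUN())
primary = PrimaryCore(ArrayLUN(), secondary)
assert primary.write_from_edge(b"new-data") == "ack"   # Edge then marks block committed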

Failover scenarios This section describes the following failover scenarios:

 “Secondary site is down” on page 111

 “Replication is suspended at the secondary site” on page 112

 “Primary site is down (suspended)” on page 113

Secondary site is down

Figure 7-2 shows the traffic flow if the secondary site goes down and FusionSync is paused but not suspended.

Figure 7-2. Secondary site is down

Writes from the Edges are not acknowledged by the primary Core. The Edges start to buffer the writes locally. Read operations from the Core, and read and write operations in the branch office between the Edge and hosts, continue to be performed as usual.


Replication is suspended at the secondary site

Figure 7-3 shows the traffic flow if FusionSync is suspended at the secondary site.

Figure 7-3. Replication is suspended at the secondary site

1. A write operation is asynchronously propagated to the primary Core (Core X) from the Edge.

2. The Core applies the write operation to backend storage and logs the write to Journal LUN.

3. The backend storage acknowledges the write.

4. Wait for the write to be journaled.

5. The Core acknowledges the write to the Edge.


Primary site is down (suspended)

Figure 7-4 shows the traffic flow if the primary site (Data Center A) is down and the failover is initiated. The secondary Core assumes the primary role and the connected Edges fail over to the secondary site (Data Center B).

Figure 7-4. Primary site is down

1. A write operation is asynchronously propagated from the Edge to the secondary Core (Core X’) that has assumed the primary role.

2. The Core applies the write operation to backend storage and logs the write to the Journal LUN.

3. The backend storage acknowledges the write.

4. Wait for the write to be journaled.

5. The Core acknowledges the write to the Edge.

FusionSync high-availability considerations

This section describes design considerations when FusionSync is implemented in conjunction with high availability (HA). It contains the following topics:

 “FusionSync and Core HA” on page 114

 “Replication HA failover scenarios” on page 114

For information about HA, see “SteelFusion Appliance High-Availability Deployment” on page 81.


FusionSync and Core HA

We strongly recommend that you set up your Cores for HA. A high-availability deployment provides redundancy at the Core level within a single data center but not across data centers. Consider FusionSync as an enhancement to HA that protects against data center failures, not as a replacement for HA within the data center.

Core HA is active-active by design, with each Core separately serving different sets of LUNs to its Edges. Therefore, each Core needs FusionSync configured for its LUNs with its replication peer.

Note: When you have configured Core HA, one Core has the role of the leader and the other has the role of the follower.

Consider the following factors when configuring replication for Cores set up for HA:

 Some configuration changes are only possible on the leader Core. For example, suspend and failover operations are only allowed on the leader. If the leader is down, the follower Core assumes the role of leader.

 You need to configure the same Journal LUN on both the leader and the follower Cores.

 If the primary Core is set up for HA, the secondary Core must be configured for HA too.

 HA must be set up before configuring replication.

 Cores configured for HA must have the same replication role and data center name.

 After you have configured replication, you cannot clear the Core HA.

 When setting up the replication pair, both primary Cores in an HA configuration need to be paired with their peers in the secondary data center. For example, Core X is paired with Core X' and Core Y is paired with Core Y'.

 Terminating replication from the primary Core that is set up for HA terminates FusionSync for all four nodes.

Replication HA failover scenarios

This section describes SteelFusion Core HA failover scenarios, with FusionSync.


Figure 7-5 shows the Cores configured for HA at Data Center A replicating to Data Center B. Both nodes of the active-active HA cluster are operational and replicating different LUNs to their peers.

Figure 7-5. HA replication in normal state

Figure 7-6 shows how HA failover at Data Center A affects replication. Core Y has failed and Core X has taken over responsibility for serving Core Y’s LUNs to the Edges. Core X replicates those LUNs to Core Y' while continuing to replicate its own LUNs to Core X’.

Figure 7-6. Core HA failover on primary data center


Figure 7-7 shows how HA failover at Data Center B affects replication. Core Y’ has failed and Core X’ has taken over the role of replication target, accepting replication from Core Y while continuing to accept replication traffic from Core X.

Figure 7-7. Core HA failover on secondary data center

Figure 7-8 shows how HA failover at both sites affects replication. After Core Y and Core Y’ fail, Core X assumes the role of replication source and Core X’ assumes the role of replication target.

Figure 7-8. Core HA failover on primary and secondary data centers

SteelFusion replication metrics

When you deploy FusionSync between Cores in two data centers, you must understand some of the traffic workload and other metrics that occur on the link between the Cores to help with sizing and troubleshooting.


When FusionSync is running, the packets sent between the Cores consist of a very small header and a payload. During the initial synchronization of a LUN from the primary to the secondary data center, the payload is fixed at 128 KB. After the initial synchronization is complete and the LUNs are active, the payload size is exactly the same as that of the iSCSI write that occurred at the remote site; that is, the write between the iSCSI initiator of the server and the iSCSI target of the Edge. The actual size depends on the initiator and server operating system, but the payload size can be as large as 512 KB. Whatever the payload size between the primary and secondary, the Core honors the MTU setting of the network between data centers.

The maximum replication bandwidth consumed between the two data centers is the sum of the write throughput across all locations in which there are active SteelFusion Edge appliances installed. This is because all active branch locations send committed data to the Core in the primary data center, which then sends it on to the secondary data center. To reduce the quantity of traffic between data centers, use SteelHeads to perform data streamlining on the FusionSync traffic crossing the link between the two locations.

By default, each Core uses a single 1-Gbps network interface. A Core in the primary data center maintains two TCP replication connections to the Core in the secondary data center. If you use multiple network interfaces on each Core, then multiple TCP connections share the available bandwidth on the link between data centers. In general, the number of connections is calculated by using the following formula:

 Total replication connections = ((2 x number of replication interfaces) x number of Cores in primary) + ((2 x number of replication interfaces) x number of Cores in secondary)

For example, consider the following deployments (a short sketch implementing this formula follows the examples):

 A single Core configured with one replication interface in the primary data center, and a single Core in the secondary data center, also with a single replication interface. In this scenario there would be two replication connections for the primary and two for the secondary, resulting in a total of four connections.

 Single Cores that each have two network interfaces would mean a total of (2 x 2) + (2 x 2) = 8 replication connections.

 Two Cores per data center, each with a single replication interface, means a total of (2 x 1) + (2 x 1) + (2 x 1) + (2 x 1) = 8 connections.

 A deployment with two Cores and two replication interfaces per data center has a total of (2 x 2) + (2 x 2) + (2 x 2) + (2 x 2) = 16 connections.
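A minimal implementation of the formula above, with the four example deployments as checks. The function name is illustrative, and the sketch assumes the same number of replication interfaces per Core on both sides, as the formula does.

# Illustrative implementation of the replication-connection formula above.
def total_replication_connections(cores_primary, cores_secondary, interfaces_per_core):
    # Two TCP connections per replication interface, per Core, on each side of the pair.
    primary_side = 2 * interfaces_per_core * cores_primary
    secondary_side = 2 * interfaces_per_core * cores_secondary
    return primary_side + secondary_side

# The four example deployments described above:
assert total_replication_connections(1, 1, 1) == 4
assert total_replication_connections(1, 1, 2) == 8
assert total_replication_connections(2, 2, 1) == 8
assert total_replication_connections(2, 2, 2) == 16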

Journal LUN metrics

FusionSync requires the use of a Journal LUN within each data center. The LUN is only used when FusionSync is suspended between data centers. This section describes some of the Journal LUN metrics so that you can correctly size, provision, and maintain the Journal LUN.


Regardless of the number of LUNs in the data center that are being projected by the Core and protected by FusionSync, you only need one Journal LUN for each data center. The Journal LUN only needs to keep track of the write operations, so no actual data blocks are stored on it. Because no data blocks are stored, the size of the Journal LUN is quite modest compared to the size of the LUNs on the backend storage array. You can size the Journal LUN with this formula:

 1 GB for metadata + 2 GB minimum for each LUN protected by FusionSync

Therefore, a SteelFusion deployment that has 25 LUNs needs a 51-GB Journal LUN (a small sizing sketch follows). The size of the Journal LUN is checked against the number of LUNs it is configured to support for FusionSync. If the size is too small, an alert is raised and the FusionSync replication service does not start. The Journal LUN can be thin-provisioned and, if required, it can also be dynamically increased in size.

Journal LUN error handling is fairly extensive and generally includes checks for loss of connectivity and offlining, but it can also cover events such as running out of storage, corruption, and shrinking size. In these cases, an alarm is raised on the Core that alerts you to the condition. For more details on error handling, see the SteelFusion Core Management Console User’s Guide.
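The sizing rule above can be expressed as a small function; the name and parameter defaults are illustrative.

# Illustrative sizing of the Journal LUN: 1 GB of metadata plus 2 GB per protected LUN.
def journal_lun_size_gb(protected_luns, metadata_gb=1, per_lun_gb=2):
    return metadata_gb + per_lun_gb * protected_luns

assert journal_lun_size_gb(25) == 51   # the 25-LUN example above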

Related information

 SteelFusion Core Management Console User’s Guide

 SteelFusion Edge Management Console User’s Guide

 SteelFusion Command-Line Interface Reference Manual

 Riverbed Splash at https://splash.riverbed.com/community/product-lines/steelfusion

8

Data Protection and Snapshots

This chapter provides an overview of data protection, server-level backups, and data recovery. It also describes how Core integrates with the snapshot capabilities of storage arrays, enabling you to configure application-consistent snapshots through the Core Management Console. It includes the following sections:

 “Introduction” on page 120

 “Data protection” on page 120

 “Setting up application-consistent snapshots” on page 121

 “Configuring snapshots for LUNs” on page 122

 “Volume Snapshot Service support” on page 123

 “Implementing Riverbed Host Tools for snapshot support” on page 123

 “Configuring the proxy host for backup” on page 124

 “Configuring the storage array for proxy backup” on page 125

 “Server-level backups” on page 126

 “Data recovery” on page 131

 “Branch recovery” on page 132

 “Related information” on page 136

For details about storage qualified for native snapshot and backup support, see the SteelFusion Core Installation and Configuration Guide. For more information about deploying SteelFusion in a VMware environment using snapshots with data protection, see https://supportkb.riverbed.com/support/index?page=content&id=S27750.

Note: In general, features and settings described in this chapter apply to SteelFusion deployments in block storage (iSCSI) mode only. For deployments that are using SteelFusion in NFS/file mode, also see “Snapshots and backup” on page 194 in Chapter 13, “SteelFusion and NFS.”


Introduction

Data protection can mean different things to different people. In the context of storage, in very simple terms, it is about ensuring that your data is safe: if the original data is lost in some way, another copy that you have backed up on alternative storage can be restored.

This chapter provides general information about how SteelFusion can integrate with data protection strategies you already have in place, and describes additional tools and capabilities that could improve or even replace some of those existing approaches. SteelFusion 4.6 and later can back up data using VM-aware policies, making it even easier to follow best practices when protecting your data. This feature is discussed further in “Server-level backups” on page 126. To begin with, the next sections cover snapshots, which form the basis of a data protection strategy.

Data protection

The SteelFusion system provides tools to preserve or enhance your existing data protection strategies. If you are currently using host-based backup agents or host-based consolidated backups at the branch, you can continue to do so within the SteelFusion context. However, Core also enables a wider range of strategies, including:

 Backing up from a crash-consistent LUN snapshot at the data center

The SteelFusion product family continuously synchronizes the data created at the branch with LUNs hosted in the data center. As a result, you can use the storage array at the data center to take snapshots of the LUN and thereby avoid unnecessary data transfers across the WAN. These snapshots can be protected either through the storage array replication software or by mounting the snapshot into a backup server. Such backups are only crash consistent because the storage array at the data center does not instruct the applications running on the branch server to quiesce their I/Os and flush their buffers before taking the snapshot. As a result, such a snapshot might not contain all the data written by the branch server up to the time of the snapshot.

 Backing up from an application-consistent LUN snapshot at the data center

This option uses the SteelFusion Microsoft VSS integration in conjunction with Core storage array snapshot support. You can trigger VSS snapshots on the iSCSI data drive of your Windows servers at the branch, and the Edge ensures that all data is flushed to the data center LUN and triggers application-consistent snapshots on the data center storage array. As a result, backups are application consistent because the Microsoft VSS infrastructure has informed the applications to quiesce their I/Os before taking the snapshot. This option requires the installation of the Riverbed Host Tools on the branch Windows server. For details about Riverbed Host Tools, see “Implementing Riverbed Host Tools for snapshot support” on page 123.

 Backing up from SteelFusion-assisted consolidated snapshots at the data center

This option relieves backup load on virtual servers, prevents the unnecessary transfer of backup data across the WAN, produces application-consistent backups, and backs up multiple virtual servers simultaneously over VMFS or NTFS.


In this option, the ESX server, and subsequently the Core, takes the snapshot, which is stored on a separately configured proxy server. The ESX server flushes the virtual machine buffers to the datastores and the Edge appliance flushes the data to the data center LUN(s), resulting in application-consistent snapshots on the data center storage array. You must separately configure the storage array information on the Core and the proxy server for backup. For details, see “Configuring the proxy host for backup” on page 124. This option does not require the installation of the Riverbed Host Tools on the branch Windows server.

For details about data protection and backup strategies, as well as a detailed discussion of crash-consistent and application-consistent snapshots, see the SteelFusion Data Protection and Recovery Guide. For a discussion of application consistency and crash consistency in general, see “Understanding crash consistency and application consistency” on page 26. For more information about deploying SteelFusion in a VMware environment using snapshots with data protection, see https://supportkb.riverbed.com/support/index?page=content&id=S27750.

Setting up application-consistent snapshots

This section describes the general process for setting up snapshots. Core integrates with the snapshot capabilities of the storage array. You can configure snapshot settings, schedules, and hosts directly through the Core Management Console or CLI. For a description of application consistency and crash consistency, see “Understanding crash consistency and application consistency” on page 26.

To set up snapshots

1. Define the storage array details for the snapshot configuration.

Before you can configure snapshot schedules, application-consistent snapshots, or proxy backup servers for specific LUNs, you must specify the details of the storage array for the Core, such as IP address, type, protocol, and so on. To access these configuration settings:

 In the Core Management Console, choose Configure > Data Protection: Snapshots to open the Snapshots page.

 In the Core CLI, use the storage snapshot policy modify commands.

2. Define snapshot schedule policies.

You define snapshot schedules as policies that you can apply later to snapshot configurations for specific LUNs. After a policy is applied, snapshots are automatically taken based on the parameters set by the snapshot schedule policy. Snapshot schedule policies can specify weekly, daily, or day-specific schedules. Additionally, you can specify the total number of snapshots to retain. To access snapshot schedule policy configuration settings:


 In the Core Management Console, choose Configure > Data Protection: Snapshots to open the Snapshots page.

 In the Core CLI, use the storage snapshot policy modify commands.

3. Define snapshot host credentials. You define snapshot host settings as storage host credentials that you can apply later to snapshot configurations for specific LUNs. To access snapshot host credential configuration settings:

 In the Core Management Console, choose Configure > Data Protection: Snapshots and select the Handoff Hosts tab.

 In the Core CLI, use the storage host-info commands.

For details about CLI commands, see the SteelFusion Command-Line Interface Reference Manual. For details about using the Core Management Console, see the SteelFusion Core Management Console User’s Guide.

Configuring snapshots for LUNs

This section describes the general steps for applying specific snapshot configurations to LUNs through Core. For information about configuring LUNs, see “Configuring LUNs” on page 31.

To apply snapshot configurations to a LUN

1. Select the LUN for the snapshot and access the snapshot settings. You can access the snapshot settings for a specific LUN by choosing Configure > Manage: LUNs. Select the desired LUN to display controls that include the Snapshots tab. The Snapshots tab itself has three tabs: Configuration, Scheduler, and History.

2. Apply a snapshot schedule policy to the current LUN. The controls in the Scheduler tab enable you to apply a previously configured policy to the current LUN. You can create a new schedule directly in this pane.

3. Specify the storage array where the LUN resides. The controls in the Configuration tab enable you to specify the storage array where the current LUN resides and to apply a static name that is prepended to the names of snapshots.

4. Specify the client type. The controls in the Configuration tab enable you to specify the client type. To configure application-consistent snapshots and a proxy backup, you must set this value to Windows or VMware.

5. Enable and configure application-consistent snapshots. The controls in the Configuration tab enable you to enable and configure application-consistent snapshots. The settings vary depending on which client type is selected.


Volume Snapshot Service support

We support Volume Snapshot Service (VSS) through the Riverbed Hardware Snapshot Provider (RHSP) and Riverbed Snapshot Agent. For details, see “Implementing Riverbed Host Tools for snapshot support” on page 123.

Implementing Riverbed Host Tools for snapshot support

Riverbed Host Tools are installed and implemented separately on the branch office Windows server. The toolkit provides the following services:

 RHSP - Functions as a snapshot provider for the VSS by exposing Core snapshot capabilities to the Windows server. RHSP ensures that users get an application-consistent snapshot.

 Riverbed Snapshot Agent - A service that enables the Edge to drive snapshots on a schedule. This schedule is set through the Core snapshot configuration. For details, see the SteelFusion Core Management Console User’s Guide.

Riverbed Host Tools support 64-bit editions of Microsoft Windows Server 2008 R2 or later.

Note: If you move a LUN from one Windows server to another and then back again to the original Windows server, you must restart the Riverbed Snapshot Agent service running on the original Windows server. Otherwise, subsequent snapshots could fail with an “ENOTCONN” error for the iSCSI connection, even though the connection is healthy. For more details, see https://supportkb.riverbed.com/support/index?page=content&id=S29011.

This section includes the following topics:

 “Overview of RHSP and VSS” on page 123

 “Riverbed Host Tools installation and configuration” on page 124

Overview of RHSP and VSS

RHSP exposes the Edge through iSCSI to the Windows server as a snapshot provider. Only use RHSP when an iSCSI LUN is mounted on Windows through the Windows initiator. The process begins when you (or a backup script) request a snapshot through the VSS on the Windows server:

1. VSS directs all backup-aware applications to flush their I/O operations and to freeze.

2. VSS directs RHSP to take a snapshot.

3. RHSP forwards the command to the Edge.

4. The Edge exposes a snapshot to the Windows server.

5. The Edge and Core commit all pending write operations to the storage array.


6. The Edge takes the snapshot against the storage array.

Note: The default port through which the Windows server communicates with the Edge appliance is 4000.

Riverbed Host Tools installation and configuration

Riverbed Host Tools installation and configuration requires configuration on both the Windows server and the Core.

To configure Core

1. In the Core Management Console, choose Configure > Data Protection: Snapshots to configure snapshots.

2. In the Core Management Console, choose Configure > Storage Array: iSCSI, Initiators, MPIO to configure iSCSI with the necessary storage array credentials. The credentials must reflect a user account on the storage array appliance that has permissions to take and expose snapshots.

For details about both steps, see the SteelFusion Core Management Console User’s Guide.

To install and configure Riverbed Host Tools

1. Obtain the installer package from Riverbed.

2. Run the installer on your Windows server.

3. Confirm the installation as follows:

 From the Start menu, choose Run....

 At the command prompt, enter diskshadow to access the Windows DiskShadow interactive shell.

 In the DiskShadow shell, enter list providers.

 Confirm that RHSP is among the providers returned.

Configuring the proxy host for backup

This section describes the general procedures for configuring the proxy host for backup in both ESXi and Windows environments.

To configure an ESXi proxy host

 Configure the ESXi proxy host to connect to the storage array using iSCSI or Fibre Channel.

In deployments where vCenter is used to manage ESXi hosts in general, note the following with respect to the ESXi proxy host.

 The ESXi proxy host should not normally be managed by vCenter.


 If there is a requirement to manage the ESXi proxy host with vCenter, then a different vCenter must be used than that which is managing ESXi within the Edge appliances.

 If the ESXi proxy host is managed by vCenter, then it should not be in lockdown mode.

 The Core should always be configured to communicate with the ESXi proxy host directly.

To configure a Windows proxy host

1. Configure the Windows proxy host to connect to the storage array using iSCSI or Fibre Channel.

2. Configure a local administrator user that has administrator privileges on the Windows proxy host. To ensure that the account retains its administrator privileges when connecting remotely, create the following registry setting (a sketch of this change appears after this procedure):

– Key: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System

– For Key Type, specify DWORD.

– For Key Name, specify LocalAccountTokenFilterPolicy.

– Set Key Value to 1.

3. Disable the Automount feature through the DiskPart command interpreter:

automount disable

4. Add the storage array target to the favorites list on the proxy host to ensure that the iSCSI connection is reestablished when the proxy host is rebooted.
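A hedged sketch of the registry change from step 2 follows, using Python's winreg module; run it as an administrator on the Windows proxy host. The script itself is illustrative and not part of the Riverbed tooling.

# Illustrative sketch of step 2 above: set LocalAccountTokenFilterPolicy = 1 (DWORD).
import winreg

POLICIES_SYSTEM = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System"

def enable_local_account_token_filter_policy():
    # Open the Policies\System key with write access and create/update the value.
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, POLICIES_SYSTEM, 0,
                        winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "LocalAccountTokenFilterPolicy", 0,
                          winreg.REG_DWORD, 1)

if __name__ == "__main__":
    enable_local_account_token_filter_policy()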

Configuring the storage array for proxy backup

This section describes the general processes for configuring Dell EqualLogic, EMC CLARiiON, EMC VNX, and NetApp storage arrays for backup. For details on these arrays and other vendors, ask your account team about relevant technical marketing solution guides.

To configure Dell EqualLogic

1. Go to the LUN and select the Access Tab.

2. Add permissions to Core initiator/IP address and assign volume-only level access.

3. Add permissions to proxy host for the LUN and assign snapshots-only access. For details, see the SteelFusion Solution Guide - SteelFusion with EqualLogic Storage Arrays.

To configure EMC CLARiiON and EMC VNX

1. Create a storage group.

2. Assign the proxy servers to the group. We recommend that you provide the storage group information to the proxy host storage group.


For details, see the SteelFusion Solution Guide - SteelFusion with EMC CLARiiON Storage Systems.

To configure NetApp

1. Create an initiator group.

2. Assign the proxy servers to the group. We recommend that you provide the storage group information to the proxy host storage group. For details, see the SteelFusion Solution Guide - SteelFusion with NetApp Storage Systems.

Server-level backups

As described in the previous sections of this chapter, using snapshot technology on the LUNs deployed with SteelFusion ensures a reliable data protection model for any of the servers and virtual machines that are to be protected. However, with this LUN-centric approach to data protection, workflows for snapshots and backups can become quite complex, especially if virtual machine (VM) instances span multiple LUNs. The complexity arises because, with these multiple-LUN configurations, it is important to maintain a consistent state and coordinate the timing of the snapshots of all LUNs related to a single server or VM instance. For a VM configured across multiple LUNs, snapshots for all the LUNs must be consistent to perform a successful backup.

In SteelFusion Core 4.6 and later, the new Backups page and backup policy wizard provide an intuitive workflow to configure, manage, and monitor backups.

Figure 8-1. Backups page

It is still possible to configure and perform LUN-based backups as in previous releases of SteelFusion, but this new feature enables the administrator to move away from the “LUN-centric” methodologies of data protection by using the concept of server-level backups.


Server-level backups still make use of snapshot technology and a proxy host in the data center, but the policy wizard simplifies the process by combining all the relevant parameters of the procedure (Edge, ESXi/Windows server details, data center storage array where the LUNs reside, and proxy server used for the backup) into a backup policy that meets the requirements. With this workflow, for any VM or server configuration that spans multiple LUNs, SteelFusion ensures that all the LUN snapshots are consistent, resulting in a consistent backup. Policies created on the Core can back up at differing levels of granularity, including multiple Edges, multiple servers and VMs, a single Edge, a single server, or even a single VM (CLI only).

The basic steps to create a policy on the Core are as follows:

1. Configure snapshots for the relevant LUNs to be backed up. Remember that additional configuration may be required in secondary data centers if failover and replication are part of a Core high-availability architecture where you also want snapshots to occur.

Figure 8-2. LUNs page - configuring the snapshots


2. Configure the backup proxy.

Figure 8-3. Backup Policy Wizard - configuring the backup proxy

3. Use the Create New Backup Policy Wizard to specify the backup policy’s scope, name, schedule, and proxy server.

Figure 8-4. Backup Policy Wizard

For details on how to configure backup policies, see the “Configuring snapshots and backups” section in the “Configuring Storage” chapter of the SteelFusion Core Management Console User’s Guide.

Note: In Core 4.6 and later, the Core uses the WinRM protocol to communicate with Windows proxy servers used for backup. It is therefore important to ensure that the proxy is able to support WinRM. Earlier versions of Core used Winexe, which is no longer supported in Core 4.6 and later.
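
As an illustrative check only (assuming a default Windows Server configuration and administrative rights on the proxy), the following commands, run on the Windows proxy host, enable the WinRM service and list its configured listeners; review the output against your own security policies before accepting the changes:

winrm quickconfig
winrm enumerate winrm/config/listener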

If you want to use the server-level backups feature to protect Windows servers using LUNs from an Edge, it is important that the servers are running version 4.6 of the RAgent. RAgent is included in the Unified Installer for Riverbed Plugins package, which can be downloaded from the SteelFusion Core section of the Riverbed Support site.

To ensure that any new or modified policy is working as expected, see the Reports > Data Protection: Backups page to view the policy details and verify that backups have succeeded. These reports also help in troubleshooting any errors. For an overview of the backup status for all protected Edges, view the Backups pane in the Core Dashboard.


Migrating from LUN protection to server-level backup

If you are upgrading to Core 4.6 or later, any existing LUN-based backup configurations will continue to work normally. However, you will not be able to modify any of the existing LUN-based backup configurations or create any new ones. To change any existing LUN-based backup configurations to server-based backups, you must migrate the configurations. Core 4.6 can support both LUN protection and server-level backup at the same time. Therefore, it is not necessary to migrate all existing LUN protection configurations for all Edges at the same time. However, each Edge in the deployment can only be configured to use LUN protection or server-based backups, not both. Migration can be achieved in a phased approach, one Edge at a time. Whether a phased migration is adopted or not, there are some key points to be aware of:

 All Cores, including HA peers, which will use server-level backups, must be running version 4.6 or later.

 Each Edge (including the HA peer if configured) to be protected with server-level backup must be running version 4.6 or later.

 During the Core upgrade to version 4.6, a procedure runs to clean up the current LUN protection state on the proxy host and the storage array. This clean-up procedure does not affect existing LUN protection configurations, but it is required in order for these existing configurations and any subsequent migrations to operate successfully. Because the procedure communicates with the proxy, the proxy must remain connected to the Core during the upgrade. If the proxy is not connected, any protection operations attempted after the upgrade will fail; to resolve this problem, reboot the proxy host.

Once the relevant Core and Edge appliances are running the correct version of SteelFusion software, you can configure the Core so that LUN protection settings are migrated to server-level backup policies instead. To achieve this, follow these steps:

1. Add the ESXi or Windows servers on each Edge that require protection to the backup policy.

2. Associate the policy with the correct scope, which includes the relevant LUN(s) and proxy. It is important to understand that this action automatically disables any existing LUN protection settings for that Edge. It also cleans up any LUN protection state on the storage array and the proxy host related to the Edge. Therefore, make sure that all VMs and/or servers for the specific Edge are protected as required.

In the unlikely event that you need to revert to a version of Core prior to 4.6, the downgrade is only possible once all server-level backup configurations and policies have been removed. Failure to do so prevents the downgrade and displays an error in the Core Management Console informing the administrator that server-level backup configurations must be removed.

Protection groups and policy customization

The server-level backup policy wizard is a quick and simple way to configure server-level backup policies using best practices for data protection in a SteelFusion deployment. The wizard is designed to ensure that the correct scope (relevant LUNs and proxy) is applied for each server, whether it is a specific server or an Edge hosting multiple servers.


A protection group (Figure 8-5) is the underlying construct that Core uses to map the server, VMs, and LUNs that are being protected. It contains the information associated with the backup policy, the server that is being protected, and the proxy host used for the backup.

Figure 8-5. Protection groups

A protection group is mapped to a particular server. This mapping applies to each server instance that exists in a backup policy, whether the policy is for an individual server or for an Edge that has multiple protected VMs. You can view the protection groups from the CLI on the Core by using the following command:

amnesiac (config) # show storage backup group all

Here is some sample output:

SF-Core98 (config) # show storage backup group all
Total Groups: 2

Group id        : HJY4Z3WwRIex1Dr1XjxKeQ
Server id       : lab79-server
Server type     : vmware_esxi
Policy id       : p_test1
Maximum history : 5
Protect         : all (Exclude: vm3_Win2K3-32-bit,vm4_Win2K3,vm6_Windows2008)
Quiesce         : None

(ID)                 Trigger Time     Protected        Excluded  State            Fail Reason
----------------------------------------------------------------------------------------------
(a3de-8c22264fd5c4)  Nov-04 15:37:42  vm2_Windows2008            *proxy_mounted*
(9dee-9be8818b7856)  Nov-04 15:11:56  vm2_Windows2008            done

Group id        : djir95ZQlaneqCx4d3AwQ
Server id       : lab44-server
Server type     : windows_server
Policy id       : p_test2
Maximum history : 5
Protect         : all
Quiesce         : None

(ID)                 Trigger Time     Protected         Excluded  State            Fail Reason
----------------------------------------------------------------------------------------------
(8624-b3d7753ebb9f)  Nov-04 16:28:18  B9EC5$CQYtNz,B9E ...        *proxy_mounted*
(9c31-b6651ca50685)  Nov-04 16:13:33  B9EC5$CQYtNz,B9E ...        done
(bd0a-5b6021d68ec3)  Nov-04 15:29:49  B9EC5$CQYtNz,B9E ...        done

SF-Core98 (config) #


Occasionally, it may be necessary to customize a policy. For example, an Edge may have five VMs but only a selection of them needs to be protected. Configuring a policy for the Edge using the wizard will automatically ensure all five VMs are protected on the Edge as a best practice. Once the policy is created, you can modify it using the CLI on the Core. Using our example of the Edge with five VMs, you could modify the policy so that only two of the five VMs are protected. Alternatively, a slightly different form of the same command could be used to exclude two of the five VMs from being protected.

amnesiac (config) # storage backup group modify membership selected members
amnesiac (config) # storage backup group modify membership all exclude

There are additional options for this command that allow modification of other policy parameters, such as quiesce lists and backup history. For more details on the use and syntax of the CLI commands used for protection groups, see the SteelFusion Command-Line Interface Reference Manual.

Data recovery

In the event your data protection strategy fails, the SteelFusion product family enables several strategies for file-level recovery. The recovery approach depends on the protection strategy you used. This section describes the following strategies:

 File recovery from Edge snapshots at the branch - When snapshots are taken at the branch using Windows VSS in conjunction with RHSP, each snapshot is available to the Windows host as a separate drive. In order to recover a file, browse to the drive associated with the desired snapshot, locate the file, and restore it. For more information about RHSP, see “Implementing Riverbed Host Tools for snapshot support” on page 123.

 File recovery from the backup application catalog file at the branch - When backups are taken at the branch using a backup application such as Symantec® NetBackup™ or IBM Tivoli® Storage Manager, you access and restore files directly from the backup server. We recommend that you restore the files to a different location in case you still need to refer to the current files.

 Recover individual files from a data center snapshot (VMDK files) - To recover individual files from a storage array snapshot of a LUN containing virtual disk (VMDK) files, present the snapshot to a VMware ESX server and attach the VMDK to an existing VM running the same operating system (or an operating system that reads the file system used inside the VMDKs in question). You can then browse the file system to retrieve the files stored inside the VM (see the PowerCLI sketch after this list).

 Recover individual files from a data center snapshot (individual files) - To recover individual files from a storage array snapshot of a LUN containing individual files, present the snapshot to a server running the same operating system (or an operating system that reads the file system used on the LUN). You can then browse the file system to retrieve the files.

 File recovery from a backup application at the data center


You can back up snapshots taken at the data center with a backup application or through Network Data Management Protocol (NDMP) dumps. In this case, file recovery remains unchanged from the existing workflow. Use the backup application to restore the file. You can send the file to the branch office location. Alternatively, you can take the LUN offline from the branch office and inject the file directly into the LUN at the data center. However, Riverbed does not recommend this procedure because it requires taking the entire LUN down for the duration of the procedure.

 File recovery from Windows VSS at the branch - You can enable Windows VSS and its Previous Versions feature at the branch on a SteelFusion LUN, no matter which main backup option you implement. When using Windows VSS, you can directly access the drive, navigate to the Previous Versions tab, and recover deleted, damaged, or overwritten files. Windows uses its default VSS software provider to keep up to the previous 64 versions of each file. In addition to restoring individual files to a previous version, VSS also provides the ability to restore an entire volume. Setting up this recovery strategy requires considerations too numerous to detail here. For more details about recovery strategies, see the SteelFusion Data Protection and Recovery Guide.
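
The VMDK-attach recovery described in the list above can also be scripted. The following PowerCLI sketch is an example only; the vCenter address, recovery VM name, and datastore path are placeholders, and it assumes the snapshot LUN has already been presented to the ESX host and mounted as a datastore:

Connect-VIServer -Server <vcenter-or-esxi-host>
# Attach the VMDK from the snapshot datastore to an existing recovery VM
$vm = Get-VM -Name "recovery-vm"
New-HardDisk -VM $vm -DiskPath "[snap_datastore] protected-vm/protected-vm.vmdk"
# After copying the required files from inside the guest, detach the disk without deleting the VMDK
Get-HardDisk -VM $vm | Where-Object {$_.Filename -like "*protected-vm.vmdk"} | Remove-HardDisk -Confirm:$false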

Branch recovery

SteelFusion 3.0 and later include the branch recovery feature, which allows you to define the working data set of LUNs projected by Core. During a catastrophic and irrecoverable failure, you can lose access to the working set of LUNs. Branch recovery enables proactive prepopulation of the working set when the LUNs are restored. This section includes the following topics:

 “Overview of branch recovery” on page 132

 “How branch recovery works” on page 133

 “Branch recovery configuration” on page 133

 “Branch recovery CLI configuration example” on page 134

Overview of branch recovery

The branch recovery feature in SteelFusion 3.0 and later enables you to track disk accesses for both Windows LUNs and VMFS LUNs (hosting Windows VMs) and quickly recover from a catastrophic failure. During normal operations, the Edge caches the relevant and recently accessed user data on a working set of projected LUNs. In the event of a catastrophic failure in which you cannot recover the Edge, the working set of projected LUNs is also lost. With branch recovery enabled, the working set is proactively streamed to the branch when a new Edge is installed and the LUNs are mapped. Branch recovery ensures that after the branch is recovered, the user experience at the branch does not change.


Do not confuse either regular prefetch or intelligent prepopulation with branch recovery. Branch recovery prepopulates the working set proactively, as opposed to pulling related data on access, as in the case of regular prefetch. Branch recovery is different from intelligent prepopulation because intelligent prepopulation pushes all the used blocks in a volume with no regard to the actual working set.

Branch recovery is based on Event Tracing for Windows (ETW), a kernel-level facility. Riverbed supports only Windows 7, Windows 2008 R2, Windows 2012, and Windows 2012 R2. Branch recovery is supported for both physical Windows hosts and Windows VMs. You must format physical Windows host LUNs with NTFS. For VMs, NTFS-formatted VMDKs hosted on VMware VMFS LUNs are supported.

How branch recovery works

The following are the major components of branch recovery:

 Branch recovery agent

 Branch recovery support on Core

The branch recovery agent is a service that runs in the branch on a Windows host or VM. The agent uses Windows ETW-provided statistics to collect and log all disk access I/O. Periodically, the collected statistics are written to a file that is stored on the same disk for which the statistics are collected. The file is called lru*.log and is located in the \Riverbed\BranchRecovery\ directory.

Note: The SteelFusion Turbo Boot plugin is not compatible with the branch recovery agent. For more information about the branch recovery agent, see the SteelFusion Core Management Console User’s Guide.

You must enable branch recovery support for the LUN prior to mapping LUNs to the new Edge. When a VMFS LUN or a snapshot is mapped to a new Edge, the Core crawls the LUN and parses all the lru*.log files. If files previously created by a branch recovery agent are found, the Core pushes the referenced blocks to the new Edge. The branch recovery agent sends the most recently accessed blocks first, sequentially for each VM. When data for a certain time frame (hour, day, week, or month) is recovered for one VM, the process moves on to the next VM in round-robin fashion, providing equal recovery resources to all VMs.
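
As a quick check that the agent has been collecting statistics, you can list the log files on the Windows host; the drive letter here is an example only, and the directory resides on each disk being monitored, as described above:

dir E:\Riverbed\BranchRecovery\lru*.log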

Branch recovery configuration

You must install the branch recovery agent on each VM for which you want the benefit of the branch recovery feature. You must have administrative privileges to perform the installation. After the installation, the agent starts to monitor I/O operations and records the activity into designated files.

When recovering a branch from a disaster, you must enable branch recovery for the LUN. For VMFS LUNs, you can enable branch recovery for all the VMs on the LUN, or pick and choose specific VMs on which you want the feature enabled.

You must disable the branch recovery feature prior to making configuration changes. If you want to add or remove any VMs from the configuration, follow these steps:

1. Disable branch recovery.

2. Make changes.


3. Enable branch recovery.

You can choose a start time for branch recovery. This option enables you to control bandwidth usage and to choose the best time to start the recovery when you restore branches. For example, you can choose to schedule the recovery during the night, when the least amount of bandwidth is being used. In addition, you can set a cap (that is, a percentage of the total disk size) for the amount of data that is pushed per (virtual) disk.

You can configure branch recovery in the Management Console and in the CLI. In the Management Console, choose Configure > Manage: LUNs, and select the Branch Recovery tab on the desired LUN. For more information about configuring branch recovery, see the SteelFusion Core Management Console User’s Guide and the SteelFusion Command-Line Interface Reference Manual.

Figure 8-6. Branch Recovery tab on the LUNs page

Branch recovery CLI configuration example

The following example shows how to use the CLI to configure branch recovery on Core. The first output shows a LUN that is not configured with branch recovery. The example then shows how to start a schedule (with output), how to configure specific VMs, how to enable branch recovery, and output for a successfully recovered LUN. The following output is from a new VMFS LUN, with branch recovery not enabled:

# show storage lun alias 200GiB branch-recovery
Branch Recovery configuration :
  Enabled    : no
  Status     : Not Enabled
  Phase      : Not Enabled
  Progress   : Not Enabled
  Start date : Not Configured
  Start time : Not Configured
#

Use the following command to start a branch recovery schedule:


# configure terminal
(config) # storage lun modify alias 200GiB branch-recovery schedule start-now

The output from the VMFS LUN now has a started schedule, but branch recovery remains disabled:

# show storage lun alias alias-vmfs_lun branch-recovery
Branch Recovery configuration :
  Enabled    : no
  Status     : not_started
  Progress   : 0 Bytes pushed
  Start date : 2016/03/14
  Start time : 15:01:16
#

The output does not list any VMs. If you have not defined them, all VMs are added by default. If you want to enable branch recovery for specific VMs on a specific LUN, use the following command:

(config) # storage lun modify alias 200GiB branch-recovery add-vm oak-sh486-vm1

Note: VM names are discovered by prefetch and available for automatic completion. The default cap is set to 10. You can change the default with the storage lun modify alias 200GiB branch-recovery add-vm oak-sh486 cap 50 command.

Use the following command to show the result of configuring specific VMs:

# show storage lun alias 200GiB branch-recovery
Branch Recovery configuration :
  Enabled    : no
  Status     : not_started
  Phase      : not_started
  Progress   : 0 Bytes pushed
  Start date : 2016/02/20
  Start time : 10:32:59
  VMs :
    Name                 : oak-sh486-vm1
    Status               : Not Started
    Cap                  : 10 %
    Percent Complete     : Branch recovery not started or not enabled on VM
    Recovering data from : Branch recovery not started or not enabled on VM
#

When you are done configuring and adding the VMs, you can enable branch recovery for the LUNs by using the following command:

(config) # storage lun modify alias 200GiB branch-recovery enable

Notice that with branch recovery enabled, data blocks are actively being restored to the LUN. Use the following command to check the status of the recovery on a LUN:

# show storage lun alias 200GiB branch-recovery
Branch Recovery configuration :
  Enabled    : yes
  Status     : started
  Phase      : day
  Progress   : 3729920 Bytes pushed
  Start date : 2016/02/20
  Start time : 10:32:59
  VMs :
    Name                 : oak-sh486
    Status               : In-progress
    Cap                  : 10 %
    Percent Complete     : 9 %
    Recovering data from : Mon Feb 19 15:25 2016
#


When the recovery of the LUN is complete, you see the following output:

# show storage lun alias 200GiB branch-recovery
Branch Recovery configuration :
  Enabled    : yes
  Status     : complete
  Progress   : complete
  Start date : 2014/02/20
  Start time : 10:32:59
  VMs :
    Name                 : oak-sh486-vm1
    Status               : Complete
    Cap                  : 10 %
    Percent Complete     : 100 %
    Recovering data from : N/A

Related information

 SteelFusion Core Management Console User’s Guide

 SteelFusion Edge Management Console User’s Guide

 SteelFusion Command-Line Interface Reference Manual

 Riverbed Splash site at https://splash.riverbed.com/community/product-lines/steelfusion

 SteelFusion Solution Guide - SteelFusion with EqualLogic Storage Arrays

 SteelFusion Solution Guide - SteelFusion with NetApp Storage Systems

 SteelFusion Data Protection and Recovery Guide


Data Resilience and Security

This chapter describes security and data resilience deployment procedures and design considerations. It contains the following sections:

 “Recovering a single Core” on page 137

 “Edge replacement” on page 139

 “Disaster recovery scenarios” on page 140

 “Best practice for LUN snapshot rollback” on page 144

 “Using CHAP to secure iSCSI connectivity” on page 144

 “At-rest and in-flight data security” on page 146

 “Clearing the blockstore contents” on page 149

 “Edge network communication” on page 150

 “Additional security best practices” on page 150

 “Related information” on page 151

Note: Features and settings described in this chapter may apply to SteelFusion deployments in both block storage (iSCSI) mode and NFS/file mode. Where indicated, content is applicable to block storage mode only. For deployments that are using SteelFusion in NFS/file mode, also see Chapter 13, “SteelFusion and NFS.”

Recovering a single Core

If you decide you want to deploy only a single Core, read this section to minimize downtime and data loss when recovering from a Core failure. This section includes the following topics:

 “Recovering a single physical Core” on page 138

 “Recovering a single Core-v” on page 138

Caution: We strongly recommend that you deploy Core as an HA pair so that in an event of a failure, you can seamlessly continue operations. Both physical and virtual Core HA deployments provide a fully automated failover without end-user impact. For more information about HA and SteelFusion Replication, see “SteelFusion Appliance High-Availability Deployment” on page 81 and “SteelFusion Replication (FusionSync)” on page 107.


Recovering a single physical Core

The Core internal configuration file is crucial to rebuilding your environment in the event of a failure. The possible configuration file recovery scenarios are as follows:

 Up-to-date Core configuration file is available on an external server - When you replace the failed Core with a new Core, you can import the latest configuration file to resume operations. The Edges reconnect to the Core and start replicating the new writes that were created after the Core failed. In this scenario, you do not need to perform any additional configuration and there is no data loss on the Core and the Edge. We recommend that you frequently back up the Core configuration file. For details about the backup and restore procedures for device configurations, see the SteelCentral Controller for SteelHead User’s Guide. Use the following CLI commands to export the configuration file:

enable
configure terminal
configuration bulk export scp://username:password@server/path/to/config

Use the following CLI commands to replace the configuration file:

enable
configure terminal
no service enable
configuration bulk import scp://username:password@server/path/to/config
service enable

 Core configuration file is available but it is not up to date - If you do not regularly back up the Core configuration file, you can be missing the latest information. When you import the configuration file, you retain the configuration as of the last export; any Edges and LUNs added to the configuration after that export are missing. You must manually add the components of the environment that were added after the configuration file was exported.

 No Core configuration file is available - This is the worst-case scenario. In this case you need to build a new Core and reconfigure all Edges as if they were new. All data in the Edges is invalidated, and new writes to Edge LUNs after the Core failure are lost. There is no data loss at the Core. If there were applications running at the Edge that cannot handle the loss of the most recent data, they need to be recovered from an application-consistent snapshot and backup from the data center.

For more instruction on how to export and import the configuration file, see “Core configuration export” on page 175 and “Core in HA configuration replacement” on page 175. For general information about the configuration file, see the SteelFusion Core Management Console User’s Guide.

Recovering a single Core-v

Note: Block storage mode only.

The following recommendations help you recover from potential failures and disasters and minimize data loss in a Core-v deployment and its Edges.


 Configure the Core-v with iSCSI and use VMware HA - VMware HA is a component of the vSphere platform that provides high availability for applications running in virtual machines. In the event of a physical server failure, affected virtual machines are automatically restarted on other production servers. If you configure VMware HA for the Core-v, you have an automated failover for the single Core-v. You must be using iSCSI; do not use VMware HA with Fibre Channel RDM disks. (A PowerCLI sketch follows this list.)

 Continually back up the Core-v configuration file to an external shared storage - See the scenarios described in “Recovering a single physical Core” on page 138.

 Restore the Core-v from a VM snapshot - We strongly recommend that you do not use this procedure. The primary reason not to use this procedure is that the configuration file in the Core-v from the snapshot might not be current. If you made any configuration changes since the last VM snapshot, you can lose data if an incorrect Core configuration is suddenly applied to the existing Edge deployment. Using this procedure can also mean LUN snapshots triggered by the Edge might be lost. With SteelFusion 4.2 there is a configuration check performed when a Core-v from an old snapshot is booted up. When the Core-v tries to reconnect to an Edge that has a more recent configuration, an alarm is raised and the Core-Edge connection fails. Prior to SteelFusion 4.2, this check was not available.
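
If you adopt the VMware HA recommendation above, the following PowerCLI sketch shows one way to confirm that HA is enabled on the cluster hosting the Core-v. The vCenter address and cluster name are placeholders; admission control and other HA settings should follow your own vSphere standards:

Connect-VIServer -Server <vcenter>
# Enable vSphere HA on the cluster that hosts the Core-v (cluster name is an example)
Set-Cluster -Cluster "DC-Cluster01" -HAEnabled:$true -Confirm:$false
# Verify the setting
Get-Cluster -Name "DC-Cluster01" | Select-Object Name, HAEnabled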

Note: Core-v is not compatible with VMware Fault Tolerance (FT).

Edge replacement

Note: Block storage mode only.

In the event of catastrophic failure, you might need to replace the Edge appliance and remap the LUNs. It is usually impossible to properly shut down an Edge LUN and bring it offline, because the Edge must first commit all its pending writes (for the LUN) to the Core. If the Edge has failed, and you cannot successfully bring the LUN offline, you need to manually remove the LUN. The blockstore is a part of the Edge, and if you replace the Edge, the cached data on the failed blockstore is discarded.

To protect the Edge against a single point of failure, consider an HA deployment of Edge. For more information, see “Edge high availability” on page 94.

Use the following procedure for an Edge disaster recovery scenario in which there is an unexpected Edge or remote site failure. This procedure does not include Edge HA.

Note: We recommend that you contact Riverbed Support before performing the following procedure.

To replace the Edge

1. Schedule time that is convenient to be offline (if possible).

2. On the Core, force unmap LUNs from the failed Edge.

3. In the Core Management Console, remove the failed Edge.

4. Add replacement Edge. You can use the same Edge Identifier.


5. Map LUNs back to the Edge.

Note: When the LUNs are remapped to the replacement Edge, the iSCSI LUN IDs might change. You must rescan or rediscover the LUNs on the ESXi.
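
For ESXi hosts at the branch (including VSP), one way to trigger the rescan, assuming shell or SSH access to the host, is shown below; you can also rescan storage adapters from the vSphere client:

esxcli storage core adapter rescan --all
esxcli storage vmfs extent list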

You can lose data on the LUNs when writes to the Edge are not committed to the Core. In the case of minimal data loss, it is possible that you can easily recover the LUNs from a crash-consistent state, such as with a filesystem check. However, this ease of recovery depends on the type of applications that were using the LUNs. If you have concerns about the data consistency, we recommend that you roll back the LUN to the latest application-consistent snapshot. For details, see “Best practice for LUN snapshot rollback” on page 144.

Disaster recovery scenarios

This section describes basic SteelFusion appliance disaster scenarios, and includes general recovery recommendations. It includes the following topics:

 “SteelFusion appliance failure—failover” on page 140

 “SteelFusion appliance failure—failback” on page 142

Keep in mind the following definitions:

 Failover - to switch to a redundant computer server, storage, and network upon the failure or abnormal termination of the production server, storage, hardware component, or network.

 Failback - the process of restoring a system, component, or service previously in a state of failure back to its original, working state.

 Production site - the site in which applications, systems, and storage are originally designed and configured. Also known as the primary site.

 Disaster recovery site - the site that is set up in preparation for a disaster. Also known as the secondary site.

SteelFusion appliance failure—failover

In the case of a failure or a disaster affecting the entire site, we recommend you take the following considerations into account. The exact process depends on the storage array and other environment specifics. You must create thorough documentation of the disaster recovery plan for successful recovery implementation. We recommend that you perform regular testing so that the information in the plan is maintained and up to date. This section includes the following topics:

 “Data center failover” on page 140

 “Branch office failover” on page 141

Data center failover

In the event that an entire data center experiences failure or a disaster, you can restore the Core operations assuming you have met the following prerequisites:

 The disaster recovery site has the storage array replicated from the production site.


 The network infrastructure is configured on the disaster recovery site similarly to the production site, enabling the Edges to communicate with Core.

 Core and SteelHeads (or their virtual editions) at the disaster recovery site are installed, licensed, and configured similarly to the production site.

 Ideally, the Core at the disaster recovery site is configured identically to the Core on the production site. You can import the configuration file from the Core at the production site to ensure that you have configured both Cores the same way. Unless the disaster recovery site is designed to be an exact replica of the production site, minor differences are inevitable: for example, the IP addresses of the Core, the storage array, and so on. We recommend that you regularly replicate the Core configuration file to the disaster recovery site and import it into the disaster recovery instance. You can script the necessary adjustments to the configuration to automate the configuration adoption process. Likewise, the configuration of SteelHeads in the disaster recovery site should reflect the latest changes to the configuration in the production site. All the relevant in-path rules must be maintained and kept up to date.

There are some limitations:

 If you have different LUN IDs in the disaster recovery site than in the production site, you need to reconfigure the Core and all the Edges and deploy them as new. You must know which LUNs belong to which Edge and map them correspondingly. We recommend that you implement a naming convention.

 Even if the data from the production storage array is replicated in synchronous mode, you should assume that there is data already committed to the Edge that has not yet been sent to the production storage or replicated to the disaster recovery site. This means that a gap in data consistency can occur if, after the failover, the Edges immediately start writing to the disaster recovery Core. To prevent data corruption, you need to configure all the LUNs at the Edges as new. When you configure Edges as new, this configuration empties out their blockstore, causing the loss of all writes that occurred after the disaster at the production site. To prevent data loss, we recommend that you configure FusionSync. For more information, see “SteelFusion Replication (FusionSync)” on page 107.

 If you want data consistency on the application level, we recommend that you perform a rollback to one of the previous snapshots. For details, see “Best practice for LUN snapshot rollback” on page 144.

 Keep in mind that initially after the recovery, the blockstore on Edges does not have any data in the cache.

Branch office failover

When the Edge in a branch becomes inaccessible from outside the branch office due to a network outage, operations in the branch office might continue. SteelFusion products are designed with disconnected-operations resiliency in mind. If your workflow enables branch office users to operate independently for a period of time (which is defined during the network planning stage and implemented with a correctly sized appliance), the branch office remains operational and synchronizes with the data center later.


In the case when the branch office is completely lost, or it is imperative for the business to bring a service from the branch office online sooner, you can choose to deploy the Edge in another branch or in the data center.

If you choose to deploy an Edge in the data center, we recommend that you remove the LUNs from the Core to prevent data corruption caused by multiple write access to the LUNs. We recommend that you roll back to the latest application-consistent snapshot. If mostly read access is required to the data projected to the branch office, a good alternative is to temporarily mount a snapshot to a local host. This snapshot enables the data to be accessible to the data center while the branch office is operating in disconnected-operation mode. Avoiding the failover will also simplify failback to the production site.

If you choose to deploy the Edge in another branch office, follow the steps in “Edge replacement” on page 139. You must understand that in this scenario, all the uncommitted writes at the branch are not stored. We recommend that you roll back the LUNs to the latest application-consistent snapshot.

SteelFusion appliance failure—failback

After a disaster is over, or a failure is fixed, you might need to revert the changes and move the data and computing resources to where they were located before the disaster, while ensuring that the data integrity is not compromised. This process is called failback. Unlike the failover process that can occur in a rush, you can thoroughly plan and test the failback process. This section includes the following topics:

 “Data center failback” on page 142

 “Branch office failback” on page 143

Data center failback

As SteelFusion relies on primary storage to keep the data intact, the Core failback can only follow a successful storage array replication from the disaster recovery site back to the production site. There are multiple ways to perform the recovery; however, we recommend that you use the following method. The process most likely requires downtime, which you can schedule in advance. We also recommend that you create an application-consistent snapshot and backup prior to performing the following procedure. Perform these steps on one appliance at a time.

To perform the Core failback process

1. Shut down hosts and unmount LUNs.

2. Export the configuration file from the Core at the disaster recovery site.

3. From the Core, initiate taking the Edge offline. This process forces the Edge to replicate all the committed writes to the Core.

4. Remove iSCSI Initiator access from the LUN at the Core. Data cannot be written to the LUN until the data has become available after the failback completes.


5. Make sure that you replicate the LUN with the storage array from the disaster recovery site back to the production site.

6. On a storage array at the production site, make the replicated LUN the primary LUN. Depending on a storage array, you might need to create a snapshot, clone, or promote the clone to a LUN—or all the above. For more information, see the user guide for your storage array. The preferred method is the method that preserves the LUN ID, which might not work for all the arrays. If the LUN ID is going to change, you need to add the LUN as new on first the Core and then on the Edge.

7. If you had to make changes on the disaster recovery site due to a LUN ID change, import the Core configuration file from the disaster recovery site and make the necessary adjustments to IP addresses and so on.

8. Add access to the LUN for Core. If the LUN ID remained the same, the Core at production site begins servicing the LUN instantaneously.

9. At the branch office, check to see if you need to change the Core IP address.

Branch office failback

The branch office failback process is similar to the Edge replacement process. The procedure requires downtime that you can schedule in advance. If the production LUNs were mapped to another Edge, use the following procedure.

To perform the Edge failback process

1. Shut down hosts and unmount LUNs.

2. Take the LUNs offline from the disaster recovery Edge. This process forces the Edge to replicate all the committed writes to Core.

3. If any changes were made to the LAN mapping configuration, you need to merge the changes during the fallback process. If you need assistance with this process, contact Riverbed Support.

4. Shut down the Edge at the disaster recovery site.

5. Bring up the Edge at the production site.

6. Follow the steps described in “Edge replacement” on page 139.

Keep in mind that after the failback process is completed, the blockstore on the Edges does not have any data in the cache. If you took the production LUNs out of the Core and used them locally in the data center, shut down the hosts, unmount the LUNs, and then continue the setup process as described in the SteelFusion Core Installation and Configuration Guide.


Best practice for LUN snapshot rollback

Note: Block storage mode only.

When a single file restore is impossible or impractical, you can roll back the entire LUN from a snapshot on the storage array at the data center and project it out to the branch. We recommend the following procedure for a LUN snapshot rollback.

Note: A single file restore recovers a deleted file from a backup or a snapshot without rolling back the entire file system to a point in time at which the file still existed. When you use the LUN rollback, everything that was written to (and deleted from) the file system after the snapshot was taken is lost.

To roll back the LUN snapshot

1. Set the LUN offline at the server running at the Edge.

2. Remove iSCSI initiator access from the LUN at the Core.

3. Remove the LUN from the Core.

4. Restore the LUN on the storage array from a snapshot.

5. Add the LUN to the Core.

6. Add iSCSI initiator access for the LUN at Core.

You can now access the LUN snapshot from a server on the Edge. Keep in mind that after this process is completed, the blockstore on the Edges does not have any data in the cache.

Using CHAP to secure iSCSI connectivity

Note: Block storage mode only.

Challenge-Handshake Authentication Protocol (CHAP) is a convenient and well-known security mechanism that can be used with iSCSI configurations. This section provides an overview with an example configuration. It contains the following topics:

 “One-way CHAP” on page 145

 “Mutual CHAP” on page 145

Both types of CHAP are supported on Core and Edge. For more details about configuring CHAP on either Core or Edge, see the corresponding Management Console user’s guide. Within an iSCSI deployment both initiator and target have their own passwords. In CHAP terminology these are called secrets. These passwords are shared between initiator and target in order for them to authenticate with each other.


One-way CHAP

With one-way CHAP, the iSCSI target (server) authenticates the iSCSI initiator (client). This process is analogous to logging in to a website. The Initiator needs to provide a username and secret when logging in to the target. The username is usually the IQN (but can be any free-form string) and the password is the target secret.

To configure one-way CHAP in a Core deployment

1. Configure a target secret on the backend storage array portal.

2. Log in to the Core Management Console.

3. Add a CHAP User on the Core. The username is something descriptive or even the IQN of the Core. For example, username=cuser2. The password is the target secret configured on the backend array.

4. Select the CHAP User (Figure 9-1). When the iSCSI initiator on the Core connects to the backend storage array, it uses the credentials from the CHAP user that was created.

Figure 9-1. iSCSI portal configuration for one-way CHAP

CHAP credentials are created and stored separately. They are then used when the Core initiates an iSCSI session and logs in to the storage array portal.

Mutual CHAP

The difference between one-way CHAP and mutual CHAP is that the iSCSI target authenticates the iSCSI initiator and additionally the iSCSI initiator also authenticates the iSCSI target. Mutual CHAP incorporates two separate sequences. The first sequence is the iSCSI target authenticating the iSCSI initiator and is the exact same procedure as for one-way CHAP. The second sequence is the initiator authenticating the target, which is the reverse of the previous authentication procedure.

To configure mutual CHAP in a Core deployment

1. Configure an initiator CHAP User on the Core Management Console. For example: username = cuser1 and password = abcd1234


2. Select the Enable Mutual CHAP Authentication setting on the Core and choose cuser1 from the drop-down menu (Figure 9-2). The Core now requires all iSCSI targets to specify the password (or secret) abcd1234 before the target is trusted by the Core.

Figure 9-2. iSCSI initiator configuration for mutual CHAP

3. On the backend storage array, add the CHAP user details from the Core. In this example, the storage array CHAP user has username=cuser1 and password=abcd1234. The target now knows the secret (username and password) of the initiator.

4. On the backend storage array, configure a target CHAP user. For example: username = cuser2 and password = wxyz5678

5. Log in to the Core Management Console and add the target CHAP User on the Core. In this example: username = cuser2 and password = wxyz5678

Mutual CHAP configuration is now complete. When adding the portal of the backend storage array to the Core configuration, select the target CHAP user (cuser2). When the iSCSI initiator of the Core connects to the iSCSI target of the backend storage array, it uses the credentials from the CHAP user (cuser2) that you created. Because of mutual CHAP, the iSCSI target uses the credentials cuser1/abcd1234 to connect to the iSCSI initiator of the Core.

At-rest and in-flight data security

For organizations that require high levels of security or face stringent compliance requirements, Edge provides data at-rest and in-flight encryption capabilities for the data blocks written on the blockstore cache. This section includes the following topics:

 “Enable data at-rest blockstore encryption” on page 147

 “Enable data in-flight secure peering encryption” on page 149


Supported encryption standards include AES-128, AES-192, and AES-256. The keys are maintained in an encrypted secure vault. In 2003, the United States government declared a review of the three algorithm key lengths to see if they were sufficient for protection of classified information up to the secret level. Top secret information requires 192-bit or 256-bit keys.

The vault is encrypted by AES with a 256-bit key and a 16-byte cipher, and you must unlock it before the blockstore is available. The secure vault password is verified upon every power up of the appliance, assuring that the data is confidential in case the Edge is lost or stolen. Initially, the secure vault has a default password known only to the RiOS software so the Edge can automatically unlock the vault during system startup. You can change the password so that the Edge does not automatically unlock the secure vault during system startup and the blockstore is not available until you enter the password. When the system boots, the contents of the vault are read into memory, decrypted, and mounted (through EncFS, a FUSE-based cryptographic file system). Because this information is only in memory, when an appliance is rebooted or powered off, the information is no longer available and the in-memory object disappears. Decrypted vault contents are never persisted on disk storage.

We recommend that you keep your secure vault password safe. Your private keys cannot be compromised, so there is no password recovery. In the event of a lost password, you can reset the secure vault only after erasing all the information within the secure vault.

To reset a lost password

 From either Edge appliance, enter the following CLI commands:

> enable
# configure terminal
(conf) # secure-vault clear

When you use the secure-vault clear command, you lose the data in the blockstore if it was encrypted. You then need to reload or regenerate the certificates and private keys.

Note: The Edge blockstore encryption is the same mechanism that is used in the RiOS data store encryption. For more information, see the security information in the SteelHead Deployment Guide.

Configuring data encryption requires extra CPU resources and might affect performance. We recommend blockstore encryption only if you require a high level of security or if it is dictated by compliance requirements.

Enable data at-rest blockstore encryption

The following example shows how to configure blockstore encryption on an Edge. The commands are entered on the Core at the data center.

To configure blockstore encryption on the Edge

1. From the Core, enter the following commands:

> enable
# configure
(config) # edge id blockstore enc-type

2. To verify whether encryption has been enabled on the Edge, enter the following commands:


> enable
# show edge id blockstore
Write Reserve   : 10%
Encryption type : AES_256

You can do the same procedure in the Core Management Console by choosing Configure > Manage: SteelFusion Edges.

Figure 9-3. Adding blockstore encryption


To verify whether encryption is enabled on your Edge appliance, look at the Blockstore Encryption field on your Edge status window as shown in Figure 9-4.

Figure 9-4. Verify blockstore encryption

Enable data in-flight secure peering encryption

The SteelFusion Rdisk protocol operates in clear text, so there is a possibility that remote branch data can be exposed to hackers during transfer over the WAN. To counter this exposure, the Edge provides data in-flight encryption capabilities when the data blocks are asynchronously propagated to the data center LUN. You can use secure peering between the Edge and the data center SteelHead to create a secure SSL channel and protect the data in-flight over the WAN. For more information about security and SSL, see the SteelHead Deployment Guide and the SteelHead Deployment Guide - Protocols.

Clearing the blockstore contents

Under normal conditions, if you select Offline on the Core for a particular LUN, the contents of the blockstore on the corresponding Edge are synchronized and then cleared.


However, there can be a situation in which it is necessary to make sure the entire contents of the blockstore on an Edge are erased to a military-grade level. While you can achieve this level of deletion, it involves the use of commands not normally available for general use. To ensure the correct procedures are followed, open a support case with Riverbed Support.

Edge network communication

In a location in which you have deployed Edge, there can be a requirement to keep track of the ports and protocols used with the various interfaces that are active. The following table provides you with a general list of devices and ports, including a description of what the communication is related to.

Source device | Source port | Destination device | Destination port | Protocol | Description
Edge primary interface | Any | Core | 7950-7955 | TCP | BlockStream
Edge primary interface | Any | Core | 7970 | TCP | SteelFusion management
SCC and other management hosts | Any | Edge primary interface | 22, 80, 443 | TCP | Edge management
VSP management hosts | Any | Through the Edge primary interface (using virtual IP) | | | ESXi management
VSP management hosts | Any | Through the Edge primary interface (using VM management IP, vm_pri) | 3389 | TCP | RDP to remote VM
vSphere client machine | Any | Through the Edge primary interface (using virtual IP) | 22, 80, 443, 902, 903, 9443 | TCP, UDP | ESXi management
Edge primary interface (using VM management IP, vm_pri) | Any | vSphere client machine | 22, 80, 443, 902, 903 | TCP, UDP | VM management
Edge in-path interface | Any | Core | Any | TCP, UDP, ICMP | WAN optimization
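
As an illustration only, assuming a Linux-based firewall between the branch and the data center and placeholder addresses, rules permitting the Edge-to-Core traffic from the first two rows of the table might look like the following; translate the same ports into the syntax of your own firewall platform:

# BlockStream traffic from the Edge primary interface to the Core
iptables -A FORWARD -p tcp -s <edge-primary-ip> -d <core-ip> --dport 7950:7955 -j ACCEPT
# SteelFusion management traffic from the Edge primary interface to the Core
iptables -A FORWARD -p tcp -s <edge-primary-ip> -d <core-ip> --dport 7970 -j ACCEPT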

Additional security best practices

For additional advice and guidance on appliance security in general, see the SteelHead Deployment Guide. The guide includes suggestions on restricting web-based access, the use of role-based accounts, creation of login banners, alarm settings, and so on, which you can apply in principle to SteelFusion Edge appliances.


Related information

 SteelFusion Core Management Console User’s Guide

 SteelFusion Edge Management Console User’s Guide

 SteelFusion Core Installation and Configuration Guide

 SteelFusion Command-Line Interface Reference Manual

 SteelHead Deployment Guide

 Riverbed Splash at https://splash.riverbed.com/community/product-lines/steelfusion



SteelFusion Appliance Upgrade

This chapter provides some general guidance when upgrading your SteelFusion appliances. It includes the following topics:

 “Planning software upgrades” on page 153

 “Upgrade sequence” on page 154

 “Minimize risk during upgrading” on page 154

 “Performing the upgrade” on page 155

 “Related information” on page 156

Note: Features and settings described in this chapter may apply to SteelFusion deployments in both block storage (iSCSI) mode and NFS/file mode. Where indicated, content is applicable to block storage mode only. For deployments that are using SteelFusion in NFS/file mode, also see Chapter 13, “SteelFusion and NFS.”

Planning software upgrades

Before you perform a software upgrade to a SteelFusion deployment, there are a few steps to consider. This section describes best practices that you can incorporate into your own upgrade and change control procedures. For detailed instructions and guidance on upgrading each of the products, see the SteelFusion Edge Installation and Configuration Guide and the SteelFusion Core Installation and Configuration Guide.

Prior to upgrading, complete the following prerequisites:

 Alert users - Depending on your deployment you might have a full-HA SteelFusion configuration at the data center and at the branch office. This configuration allows you to perform software upgrades with minimal or zero disruption to your production environment. Whether this is your case or not, you should schedule either downtime or an at risk period so that your users are aware of any possible interruption to service.

 Alert IT staff - Because you might also be using your Edge appliances simultaneously for WAN optimization services, you should alert other IT departments within your organization: for example, networking, monitoring, applications, and so on.

 Software - Gather all the relevant software images from the Riverbed Support site and consider using the SCC to store the images and assist with the upgrade. When downloading the software images, make sure to download the release notes so that you are aware of any warnings, special instructions, or known problems that can affect your upgrade plans.


 Configuration data - Ensure all your Cores and Edges have their current running configurations saved to a suitable location external to the appliances themselves. You can use the SCC to assist with this task with the Core.

Upgrade sequence

If you are planning to upgrade both Core and Edge as part of the same procedure, then the normal sequence is to upgrade the Core appliances and then upgrade the Edges.

Note: If you are only upgrading Core or Edge, but not both, this section does not apply.

If there is HA at the Edge and no HA at the Core, the sequence is still the same—Core first followed by Edge with the standby Edges preceding active Edge upgrades. The following table summarizes the sequence.

Deployment | First upgrade phase | Second upgrade phase
Core - Edge | Core | All Edges owned by the Core.
Core HA - Edge | Core HA | All Edges owned by the Core HA.
Core - Edge HA | Core | All Edges are owned by the Core. Upgrade the standby Edge first, and wait for it to be synchronized with the active Edge. Next, upgrade the active Edge.
Core HA - Edge HA | Core HA | All Edges are owned by the Core HA. Upgrade standby Edges before upgrading active Edges.

If you have an HA deployment, it is possible to have mixed software versions between HA peers for a short period of time. You can also run with mismatched software versions between Core and Edge for short periods of time; however, we do not recommend this practice.

In Core and Edge 4.2 or later, there are restrictions in place to enforce supported upgrade and downgrade paths. You can upgrade or downgrade the Core or Edge software up to a maximum of two versions beyond the current version. If you attempt to upgrade or downgrade to an unsupported version, a warning opens and the upgrade or downgrade does not proceed.

If there are any doubts about any of these procedures, contact Riverbed Support.

Minimize risk during upgrading

Although it is expected that the upgrade procedure will progress and complete without any problems, you should have a contingency plan to back out or restore the previous state of operations.


Both Core and Edge upgrade procedures automatically install the new software image into a backup partition on the appliance. The existing (running) software image is in a separate (active) partition. During the reboot, which is part of the upgrade procedure, the roles of the backup and active partitions are reversed. This action ensures that if you require a downgrade to restore the previous software version, a partition swap and reboot are all that should be required.

If you have a lab or development environment in which some nonproduction SteelFusion appliances are in use, consider doing a trial upgrade. This upgrade ensures you have some exposure to the upgrade processes, enables you to measure the time taken to perform the tasks, and gain other useful experience. You can complete the trial upgrade well ahead of the production upgrade to confirm the new version of software operates as expected.

Performing the upgrade

This section describes the tasks involved in upgrading your SteelFusion appliances. It contains the following sections:

 “Edge upgrade” on page 155

 “Core upgrade” on page 156

Once you are ready, start by upgrading the Core appliances. After the Cores are successfully upgraded, proceed to upgrade the Edge appliances. If you have Cores deployed in an HA configuration, upgrade both Cores in the HA pair before upgrading the Edge appliances.

For the proper sequence, see “Upgrade sequence” on page 154.

Edge upgrade

Edge software and functionality are incorporated into the SteelFusion Edge appliance software image. The upgrade includes a reboot of the appliance, which causes an interruption or degradation of service both to Edge and to WAN optimization (if there is no HA). While you do not need to disconnect the Edge from the Core, you should stop all read and write operations for any VSP-hosted services and any external application servers that are using the Edge for storage connectivity. Preferably, shut down the servers and, in the case of VSP, place the ESXi instance into maintenance mode.

In the case of Edge HA deployments, upgrade one of the Edge peers first, leaving the other Edge in a normal operating state. During the upgrade process, the surviving Edge enters a degraded state. This state is expected behavior. After the upgrade of the first Edge in the HA configuration is complete, check that the two Edge HA peers rediscover each other before proceeding with the upgrade of the second Edge.


Core upgrade

Before upgrading the Core, ensure that any data written by the Edge to LUNs projected by the Core is synchronized to the LUNs in the data center storage array. In addition, take a snapshot of any LUNs prior to the upgrade.

If a Core is part of an HA configuration with a second Core, you must upgrade both Cores before the Edges that they are responsible for are upgraded. You can choose which Core to begin with because the HA configuration is active-active. In either case, the upgrade triggers a failover when the first Core is rebooted with the new software version, followed by a failback after the reboot is complete. The same process occurs with the second Core. Therefore, during the Core HA upgrade there are two separate instances of failover followed by failback. Whichever Core is upgraded first, continue to upgrade the second Core of the HA pair before upgrading the Edges.

When upgrading a Core that is not part of an HA configuration, there is an interruption to service for the LUNs projected to the Edges. You do not need to disconnect the Edge appliances from the Core, nor do you need to unmap any LUNs managed by the Core from the storage array. When upgrading a Core that is part of an HA configuration, the peer Core appliance triggers an HA failover event. This failover is expected behavior. After the upgrade of the first Core is complete, check that the two Core HA peers have rediscovered each other and that both are in ActiveSelf state before upgrading the second Core.

Related information

 SteelFusion Core Management Console User’s Guide

 SteelFusion Edge Management Console User’s Guide

 SteelFusion Core Installation and Configuration Guide

 SteelFusion Command-Line Interface Reference Manual

 Riverbed Splash at https://splash.riverbed.com/community/product-lines/steelfusion


Network Quality of Service

SteelFusion technology enables remote branch offices to use storage provisioned at the data center through unreliable, low-bandwidth, high-latency WAN links. Adding this new type of traffic to the WAN links creates new considerations in terms of guaranteeing quality of service (QoS) to existing WAN applications while enabling the SteelFusion Rdisk protocol to function at the best possible level. This chapter contains the following topics:

 “Rdisk protocol overview” on page 157

 “QoS for SteelFusion replication traffic” on page 159

 “QoS for LUNs” on page 159

 “QoS for branch offices” on page 159

 “Time-based QoS rules example” on page 160

 “Related information” on page 160

For general information about QoS, see the SteelHead Deployment Guide.

Note: Features and settings described in this chapter may apply to SteelFusion deployments in both block storage (iSCSI) mode and NFS/file mode. Where indicated, content is applicable to block storage mode only. For deployments that are using SteelFusion in NFS/file mode, also see Chapter 13, “SteelFusion and NFS.”

Rdisk protocol overview

To understand the QoS requirements for the SteelFusion Rdisk protocol, you must understand how it works. The Rdisk protocol defines how the Edge and Cores communicate and how they transfer data blocks over the WAN. Rdisk uses five TCP ports for data transfers and one TCP port for management.


The following table summarizes the TCP ports used by the Rdisk protocol. It maps the different Rdisk operations to each TCP port.

TCP Port  Operation   Description
7970      Management  Manages information exchange between Edge and Core. The majority of the data flows from the Core to the Edge.
7950      Read        Transfers data requests for data blocks absent in Edge from the data center. The majority of the data flows from the Edge to the Core.
7951      Write       Transfers new data created at the Edge to the data center and snapshot operations. The majority of the data flows from the Edge to the Core.
7952      Prefetch0   Prefetches data for which SteelFusion has the highest confidence (for example, file Read Ahead). The majority of the data flows from the Core to the Edge.
7953      Prefetch1   Prefetches data for which SteelFusion has medium confidence (for example, Boot). The majority of the data flows from the Core to the Edge.
7954      Prefetch2   Prefetches data for which SteelFusion has the lowest confidence (for example, Prepop). The majority of the data flows from the Core to the Edge.

Note: Rdisk Protocol creates five TCP connections per exported LUN.
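Before cutover, it can be useful to confirm that the WAN path and any firewalls permit the Rdisk ports listed above. The following Python sketch is a simple reachability probe; the Core address is a placeholder and the helper name is hypothetical.

# Sketch of a pre-deployment connectivity check for the Rdisk TCP ports listed above.
import socket

RDISK_PORTS = {
    7970: "Management",
    7950: "Read",
    7951: "Write",
    7952: "Prefetch0",
    7953: "Prefetch1",
    7954: "Prefetch2",
}

CORE_ADDR = "192.0.2.10"  # placeholder: data center Core address

def check_port(host: str, port: int, timeout: float = 3.0) -> bool:
    """Attempt a TCP connection and report whether the port is reachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for port, operation in RDISK_PORTS.items():
    state = "open" if check_port(CORE_ADDR, port) else "unreachable"
    print(f"{port} ({operation}): {state}")
# Rdisk opens five data connections per exported LUN, so firewall and QoS
# policies must allow all of the ports above, not just the management port.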

Different Rdisk operations use different TCP ports. The following table summarizes the Rdisk QoS requirements for each Rdisk operation and its respective TCP port.

TCP Port  Operation   Outgoing branch office bandwidth                Outgoing branch office priority  Outgoing data center bandwidth  Outgoing data center priority
7970      Management  Low                                             Normal                           Low                             Normal
7950      Read        Low                                             Business critical                High                            Business critical
7951      Write       High (off-peak hours), Low (during peak hours)  Low priority                     Low                             Normal
7952      Prefetch0   Low                                             Business critical                High                            Business critical
7953      Prefetch1   Low                                             Business critical                Medium                          Business critical
7954      Prefetch2   Low                                             Business critical                High                            Best effort

For more information about Rdisk, see “Rdisk traffic routing options” on page 165.


QoS for SteelFusion replication traffic

To prevent SteelFusion replication traffic from consuming bandwidth required for other applications during business hours, you should allow more bandwidth for Rdisk write traffic (port 7951) during the off-peak hours and less bandwidth during the peak hours. Also carefully consider your recovery point objectives (RPO) and recovery time objectives (RTO) when configuring QoS for SteelFusion Rdisk traffic. Depending on which SteelFusion features you use, you might need to consider different priorities and bandwidth requirements.

QoS for LUNs

This section contains the following topics:

 “QoS for unpinned LUNs” on page 159

 “QoS for pinned LUNs” on page 159

For more information about pinned LUNs, see “Pin the LUN and prepopulate the blockstore” on page 162 and “When to pin and prepopulate the LUN” on page 174.

QoS for unpinned LUNs

In an unpinned LUNs scenario, you should prioritize traffic on port 7950 so that the SCSI read requests for data blocks not present on the Edge blockstore cache can arrive from the data center LUN in a timely manner. You should prioritize traffic on ports 7952, 7953, and 7954 so that the prefetch data can arrive at the blockstore when needed.

QoS for pinned LUNs

In a pinned, prepopulated LUN scenario, all the data is present at the Edge. You should prioritize only port 7951 so that the Rdisk protocol can transfer newly written data blocks from the Edge blockstore to the data center LUN through Core.

QoS for branch offices

This section contains the following topics:

 “QoS for branch offices that mainly read data from the data center” on page 159

 “QoS for branch offices booting virtual machines from the data center” on page 160

QoS for branch offices that mainly read data from the data center

In the case of branch office users who are not producing new data but instead are mainly reading data from the data center and the LUNs are not pinned, you should prioritize traffic on ports 7950 and 7952 so that the iSCSI read requests for data blocks not present on the Edge blockstore cache can arrive from the data center LUN in a timely manner.


QoS for branch offices booting virtual machines from the data center

In the case of branch office users who are booting virtual machines from the data center and the LUNs are not pinned, ensure that port 7950 is the top priority for nonpinned LUNs and that you prioritize traffic on port 7953 so that boot data is prefetched on this port in a timely manner.

Time-based QoS rules example

This example illustrates how to configure time-based QoS rules on a SteelHead. You create two recurring jobs, each undoing the other, using the standard job CLI command. One job sets the daytime cap on throughput or a low minimum guarantee, and the other removes that cap or sets a higher minimum guarantee.

steelhead (config) # job 1 date-time hh:mm:ss year/month/day "Start time"
steelhead (config) # job 1 recurring 86400 "Occurs once a day"
steelhead (config) # job 1 command 1
steelhead (config) # job 1 command 2 "Commands to set daytime cap"
steelhead (config) # job 1 enable

steelhead (config) # job 2 date-time hh:mm:ss year/month/day "Start time"
steelhead (config) # job 2 recurring 86400 "Occurs once a day"
steelhead (config) # job 2 command 1
steelhead (config) # job 2 command 2 "Commands to remove daytime cap"
steelhead (config) # job 2 enable

Related information

 SteelFusion Core Management Console User’s Guide

 SteelFusion Edge Management Console User’s Guide

 Riverbed Command-Line Interface Reference Manual

 SteelHead Deployment Guide (for general information about QoS)

 Riverbed Splash at https://splash.riverbed.com/community/product-lines/steelfusion


Deployment Best Practices

Every deployment of the SteelFusion product family differs due to variations in specific customer needs and types and sizes of IT infrastructure. The following recommendations and best practices are intended to guide you to achieving optimal performance while reducing configuration and maintenance requirements. However, these guidelines are general; for detailed worksheets for proper sizing, contact your Riverbed account team.

Note: For additional advice and best practices, see the SteelFusion Best Practices Guide in the SteelFusion documents section of the Riverbed Splash site, https://splash.riverbed.com/community/product-lines/steelfusion.

This chapter includes the following sections:

 “Edge best practices” on page 161

 “Core best practices” on page 172

 “iSCSI initiators timeouts” on page 176

 “Operating system patching” on page 176

 “Related information” on page 177

Note: Features and settings described in this chapter may apply to SteelFusion deployments in both block storage (iSCSI) mode and NFS/file mode. Where indicated, content is applicable to block storage mode only. For deployments using SteelFusion in NFS/file mode, see Chapter 13, “SteelFusion and NFS.”

Edge best practices This section describes best practices for deploying the Edge. It includes the following topics:

 “Segregate traffic” on page 162

 “Pin the LUN and prepopulate the blockstore” on page 162

 “Segregate data onto multiple LUNs” on page 162

 “Ports and type of traffic” on page 163

 “iSCSI port bindings” on page 163

 “Changing IP addresses on the Edge, ESXi host, and servers” on page 163

 “Disk management” on page 164

 “Rdisk traffic routing options” on page 165


 “Deploying SteelFusion with third-party traffic optimization” on page 165

 “Windows and ESX server storage layout—SteelFusion-protected LUNs vs. local LUNs” on page 166

 “VMFS datastores deployment on SteelFusion LUNs” on page 168

 “Enable Windows persistent bindings for mounted iSCSI LUNs” on page 169

 “Set up memory reservation for VMs running on VMware ESXi in the VSP” on page 170

 “Boot from an unpinned iSCSI LUN” on page 171

 “Running antivirus software” on page 171

 “Running disk defragmentation software” on page 171

 “Running backup software” on page 171

 “Configure jumbo frames” on page 172

 “Removing Core from Edge and re-adding Core” on page 172

Segregate traffic

At the remote branch office, separate storage iSCSI traffic and WAN/Rdisk traffic from LAN traffic. This practice helps to increase overall security, minimize congestion, minimize latency, and simplify the overall configuration of your storage infrastructure.

Pin the LUN and prepopulate the blockstore

Note: Block storage mode only.

In specific circumstances, you should pin the LUN and prepopulate the blockstore. Additionally, you can have the write-reserve space resized accordingly; by default, the Edge has a write-reserve space that is 10 percent of the blockstore. To resize the write-reserve space, contact your Riverbed representative. We recommend that you pin the LUN in the following circumstances:

 Unoptimized file systems - Core supports intelligent prefetch optimization on NTFS and VMFS file systems. For unoptimized file systems such as FAT, FAT32, ext3, and others, Core cannot perform optimization techniques such as prediction and prefetch in the same way as it does for NTFS and VMFS. For best results, pin the LUN and prepopulate the blockstore.

 Database applications - If the LUN contains database applications that use raw disk file formats or proprietary file systems, pin the LUN and prepopulate the blockstore.

 WAN outages are likely or common - Ordinary operation of SteelFusion depends on WAN connectivity between the branch office and the data center. If WAN outages are likely or common, pin the LUN and prepopulate the blockstore.

Segregate data onto multiple LUNs

Note: Block storage mode only.


We recommend that you separate storage into three LUNs, as follows:

 Operating system - In case of recovery, the operating system LUN can be quickly restored from the Windows installation disk or ESX datastore, depending on the type of server used in the deployment.

 Production data - The production data LUN is hosted on the Edge and therefore safely backed up at the data center.

 Swap space - Data on the swap space LUN is transient and therefore not required in disaster recovery. We recommend that you use this LUN as an Edge local LUN.

Ports and type of traffic

Note: Block storage mode only.

You should only allow iSCSI traffic on primary and auxiliary interfaces. Riverbed does not recommend that you configure your external iSCSI Initiators to use the IP address configured on the in-path interface. Some appliance models can optionally support an additional NIC to provide extra network interfaces. You can also configure these interfaces to provide iSCSI connectivity.

iSCSI port bindings

Note: Block storage mode only.

If iSCSI port bindings are enabled on the on-board hypervisor of the Edge appliance, a Fusion Edge HA failover operation can take too long and time out. iSCSI port bindings are disabled by default. If these port bindings are enabled, we recommend that you remove the port bindings because the internal interconnect interfaces are all on different network segments. For more information on this scenario, see the Riverbed Knowledge Base article After Fusion Edge HA failover, ESX (VSP) takes several minutes to re-establish connectivity to LUNs at https://supportkb.riverbed.com/support/index?page=content&id=S28205.

Changing IP addresses on the Edge, ESXi host, and servers

When you have an Edge and ESXi running on the same converged platform, you must change IP addresses in a specific order to keep the task simple and fast. You can use this procedure when staging the Edges in the data center or moving them from one site to another. This procedure assumes that the Edges are configured with IP addresses in a staged or production environment. You must test and verify all ESXi, servers, and interfaces before making these changes.

To change the IP addresses on the Edge, ESXi host, and servers

1. Starting with the Windows server, use your vSphere client to connect to the console, log in, change the IP address to DHCP or to the new destination IP address, and then shut down the Windows server from the console.

2. To change the IP address of the ESXi host, the procedure is different depending on whether the Edge is a SteelHead EX or a SteelFusion Edge.


SteelHead EX - Use a virtual network computing (VNC) client to connect to the ESXi console, change the IP address to the new destination IP address, and shut down ESXi from the console. If you did not configure VNC during the ESXi installation wizard, you can also use the vSphere Client and change the address from Configuration > Networking > rvbd_vswitch_pri > Properties.

SteelFusion Edge - Connect to the ESXi console serial port, or run the following command at the RiOS command line to show the ESXi console on the screen and allow you to change the IP address:

hypervisor console

3. On the Edge Management Console, choose Networking > Networking: In-Path Interfaces, and then change the IP address for inpath0_0 to the new destination IP address.

4. Use the included console cable to connect to the console port on the back of the Edge appliance and log in as the administrator.

5. Enter the following commands to change the IP address to your new destination IP address.

enable
config terminal
interface primary ip address 1.7.7.7 /24
ip default-gateway 1.7.7.1
write memory

6. Enter the following command to shut down the appliance:

reload halt

7. Move the Edge appliance to the new location.

8. Start your Windows server at the new location and open the iSCSI Initiator.

 Select the Discovery tab and remove the old portal.

 Click OK.

 Open the tab again and select Discover Portal.

 Add the new Edge appliance primary IP address.

This process brings the original data disk back into operation.

Disk management

Note: Block storage mode only.

You can specify the size of the local LUN during the hypervisor installation on the Edge. The installation wizard allows flexible disk partitioning: you can specify either a percentage of the disk or the exact amount in gigabytes that you want to use for the local LUN. The rest of the disk space is allocated to the Edge blockstore.

To streamline the ESXi configuration, run the hypervisor installer before connecting the Edge appliance to the Core to set up local storage. If local storage is configured during the hypervisor installation, all LUNs provisioned by the Core to the Edge are automatically made available to the ESXi of the SteelFusion Edge.


For more information on disk management, see “Configuring disk management on the Edge appliance” on page 77.

Rdisk traffic routing options

You can route Rdisk traffic out of the primary or the in-path interfaces. This section contains the following topics:

 “In-path interface” on page 165

 “Primary interface” on page 165

For more information about Rdisk, see “Network Quality of Service” on page 157. For information about WAN redundancy, see “Configuring WAN redundancy” on page 103.

In-path interface

Select the in-path interface when you deploy the SteelFusion Edge W0 appliance. When you configure Edge to use the in-path interface, the Rdisk traffic is intercepted, optimized, and sent directly out of the WAN interface toward the Core deployed at the data center. Use this option during proof of concept (POC) installations or if the primary interface is dedicated to management. The drawback of this mode is the lack of redundancy in the event of a WAN interface failure. In this configuration, only the WAN interface needs to be connected. Disable link state propagation.

Primary interface

We recommend that you select the primary interface when you deploy the SteelFusion Edge W1-W3 appliance. When you configure Edge to use the primary interface, the Rdisk traffic is sent unoptimized out of the primary interface to a switch or a router that in turn redirects the traffic back into the LAN interface of the Edge RiOS node to get optimized. The traffic is then sent out of the WAN interface toward the Core deployed at the data center. This configuration offers more redundancy because you can have both in-path interfaces connected to different switches.

Deploying SteelFusion with third-party traffic optimization

The Edges and Cores communicate with each other and transfer data-blocks over the WAN using six different TCP port numbers: 7950, 7951, 7952, 7953, 7954, and 7970.


Figure 12-1 shows a deployment in which the remote branch and data center third-party optimization appliances are configured through WCCP. You can optionally configure WCCP redirect lists on the router to redirect traffic belonging to the six different TCP ports of SteelFusion to the SteelHeads. Configure a fixed-target rule for the six different TCP ports of SteelFusion to the in-path interface of the data center SteelHead.

Figure 12-1. SteelFusion behind a third-party deployment scenario

Windows and ESX server storage layout—SteelFusion-protected LUNs vs. local LUNs

Note: Block storage mode only.

This section describes different LUNs and storage layouts. It includes the following topics:

 “Physical Windows server storage layout” on page 167

 “Virtualized Windows server on ESX infrastructure with production data LUN on ESX datastore storage layout” on page 168

Note: SteelFusion-protected LUNs are also known as iSCSI LUNs. This section refers to iSCSI LUNs as SteelFusion-protected LUNs.

Transient and temporary server data is not required in the case of disaster recovery and therefore does not need to be replicated back to the data center. For this reason, we recommend that you separate transient and temporary data from the production data by implementing a layout that separates the two into multiple LUNs. In general, plan to configure one LUN for the operating system, one LUN for the production data, and one LUN for the temporary swap or paging space.

Configuring LUNs in this manner greatly enhances data protection and operations recovery in case of a disaster. This extra configuration also facilitates migration to server virtualization if you are using physical servers. For more information about disaster recovery, see “Data Resilience and Security” on page 137.

To achieve these goals, SteelFusion implements two types of LUNs: SteelFusion-protected (iSCSI) LUNs and local LUNs. You can add LUNs by choosing Configure > Manage: LUNs.


Use SteelFusion-protected LUNs to store production data. They share the space of the blockstore cache. The data is continuously replicated and kept in sync with the associated LUN back at the data center. The Edge cache keeps only the working set of data blocks for these LUNs. The remaining data is kept at the data center and retrieved at the edge when needed. During WAN outages, edge servers are not guaranteed to operate and function at 100 percent because some of the data that is needed might be at the data center and not locally present in the Edge blockstore cache.

One particular type of SteelFusion-protected LUN is the pinned LUN. Pinned LUNs are used to store production data, but they use dedicated space in the Edge. The space required and dedicated in the blockstore cache is equal to the size of the LUN provisioned at the data center. The pinned LUN enables the edge servers to continue to operate and function during WAN outages because 100 percent of the data is kept in the blockstore cache. Like regular SteelFusion LUNs, the data is replicated and kept in sync with the associated LUN at the data center. For more information about pinned LUNs, see “When to pin and prepopulate the LUN” on page 174.

Use local LUNs to store transient and temporary data. Local LUNs also use dedicated space in the blockstore cache. The data is never replicated back to the data center because it is not required in the case of disaster recovery.

Physical Windows server storage layout

When deploying a physical Windows server, separate its storage into three different LUNs: the operating system and swap space (or page file) can reside in two partitions on the server internal hard drive (or two separate drives), while production data should reside on the SteelFusion-protected LUN (Figure 12-2).

Figure 12-2. Physical server layout

This layout facilitates future server virtualization and service recovery in the case of hardware failure at the remote branch. The production data is hosted on a SteelFusion-protected LUN, which is safely stored and backed up at the data center. In case of a disaster, you can stream this data with little notice to a newly deployed Windows server without having to restore the entire dataset from backup.


Virtualized Windows server on ESX infrastructure with production data LUN on ESX datastore storage layout

When you deploy a virtual Windows server into an ESX infrastructure, you can also store the production data on an ESX datastore mapped to a SteelFusion-protected LUN (Figure 12-3). This deployment facilitates service recovery in the event of hardware failure at the remote branch because SteelFusion appliances optimize not only LUNs formatted directly with the NTFS file system but also LUNs that are first virtualized with VMFS and later formatted with NTFS.

Figure 12-3. Virtual server layout 2

VMFS datastores deployment on SteelFusion LUNs

When you deploy VMFS datastores on SteelFusion-protected LUNs, choose the Thick Provision Lazy Zeroed disk format (the VMware default) for best performance. Because of the way the Edge uses the blockstore, this disk format is the most efficient option.

Thin provisioning is when you assign a LUN to be used by a device (in this case a VMFS datastore for an ESXi host), tell the host how big the LUN is (for example, 10 GiB), and have the storage array initially allocate only a fraction of that space (for example, 2 GiB). This approach is useful if you know that the host needs only 2 GiB to begin with. As time goes by (days or months), and the host starts to write more data and needs more space, the storage array automatically grows the LUN until eventually it really is 10 GiB in size. Thick provisioning means there is no pretending: you allocate all 10 GiB from the beginning, whether the host needs it from day one or not.


Whether you choose thick or thin provisioning, you need to initialize (format) the LUN in the same way as any other new disk. The formatting is essentially a process of writing a pattern to the disk sectors (in this case zeros). You cannot write to a disk before you format it. Normally, you have to wait for the entire disk to be formatted before you can use it; for large disks, this process can take hours. Lazy Zeroed means the formatting works away slowly in the background, and as soon as the first few sectors have been formatted the host can start using the disk. The host does not have to wait until the entire disk (LUN) is formatted.

VMware ESXi 5.5 and later support the vStorage APIs for Array Integration (VAAI) feature. This feature uses the SCSI WRITE SAME command when creating or using a VMDK. When using thin-provisioned VMDK files, ESXi creates new extents in the VMDK by first writing binary 0s and then the block device (file system) data. When using thick-provisioned VMDK files, ESXi creates all extents by writing binary 0s. Versions of Core and Edge software prior to SteelFusion 4.2 supported only the 10-byte and 16-byte versions of the command. With SteelFusion 4.2 and later, both the Core and Edge software support the SCSI WRITE SAME (32 byte) command. This support enables much faster provisioning and formatting of LUNs used for VMFS datastores.

Enable Windows persistent bindings for mounted iSCSI LUNs

Make iSCSI LUNs persistent across Windows server reboots; otherwise, you must manually reconnect them. To configure Windows servers to automatically connect to the iSCSI LUNs after system reboots, select the Add this connection to the list of Favorite Targets check box (Figure 12-4) when you connect to the Edge iSCSI target.

Figure 12-4. Favorite targets

To make iSCSI LUNs persistent and ensure that Windows does not consider the iSCSI service fully started until connections are restored to all the SteelFusion volumes on the binding list, remember to add the Edge iSCSI target to the binding list of the iSCSI service. This addition is important particularly if you have data on an iSCSI LUN that other services depend on: for example, a Windows file server that is using the iSCSI LUN as a share.


The best way to do this is to select the Volumes and Devices tab from the iSCSI Initiator's control panel and click Auto Configure (Figure 12-5). This action binds all available iSCSI targets to the iSCSI startup process. If you want to choose individual targets to bind, click Add. To add individual targets, you must know the target drive letter or mount point.

Figure 12-5. Target binding

Set up memory reservation for VMs running on VMware ESXi in the VSP

By default, VMware ESXi dynamically tries to reclaim unused memory from guest virtual machines, while the Windows operating system uses free memory to perform caching and avoid swapping to disk. To significantly improve performance of Windows virtual machines, configure memory reservation to the highest possible value of the ESXi memory available to the VM. This configuration applies whether the VMs are hosted within the hypervisor node of the Edge or on an external ESXi server in the branch that is using LUNs from SteelFusion. Setting the memory reservation to the configured size of the virtual machine results in a per virtual machine vmkernel swap file of zero bytes, which consumes less storage and helps to increase performance by eliminating ESXi host-level swapping. The guest operating system within the virtual machine maintains its own separate swap and page file.


Boot from an unpinned iSCSI LUN

Note: Block storage mode only.

If you are booting a Windows server or client from an unpinned iSCSI LUN, we recommend that you install the Riverbed Turbo Boot software on the Windows machine. The Riverbed Turbo Boot software greatly improves boot-over-the-WAN performance because it allows the Core to send to the Edge only the files needed for the boot process.

Note: The SteelFusion Turbo Boot plugin is not compatible with the branch recovery agent. For more information about the branch recovery agent, see “How branch recovery works” on page 133 and the SteelFusion Core Management Console User’s Guide.

Running antivirus software

There are two antivirus scanning modes:

 On-demand - Scans the entire LUN data files for viruses at scheduled intervals.

 On-access - Scans the data files dynamically as they are read or written to disk.

There are two common locations to perform the scanning:

 On-host - Antivirus software is installed on the application server.

 Off-host - Antivirus software is installed on dedicated servers that can directly access the application server data.

In typical SteelFusion deployments, in which the LUNs at the data center contain the full amount of data and the remote branch cache contains the working set, run on-demand scan mode at the data center and on-access scan mode at the remote branch. Running on-demand full file system scans at the remote branch causes the blockstore to wrap and evict the working set of data, leading to slow performance. However, if the LUNs are pinned, on-demand full file system scans can also be performed at the remote branch. Whether scanning on-host or off-host, the SteelFusion solution does not dictate one way versus another, but to minimize the server load, we recommend off-host virus scans.

Running disk defragmentation software

Disk defragmentation software is another category of software that can cause the SteelFusion blockstore cache to wrap and evict the working set of data. Do not run disk defragmentation software. Disable the disk defragmentation schedule that is enabled by default on Windows 7 and later.

Running backup software

Backup software is another category of software that can cause the Edge blockstore cache to wrap and evict the working set of data, especially during the execution of full backups. In a SteelFusion deployment, run differential, incremental, synthetic full, and full backups at the data center.


Configure jumbo frames

If jumbo frames are supported by your network infrastructure, use jumbo frames between Core and storage arrays. We make the same recommendation for the Edge and any external application servers (not hosted within VSP) that are using LUNs from the Edge. The application server interfaces must support jumbo frames. For details, see “Configuring Edge for jumbo frames” on page 76.
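A quick way to verify that jumbo frames actually pass end to end is to send an unfragmentable ping sized for a 9000-byte MTU. The following Python sketch assumes a Linux host on the storage VLAN and uses a placeholder address; adjust the payload size if your infrastructure uses a different jumbo MTU.

# Minimal path-MTU sanity check (Linux ping flags): payload of 8972 bytes
# plus 20 bytes IP header and 8 bytes ICMP header equals a 9000-byte frame.
import subprocess

def jumbo_path_ok(target_ip: str, payload: int = 8972) -> bool:
    """Return True if an unfragmented jumbo-sized ping reaches the target."""
    result = subprocess.run(
        ["ping", "-M", "do", "-s", str(payload), "-c", "3", target_ip],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0

# Example: verify the path between an application server and the Edge iSCSI portal.
# print(jumbo_path_ok("192.0.2.20"))  # placeholder address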

Removing Core from Edge and re-adding Core

When a Core is removed from the Edge with the “preserve config” setting enabled, the Edge local storage (LUNs if using block mode, or exports if using NFS/file mode) is saved, and the offline remote storage (LUNs or exports) is removed from the Edge configuration. On the Core, there is no change, but the storage (LUNs or exports) shows as “Not connected.”

In most scenarios, the reason for this procedure is replacement of the Core. If a new Core is added to the Edge, the Edge local storage is merged from the Edge to the Core and normal operations are resumed. However, if for some reason the same Core is added back, the Edge local storage information on the Core must be cleared from the Core by removing any entries for the specific Edge local LUNs, or exports (if using NFS/file mode), before the add operation is performed.

Note: As long as the Edge and Core are truly disconnected when the Edge local storage entries are removed, no Edge local storage is physically deleted. Only the entries themselves are cleared.

Once the same Core is added back again, the Edge local storage information on the Edge is merged into the Core configuration. At the same time, the remote offline storage information on the Core that is mapped to the Edge is merged across to the Edge. Failure to clear the information prior to re-adding the Core results in the Core rejecting the Edge connection and both the Core and Edge logfiles reporting a “Config mismatch.” Further details are available on the Riverbed Support site, in Knowledge Base article S30272 at https://supportkb.riverbed.com/support/index?page=content&id=S30272.

Core best practices This section describes best practices for deploying the Core. It includes the following topics:

 “Deploy on gigabit Ethernet networks” on page 173

 “Use CHAP” on page 173

 “Configure initiators and storage groups or LUN masking” on page 173

 “Core hostname and IP address” on page 173

 “Segregate storage traffic from management traffic” on page 174

 “When to pin and prepopulate the LUN” on page 174

 “Core configuration export” on page 175

 “Core in HA configuration replacement” on page 175

 “LUN-based data protection limits” on page 175


 “WAN usage consumption for a Core to Edge VMDK data migration” on page 175

 “Reserve memory and CPU resources when deploying Core-v” on page 176

Deploy on gigabit Ethernet networks

The iSCSI protocol enables block-level traffic over IP networks. However, iSCSI is both latency and bandwidth sensitive. To optimize performance and reliability, deploy the Core and the storage array on Gigabit Ethernet networks.

Use CHAP

Note: Block storage mode only.

For additional security, use CHAP between Core and the storage array, and between Edge and the server. One-way CHAP is also supported. For more information, see “Using CHAP to secure iSCSI connectivity” on page 144.

Configure initiators and storage groups or LUN masking

Note: Block storage mode only.

To prevent unwanted hosts from accessing LUNs mapped to the Core, configure initiator and storage groups between the Core and the storage system. This practice is also known as LUN masking or storage access control. When mapping Fibre Channel LUNs to Core-v appliances, ensure that the ESXi servers in the cluster that are hosting the Core-v appliances have access to these LUNs. Configure the ESXi servers in the cluster that are not hosting the Core-v appliances to not have access to these LUNs.

Core hostname and IP address

If the branch DNS server runs on VSP and its DNS datastore is deployed on a LUN used with SteelFusion, use the Core IP address instead of the hostname when you specify the Core hostname and IP address. If you must use the hostname, deploy the DNS server on the VSP internal storage, or configure host DNS entries for the Core hostname on the SteelHead.


Segregate storage traffic from management traffic

To increase overall security, minimize congestion, minimize latency, and simplify the overall configuration of your storage infrastructure, segregate storage traffic from regular LAN traffic using VLANs (Figure 12-6).

Figure 12-6. Traffic segregation

When to pin and prepopulate the LUN

Note: Block storage mode only.

SteelFusion technology has built-in file system awareness for NTFS and VMFS file systems. There are two likely circumstances when you need to pin and prepopulate the LUN:

 “LUNs containing file systems other than NTFS and VMFS and LUNs containing unstructured data” on page 174

 “Data availability at the branch during a WAN link outage” on page 174

LUNs containing file systems other than NTFS and VMFS and LUNs containing unstructured data

Pin and prepopulate the LUN for unoptimized file systems such as FAT, FAT32, ext3, and so on. You can also pin the LUN for applications that use raw disk file formats or proprietary file systems.

Data availability at the branch during a WAN link outage

When the WAN link between the remote branch office and the data center is down, data no longer travels through the WAN link. Hence, SteelFusion technology and its intelligent prefetch mechanisms no longer function. Pin and prepopulate the LUN if frequent or prolonged periods of WAN outage are expected.

By default, the Edge keeps a write reserve that is 10 percent of the blockstore size. If prolonged periods of WAN outage are expected, increase the write reserve space appropriately.
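The following back-of-the-envelope Python sketch illustrates how the write reserve relates to outage tolerance. The 10 percent default comes from the text above; the blockstore size and branch write rate are hypothetical inputs, so treat the result as a rough planning figure rather than a sizing rule.

# Rough planning aid: hours of sustained branch writes the reserve can absorb
# during a WAN outage before it fills. Inputs other than the 10% default are
# hypothetical examples.
def write_reserve_hours(blockstore_gib: float,
                        reserve_fraction: float = 0.10,
                        write_rate_gib_per_hour: float = 5.0) -> float:
    """Hours of new branch writes the write reserve can hold before it fills."""
    reserve_gib = blockstore_gib * reserve_fraction
    return reserve_gib / write_rate_gib_per_hour

# Example: a 2000 GiB blockstore with the default 10% reserve and about
# 5 GiB/hour of new writes gives roughly 40 hours of buffering.
print(round(write_reserve_hours(2000), 1))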


Core configuration export

Store and back up the configuration on an external server in case of system failure. Enter the following CLI commands to export the configuration:

enable
configure terminal
configuration bulk export scp://username:password@server/path/to/config

Complete this export each time a configuration operation is performed or you make other changes to your configuration.

Core in HA configuration replacement

If the configuration has been saved on an external server, the failed Core can be seamlessly replaced. Enter the following CLI commands to retrieve what was previously saved:

enable
configure terminal
no service enable
configuration bulk import scp://username:password@server/path/to/config
service enable

LUN-based data protection limits

When using LUN-based data protection, be aware that each snapshot/backup operation takes approximately 2 minutes to complete. This means that if the hourly option is configured for more than 30 LUNs, it is quite possible that there could be an increasing number of nonreplicated snapshots on Edges.
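The arithmetic behind the 30-LUN guideline is straightforward, as the following Python sketch shows; the 2-minute figure is the approximate per-operation time quoted above.

# Serial snapshot/backup operations at ~2 minutes each: an hourly schedule can
# only keep up with about 60 / 2 = 30 LUNs before snapshots start to queue up.
MINUTES_PER_OPERATION = 2

def max_luns_for_interval(interval_minutes: int) -> int:
    """Largest number of LUNs a schedule of this interval can service serially."""
    return interval_minutes // MINUTES_PER_OPERATION

print(max_luns_for_interval(60))    # hourly schedule  -> 30 LUNs
print(max_luns_for_interval(240))   # 4-hour schedule  -> 120 LUNs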

WAN usage consumption for a Core to Edge VMDK data migration

When provisioning VMs as part of a data migration, it is possible to see high traffic usage across the WAN link. This can be due to the type of VMDK that is being migrated. This table gives an example of WAN usage consumption for a Core to Edge VMDK data migration containing a 100 GiB VMDK with 20 GiB used.

VMDK type                  WAN traffic usage        Space used on array (thick LUNs)  Space used on array (thin LUNs)  VMDK fragmentation
Thin                       20 GB                    20 GiB                            20 GiB                           High
Thick eager zero           100 GB + 20 GB = 120 GB  100 GiB                           100 GiB                          None (flat)
Thick lazy zero (default)  20 GB + 20 GB = 40 GB    100 GiB                           100 GiB                          None (flat)

For more details, see this Knowledge Base article on the Riverbed Support site: https://supportkb.riverbed.com/support/index?page=content&id=S23357
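The WAN figures in the table can be approximated with the simple formulas in the following Python sketch. The formulas are inferred from the example values above (100 GiB provisioned, 20 GiB used) and are intended only as a rough planning aid.

# Approximate WAN traffic for migrating one VMDK from Core to Edge, using
# formulas inferred from the table above.
def wan_usage_gib(vmdk_type: str, provisioned: float, used: float) -> float:
    """Estimate WAN traffic for one VMDK migration, by VMDK type."""
    if vmdk_type == "thin":
        return used                      # only used blocks traverse the WAN
    if vmdk_type == "thick-eager-zero":
        return provisioned + used        # full zeroing pass plus the used data
    if vmdk_type == "thick-lazy-zero":
        return used + used               # zero-on-first-write plus the used data
    raise ValueError(f"unknown VMDK type: {vmdk_type}")

for t in ("thin", "thick-eager-zero", "thick-lazy-zero"):
    print(t, wan_usage_gib(t, provisioned=100, used=20), "GiB")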


Reserve memory and CPU resources when deploying Core-v

When deploying Core-v, see the SteelFusion Core Installation and Configuration Guide to understand what hardware resources (memory, CPU, and disk) are required to support your Core-v model. We strongly recommend that you allocate and reserve the correct amount of resources as specified in the SteelFusion Core Installation and Configuration Guide. Reserving these resources ensures that the recommended memory and CPU are dedicated for use by the Core-v instance, enabling it to perform as expected.

iSCSI initiators timeouts

Note: Block storage mode only.

This section contains the following topics:

 “Microsoft iSCSI initiator timeouts” on page 176

 “ESX iSCSI initiator timeouts” on page 176

Microsoft iSCSI initiator timeouts

By default, the Microsoft iSCSI Initiator LinkDownTime timeout value is set to 15 seconds, and the MaxRequestHoldTime timeout value is also 15 seconds. These timeout values determine how long the initiator holds a request before reporting an iSCSI connection error. You can increase these values to accommodate longer outages, such as an Edge failover event or a power cycle in the case of a single appliance.

If MPIO is installed in the Microsoft iSCSI Initiator, the LinkDownTime value is used. If MPIO is not installed, MaxRequestHoldTime is used instead. If you are using Edge in an HA configuration and MPIO is configured in the Microsoft iSCSI Initiator, change the LinkDownTime timeout value to 60 seconds to allow the failover to complete.
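These timeouts are registry values on the Windows host. The following Python sketch illustrates one way to set them; the registry path shown is the commonly documented location for the initiator's Parameters key, but the class GUID and instance number should be verified on your system before making any change, and the script must run with administrative rights.

# Hedged sketch: adjust LinkDownTime and MaxRequestHoldTime for one initiator
# instance. Verify the instance number ("0000", "0001", ...) on your system.
import winreg

SCSI_CLASS = r"SYSTEM\CurrentControlSet\Control\Class\{4D36E97B-E325-11CE-BFC1-08002BE10318}"

def set_iscsi_timeouts(instance: str = "0000", seconds: int = 60) -> None:
    """Set LinkDownTime and MaxRequestHoldTime (in seconds) in the registry."""
    key_path = rf"{SCSI_CLASS}\{instance}\Parameters"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path, 0, winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "LinkDownTime", 0, winreg.REG_DWORD, seconds)
        winreg.SetValueEx(key, "MaxRequestHoldTime", 0, winreg.REG_DWORD, seconds)

# Example (run as Administrator, then restart the iSCSI service or reboot):
# set_iscsi_timeouts(instance="0000", seconds=60)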

ESX iSCSI initiator timeouts

By default, the VMware ESX iSCSI Initiator DefaultTimeToWait timeout is set to 2 seconds. This is the minimum amount of time to wait before attempting an explicit or implicit logout or an active iSCSI task reassignment after an unexpected connection termination or a connection reset. You can increase this value to accommodate longer outages, such as an Edge failover event or a power cycle in the case of a single appliance. If you are using Edge in an HA configuration, change the DefaultTimeToWait timeout value to 60 seconds to allow the failover to complete. For more information about iSCSI initiator timeouts, see “Configuring iSCSI initiator timeouts” on page 77.

Operating system patching

Note: Block storage mode only.


This section contains the following topics:

 “Patching at the branch office for virtual servers installed on iSCSI LUNs” on page 177

 “Patching at the data center for virtual servers installed on iSCSI LUNs” on page 177

Patching at the branch office for virtual servers installed on iSCSI LUNs

You can continue to use the same or existing methodologies and tools to perform patch management on physical or virtual branch servers booted over the WAN using SteelFusion appliances.

Patching at the data center for virtual servers installed on iSCSI LUNs

If you want to perform virtual server patching at the data center and save a round-trip of patch software from the data center to the branch office, use the following procedure.

To perform virtual server patching at the data center

1. At the branch office:

 Power down the virtual machine.

 Take the VMFS datastore offline.

2. At the data center:

 Take the LUN on the Core offline.

 Mount the LUN to a temporary ESX server.

 Power up the virtual machine, and apply patches and file system updates.

 Power down the virtual machine.

 Take the VMFS datastore offline.

 Bring the LUN on the Core online.

3. At the branch office:

 Bring the VMFS datastore online.

 Boot up the virtual machine.

If the LUN was previously pinned at the edge, patching at the data center can potentially invalidate the cache. If this is the case, you might need to prepopulate the LUN.

Related information

 SteelFusion Core Management Console User’s Guide

 SteelFusion Edge Management Console User’s Guide

 SteelFusion Core Installation and Configuration Guide

 SteelFusion Command-Line Interface Reference Manual


 Fibre Channel on SteelFusion Core Virtual Edition Solution Guide

 Riverbed Splash at https://splash.riverbed.com/community/product-lines/steelfusion


SteelFusion and NFS

SteelFusion version 5.0 includes support for Network File System (NFS) storage. It is important to note that this capability is designed for deployment separately from a SteelFusion block storage (iSCSI, Fibre Channel) installation. The two deployment modes (block storage and NFS) cannot be combined within the same SteelFusion appliance (SteelFusion Core or SteelFusion Edge).

However, many of the features included with a SteelFusion block storage implementation are also available with a SteelFusion NFS/file deployment. Review the other chapters in this guide for more detailed advice and guidance about these other features. This chapter explains the differences between an NFS/file deployment and a block storage deployment.

This chapter includes these sections:

 “Introduction to SteelFusion with NFS” on page 179

 “Existing SteelFusion features available with NFS/file deployments” on page 181

 “Unsupported SteelFusion features with NFS/file deployments” on page 181

 “Basic design scenarios” on page 182

 “Basic configuration deployment principles” on page 182

 “Overview of high availability with NFS” on page 190

 “Snapshots and backup” on page 194

 “Best practices” on page 196

Introduction to SteelFusion with NFS

The Network File System (NFS) protocol is an open standard originally developed in 1984. NFSv2 was released in 1989 and NFSv3 was added in 1995. Although NFSv4 was released in 2000, it has had a number of revisions, the most recent being in 2015. Further details about the individual versions and their capabilities and enhancements are beyond the scope of this guide; however, the information is available externally if required. NFSv3 continues to be the most frequently deployed version, with TCP being the preferred delivery protocol over its alternative, UDP.


Prior to version 5.0, SteelFusion specialized in providing block-level access to LUNs served via iSCSI from SteelFusion Edge. In simple terms, this is a Storage Area Network (SAN) for the branch office, which is actually an extension of the SAN located in a data center across the far end of a WAN link. Among the many scenarios for this capability is support for VMFS LUNs mapped to VMware vSphere (ESXi host) servers that use them as their datastores. This includes the hypervisor resident inside the Edge itself.

Additionally, vSphere supports the ability to mount fileshares served out from Network Attached Storage (NAS). These fileshares are exported from a file server across the network to the ESXi host using the Network File System (NFS) protocol. When the fileshare is mounted by the ESXi host, it too can be used as a datastore. SteelFusion version 5.0 supports this alternative method to access storage used for datastores by ESXi hosts. The SteelFusion implementation uses NFSv3 over TCP.

In the branch office, SteelFusion Edge supports the export of NFS fileshares to external ESXi servers where they can be used as datastores. SteelFusion Edge also supports the export of fileshares to the ESXi server that is hosted inside the hypervisor node of the SteelFusion Edge. This figure shows a basic deployment diagram, including an external ESXi server in the branch, indicating the protocols in use.

Figure 13-1. Use of NFS protocol

The general internal architecture and operation of SteelFusion is preserved despite the different mode of operating with storage. Many of the existing features and benefits of a SteelFusion deployment that exist with a block storage configuration are available when using NFS.

To accommodate the NFS capability within SteelFusion, the use of some network interfaces in Core and Edge may differ from that used in a block storage deployment. Therefore, care must be taken to ensure the correct assignments and configuration settings are made. For details, see “SteelFusion Core interface and port configuration” on page 183 and “SteelFusion Edge interface and port configuration” on page 186.

Although the majority of existing SteelFusion features available in a block storage deployment are also available with an NFS/file deployment, there are a few items that are not currently supported. For details, see “Existing SteelFusion features available with NFS/file deployments” on page 181 and “Unsupported SteelFusion features with NFS/file deployments” on page 181.


Existing SteelFusion features available with NFS/file deployments

With SteelFusion version 5.0 in an NFS/file deployment, the following features from previous releases continue to be supported:

 Edge high availability

 Core high availability

 Prefetch

 Prepopulation

 Boot over WAN

 Pinning storage in the branch

 Local storage in the branch

 Snapshot

 Server-level backups

In addition to the above features, SteelFusion version 5.0 with NFS has been qualified with the following storage systems for both the data path and snapshot integration:

 NetApp C mode

 Isilon

The following backup software is qualified:

 Commvault

 NetBackup

 BackupExec

 Veeam

 Avamar

Note: Although not qualified, other vendors' storage systems and backup software products may work. See the Riverbed Support website and your local Riverbed representative for the most up to date information.

The following models of SteelFusion Core support NFS/file deployments:

 VGC-1500

 SFCR-3500

Unsupported SteelFusion features with NFS/file deployments

With SteelFusion version 5.0 in an NFS/file deployment, the following features from previous versions are not currently supported or available:

 FusionSync between SteelFusion Cores

 Coreless SteelFusion Edge

 Virtual SteelFusion Edge


SteelHead EX version 5.0 does not support NFS/file deployments. The following models of SteelFusion Core do not support NFS/file deployments:

 vGC-1000

 SFCR2000

 SFCR3000

Note: An individual Edge or Core can only be deployed in one of the two modes: block storage or NFS/file. It is not possible for the same Edge or Core to support both storage modes simultaneously.

Note: The SteelFusion NFS/file implementation is not designed as a generic file server with NFS access and cannot be used as a global fileshare.

Basic design scenarios

When deploying SteelFusion with NFS, all of the basic design principles that would be used for a SteelFusion deployment in a block storage (iSCSI or Fibre Channel) scenario can generally be applied. Some of the key principles are listed here:

 A single SteelFusion Core can service one or more SteelFusion Edge appliances, but a SteelFusion Edge appliance is only ever assigned to a single SteelFusion Core, and its peer in the case of a SteelFusion Core high-availability (HA) design.

 In a block storage deployment, each LUN is dedicated to a server in a branch and can only be projected to a specific Edge appliance; the same applies to NFS exports. Each individual NFS export is specific to a branch ESXi server (external or internal to the SteelFusion Edge) and is only projected across the WAN link to a specific Edge.

 With a block storage deployment, the LUNs in the backend storage array that are assigned for SteelFusion use cannot be simultaneously accessed by other appliances for reading and writing. Backups are performed in the data center by first taking a snapshot of the LUN and then using a proxy host to backup the snapshot. The same rules and guidance apply for NFS exports. They should be protected on the NFS file server from read/write access by other devices. If necessary, other NFS clients could mount an export with a read-only option. Backups should be performed on a snapshot of the exported fileshare.

 There are some differences to be aware of when planning and implementing a design that incorporates NFS. These differences are primarily related to the allocation and use of Ethernet network interfaces on the Edge. There is also the requirement for an Edge Virtual IP (VIP) to be configured on the Edge. The reason for this is to provide the option of HA at the Edge. These differences compared to a block storage deployment are discussed in more detail in “Basic configuration deployment principles” on page 182.

Basic configuration deployment principles

Configuration and deployment of SteelFusion with NFS can be separated into two categories: the SteelFusion Core and the SteelFusion Edge. This section includes the following topics:

 “SteelFusion Core with NFS/file deployment process overview” on page 183


 “SteelFusion Core interface and port configuration” on page 183

 “SteelFusion Edge with NFS - appliance architecture” on page 185

 “Virtual Services Platform hypervisor installation with NFS/file mode” on page 185

 “SteelFusion Edge interface and port configuration” on page 186

SteelFusion Core with NFS/file deployment process overview

Deployment of the SteelFusion Core with NFS can be almost directly compared with the procedures and settings that are required for a Core in a block storage scenario. We recommend that you review the details of this section and also the contents of Chapter 3, “Deploying the Core.” At a high level, deploying the SteelFusion Core requires completion of the following tasks in order:

1. Install and connect the Core in the data center network. Optionally, you can include both Cores if you are deploying a high-availability solution. For more information on installation, see the SteelFusion Core Installation and Configuration Guide.

2. In the SteelFusion Core Management Console, choose Configure > Manage: SteelFusion Edges. Define the SteelFusion Edge Identifiers so you can later establish connections between the SteelFusion Core and the corresponding SteelFusion Edges.

3. In the SteelFusion Edge Management Console, choose Storage > Storage Edge Configuration. In this page, specify:

 the hostname/IP address of the SteelFusion Core

 the SteelFusion Edge Identifier (as defined in step 2)

 the SteelFusion Edge VIP address (for details, see “Edge Virtual IP address” on page 187)

4. Using the Mount and Map Exports wizard in the SteelFusion Core Management Console, add a storage array and discover the NFS exports from the backend file server. If Edge appliances have already been connected to the Core, then the exports can be mapped to their respective Edges. At the same time, you can configure the export access permissions at the Edge, as well as pinning and prepopulation settings. For more details, see the SteelFusion Core Management Console User’s Guide (NFS). For examples of how to configure SteelFusion for NFS with Dell EMC Isilon or NetApp Cluster Mode, see the relevant solution guide on the Riverbed Splash site at https://splash.riverbed.com/community/product-lines/steelfusion.
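
As an optional sanity check before running the wizard, you can confirm from a Linux host in the data center (not from the Core itself) that the backend file server is presenting the exports you expect. This is an illustration only; the file server hostname is hypothetical, and depending on the vendor the showmount protocol may need to be enabled on the file server.

showmount -e filer01.example.com

If the exports you intend to project do not appear in the resulting list, review the export and access configuration on the backend NFS file server before continuing.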

SteelFusion Core interface and port configuration

This section describes a typical network port configuration. Depending on your deployment scenario, you might require additional routing configuration. This section includes the following topics:

 “SteelFusion Core ports with NFS/file deployment” on page 184

 “Configuring SteelFusion Core interface routing with NFS/file deployment” on page 184


SteelFusion Core ports with NFS/file deployment

The following table summarizes the ports that connect the SteelFusion Core appliance to your network. Unless noted, the ports and descriptions are for all SteelFusion Core models that support NFS.

Port Description

Console - Connects the serial cable to a terminal device. You establish a serial connection to a terminal emulation program for console access to the Setup Wizard and the Core CLI.

Primary (PRI) - Connects Core to a VLAN switch through which you can connect to the Management Console and the Core CLI. You typically use this port for communication with Edge appliances.

Auxiliary (AUX) - Connects the Core to the management VLAN. You can connect a computer directly to the appliance with a crossover cable, enabling you to access the CLI or Management Console.

eth1_0 to eth1_3 (applies to SFCR-3500) - Connects the eth1_0, eth1_1, eth1_2, and eth1_3 ports of Core to a LAN switch using a straight-through cable. You can use the ports either for NFS connectivity or failover interfaces when you configure Core for high availability (HA) with another Core. In an HA deployment, failover interfaces are usually connected directly between Core peers using crossover cables. If you deploy the Core between two switches, all ports must be connected with straight-through cables.

This figure shows a basic HA deployment indicating some of the SteelFusion Core ports and use of straight-through or crossover cables.

Figure 13-2. Ports for Core model 3500

For more information about HA deployments, see “Overview of high availability with NFS” on page 190 and Chapter 6, “SteelFusion Appliance High-Availability Deployment.”

Configuring SteelFusion Core interface routing with NFS/file deployment

Interface routing is configured in the same way as it would be for a block storage deployment. For details, see “Configuring interface routing” on page 38.


SteelFusion Edge with NFS - appliance architecture

A SteelFusion Edge appliance that is configured for an NFS/file deployment has the same basic internal architecture as an Edge that is configured for a block storage deployment. This enables the majority of existing SteelFusion features and services (HA, prepopulation, local storage, pinned storage, snapshots, and so on) to continue to be supported. Figure 13-3 shows an internal diagram similar to that shown in Figure 5-1, but for an NFS/file deployment.

Figure 13-3. Edge architecture in NFS/file mode

Although not shown in the architecture diagram, with SteelFusion version 5.0 and later there is additional functionality within the RiOS node to support an NFS file server. At a high level, the NFS file server in the RiOS node interacts with the existing BlockStream and VSP components to provide branch ESXi servers, including the Edge hypervisor node, with exported fileshares. These fileshares have been projected across the WAN by the SteelFusion Core from centralized NFS storage.

With SteelFusion Edge deployed in NFS/file mode, the internal interconnect between the RiOS node and the hypervisor node is used only for management traffic related to the health and status of the architecture. The storage interconnect is not available for use by the NFS file server; this is by design. To enable the hypervisor node to mount exported fileshares from the NFS file server in RiOS, you must create external network connections. In Figure 13-3, the external network connection is shown between the RiOS node auxiliary (AUX) network interface port and one of the hypervisor node data network ports. The AUX network interface port is configured with a virtual IP (VIP) address, which is used to mount the exports on the hypervisor. Depending on your requirements, it may be possible to use alternative network interfaces for this external connectivity. For more details, see “SteelFusion Edge ports with NFS/file deployment” on page 186.

Virtual Services Platform hypervisor installation with NFS/file mode

Before performing the initial installation of the Virtual Services Platform (VSP) hypervisor in the SteelFusion Edge hypervisor node, the SteelFusion Core must already be configured and connected. See “SteelFusion Core with NFS/file deployment process overview” on page 183 for an outline of the required tasks, and see the SteelFusion Core Management Console User’s Guide (NFS mode) for more detailed information.


Once the Core is correctly configured, you can install the VSP hypervisor. The SteelFusion Edge Management Console is equipped with an installer wizard to guide you through the specific steps of the installation process. For a successful installation, be aware that exported fileshares projected from the Core and mapped to the Edge will only be detected by the hypervisor if they are configured with “VSP service” or “Everyone” access permissions. Once the VSP is installed, additional mapped fileshares can be added by mounting them manually with ESXi server configuration tools, such as the vSphere client, and specifying the Edge VIP address. See the SteelFusion Edge Management Console User’s Guide for more details on the VSP hypervisor installation process.
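
For example, if you are adding a mapped fileshare manually from the ESXi command line rather than the vSphere client, the standard ESXi NFS commands can be used. This is a minimal sketch only; the VIP address (192.168.1.10), export path (/exports/branch01), and datastore name (branch01-ds) are hypothetical values for illustration.

esxcli storage nfs add -H 192.168.1.10 -s /exports/branch01 -v branch01-ds
esxcli storage nfs list

The host value (-H) should be the Edge VIP address rather than the IP address of the underlying interface, so that the mount continues to work if the VIP moves to the peer Edge in an HA design.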

SteelFusion Edge interface and port configuration

This section describes a typical network port configuration. Depending on your deployment scenario, you might require additional network configuration. This section includes the following topics:

 “SteelFusion Edge ports with NFS/file deployment” on page 186

 “Edge Virtual IP address” on page 187

SteelFusion Edge ports with NFS/file deployment

This table summarizes the ports that connect the SteelFusion Edge appliance to your network.

Port Description

Primary (PRI) - Connects Edge to a VLAN switch through which you can connect to the Management Console and the Edge CLI. This interface is also used to connect to the Core through the Edge RiOS node in-path interface.

Auxiliary (AUX) - Connects the Edge to the management VLAN. The IP address for the auxiliary interface must be on a subnet different from the primary interface subnet. You can connect a computer directly to the appliance with a crossover cable, enabling you to access the CLI or Management Console.

lan1_0 - The Edge RiOS node uses one or more in-path interfaces to provide Ethernet network connectivity for optimized traffic. Each in-path interface comprises two physical ports: the LAN port and the WAN port. Use the LAN port to connect the Edge RiOS node to the internal network of the branch office. You can also use this port for a connection to the Primary port. A connection to the Primary port enables the blockstore traffic sent between Edge and Core to transmit across the WAN link.

wan1_0 - The WAN port is the second of the two ports that comprise the Edge RiOS node in-path interface. The WAN port is used to connect the Edge RiOS node toward WAN-facing devices such as a router, firewall, or other equipment located at the WAN boundary. If you need additional in-path interfaces, or different connectivity for in-path (for example, 10 GigE or fiber), then you can install a bypass NIC in an Edge RiOS node expansion slot.



eth0_0 to eth0_1 - These ports are available as standard on the Edge appliance. When configured for use by the Edge RiOS node, the ports can provide additional NFS interfaces for storage traffic to external servers. These ports also enable the ability to provide redundancy for Edge high availability (HA). In such an HA design, we recommend that you use the ports for the heartbeat and BlockStream synchronization between the Edge HA peers. If additional NFS connectivity is required in an HA design, then install a nonbypass data NIC in the Edge RiOS node expansion slot.

gbe0_0 to gbe0_3 - These ports are available as standard on the Edge appliance. When configured for use by the Edge hypervisor node, the ports provide LAN connectivity to external clients and also for management. The ports are connected to a LAN switch using a straight-through cable. If additional connectivity is required for the hypervisor node, then install a nonbypass data NIC in a hypervisor node expansion slot. There are no expansion slots available for the hypervisor node on the SFED 2100 and 2200 models. There are two expansion slots on the SFED 3100, 3200, and 5100 models.

Note: Your hypervisor must have access to the Edge VIP via one of these interfaces.

This figure shows a basic branch NFS/file deployment indicating some of the Edge network ports. In this scenario, the ESXi server is installed on the branch LAN, external to the Edge.

Figure 13-4. Edge ports used in branch NFS/file deployment - external ESXi server

Note that in this figure, the Edge auxiliary interface (Aux) is labeled NFS VIP. For details on Edge VIP, see “Edge Virtual IP address” on page 187.

Edge Virtual IP address

Unlike block storage, NFSv3 does not support the concept of multipath. For high availability (HA) to avoid a single point of failure, the general industry-standard solution for NFS file servers is to use a virtual IP (VIP) address. The VIP floats between the network interfaces of file servers that are part of an HA deployment. Clients that access file servers using NFS in an HA deployment are configured to use the VIP address. When a file server that is part of an HA configuration is active, only it responds to client requests. If the active file server fails for some reason, the standby file server starts responding to client requests via the same VIP address.


On a related note, if there are NFS file servers in the data center configured for HA, then it is their VIP address that is added to the Core configuration. At the branch location, because Edge is the NFS file server, it is configured with a VIP address for the ESXi server(s) to access as NFS clients.

Note: This VIP address must be configured even if Edge is not expected to be part of an Edge HA configuration.

The Edge VIP address requires an underlying network interface with a configured IP address on the same IP subnet. The underlying network interface must be reachable by the NFS clients requiring access to the fileshares exported by the SteelFusion Edge. The NFS clients in this case are one or more ESXi servers that mount fileshares exported from the Edge and use them as their datastore. The ESXi server that is in the Edge hypervisor node must also use an external network interface to connect to the configured VIP address. Unlike Edge deployments in block storage mode, it is not supported to use the internal interconnect between the RiOS node and the hypervisor node.

Depending on the Edge appliance model, you may have several options available for network interfaces that could be used for NFS access with the VIP address. By default, these include any of the following: Primary, Auxiliary, eth0_0, or eth0_1. It is important to remember that any of these four interfaces could already be required for connectivity for management of the RiOS node, rdisk traffic to/from the Edge in-path interface, or heartbeat and synchronization traffic as part of a SteelFusion Edge HA deployment. Therefore, select a suitable interface based on NFS connectivity requirements and traffic workload. If an additional nonbypass NIC is installed in the RiOS node expansion slot, then you may have additional ethX_Y interfaces available for use.

The Edge VIP address is configured in the Edge Management Console on the Storage Edge Configuration page. This figure shows a sample configuration setting with the available network interface options.

Figure 13-5. Edge VIP address configuration


Figure 13-6 shows a basic branch NFS/file deployment indicating some of the Edge network ports in use. In this scenario, the ESXi server is internal to the Edge, in the hypervisor node. It is important to note that the NFS traffic for the ESXi server in the hypervisor node does not use the internal connections between the RiOS node and the hypervisor node, because this is not supported. Instead, the NFS traffic in this example is transported externally between the auxiliary port of the RiOS node and the gbe0_0 port of the hypervisor node. If there were no other connectivity requirements, these two ports could be connected with a standard Ethernet crossover cable. However, it is considered best practice to use straight-through cables and connect via a suitable switch.

Figure 13-6. Edge ports used in branch NFS/file deployment - ESXi server in Edge hypervisor node
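
Before mounting datastores in a topology like the one in Figure 13-6, it can be useful to confirm that the hypervisor node can reach the Edge VIP over the external connection. As a hedged illustration (the VIP address shown is hypothetical and reuses the earlier example value), from the ESXi shell:

vmkping 192.168.1.10

If the ping fails, check the cabling or switch configuration between the RiOS node interface carrying the VIP (the auxiliary port in this example) and the hypervisor node data port (gbe0_0 in this example).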


Figure 13-4 and Figure 13-6 show a basic deployment example where the VIP interface is configured on the Edge auxiliary port. Depending on your requirements, it may be possible to use other interfaces on the Edge. For example, Figure 13-7 and Figure 13-8 show examples where the Edge Primary interface is configured with the NFS VIP address.

Figure 13-7. Edge ports used in branch NFS/file deployment - external ESXi server

Figure 13-8. Edge ports used in branch NFS/file deployment - ESXi server in Edge hypervisor node

Overview of high availability with NFS

SteelFusion version 5.0 supports high availability (HA) for both Core and Edge configurations as part of an NFS/file deployment. There are some important differences for Core and Edge HA in an NFS/file deployment compared to a block storage deployment, and they are discussed in this section. However, we strongly recommend that you familiarize yourself with Chapter 6, “SteelFusion Appliance High-Availability Deployment,” which also contains information that can be applied generally to HA designs. This section includes the following topics:

 “SteelFusion Core high availability with NFS” on page 191

 “SteelFusion Edge high availability with NFS” on page 192


SteelFusion Core high availability with NFS

This figure shows an example Core HA deployment for NFS.

Figure 13-9. Core ports used in HA - NFS/file deployment

In the example, the use of network interface ports is identical to what could be used for a Core HA deployment in block storage mode. Core A and Core B are interconnected with two crossover cables that provide connectivity for heartbeat. There are two NFS file servers, A and B. Although they could each be configured with virtual IP addresses to provide redundancy, this is not shown in the diagram for reasons of simplicity. File server A is exporting two fileshares, A1 and A2, which are mounted by Core A. File server B is exporting one fileshare, B1, which is mounted by Core B. Again, for reasons of simplicity, no Edge is shown in the diagram, but for the purposes of this example we can assume that an Edge has been configured to connect to both Core A and Core B simultaneously in order to map all three projected fileshares: A1, A2, and B1.

Under normal conditions, where Core A and Core B are healthy, they are designed to operate in an active-active manner. They are each in control of their respective fileshares, but also aware of their peer's fileshares. They independently service read and write operations to and from the Edge for their respective fileshares. As described in “Failover states and sequences” on page 91, the Cores check each other via their heartbeat interfaces to ensure their peer is healthy. In a healthy state, both peers are reported as being ActiveSelf. Failover is triggered in the same way as for a block storage deployment, following the loss of nine consecutive heartbeats that normally occur at one-second intervals.

When a failover scenario occurs in a SteelFusion Core HA deployment with NFS, the surviving Core transitions to the ActiveSolo state. However, this is where the situation differs from a Core HA deployment in block storage mode. In this condition with an NFS/file deployment, the surviving Core transitions all exported fileshares that are part of the HA configuration and projected to Edges into a read-only mode. In our example, this would include A1, A2, and B1. It is important to understand that the read-only transition is from the perspective of the surviving Core and any Edges that are connected to the HA pair. No change is made to the state of the exported fileshares on the backend NFS file servers; they remain in read-write mode.


With the surviving Core in ActiveSolo state and the NFS exports in read-only mode, the following scenarios apply:

 The ActiveSolo Core will defer all commits arriving from its connected Edges.

 The ActiveSolo Core will defer all snapshot requests coming in from its connected Edges.

 Edges connected to the ActiveSolo Core will absorb writes locally in the blockstore and acknowledge them, but commits will be paused.

 Edges connected to the ActiveSolo Core will continue to service read requests locally if the data is resident in the blockstore, and will request nonresident data via the ActiveSolo Core as normal.

 Pinning and prepopulation of exported fileshares will continue to operate.

 Mounting new exported fileshares on the ActiveSolo Core from backend NFS file servers is permitted.

 Mapping exported fileshares from the ActiveSolo Core to Edge appliances will still be allowed.

 Any operation to offline an exported fileshare will be deferred.

 Any operation on backend NFS file servers to resize an exported fileshare will be deferred.

Once the failed SteelFusion Core in an HA configuration comes back online and starts communicating healthy heartbeat messages to the ActiveSolo Core, recovery to normal service is automatic. Both Core appliances return to an ActiveSelf state, and exported fileshares are transitioned back to read-write mode. All pending commits for the connected Edge appliances will be completed, and any other deferred operations will resume and complete.

Note: In circumstances where it is absolutely necessary, it is possible to “force” a transition back to read-write mode while in an ActiveSolo state. Contact Riverbed Support for assistance.

If a Core that is part of an HA deployment needs to be replaced, see “Replacing a Core in an HA deployment” on page 198 for further guidance.

SteelFusion Edge high availability with NFS

Similar to Core HA deployments with NFS, Edge HA configuration with NFS also has some differences compared to Edge HA in a block storage scenario. However, we still recommend that you review “SteelFusion Appliance High-Availability Deployment” on page 81 to be familiar with Edge HA designs in general.


This figure shows an example of a basic Edge HA design with NFS.

Figure 13-10. Ports used in Edge HA NFS/file deployment - external ESXi server

The basic example design shown in this figure is intended to illustrate the connectivity between the Edge peers (Edge A and Edge B). In this example, the ESXi server is located on the branch office LAN, external to the SteelFusion Edge appliances. In a production HA design, there will most likely be additional routing and switching equipment, but for simplicity this is not included in the diagram. See also Figure 13-11 for an example of a basic design where the ESXi server is located within the Edge hypervisor node.

In exactly the same way as with Edge HA in a block storage deployment, best practice is to connect both Edge appliances using the eth0_0 and eth0_1 network interfaces for heartbeat and blockstore sync. When configured as an HA pair, the Edges operate in an active-standby mode. Both Edge appliances are configured with the virtual IP (VIP) address. The underlying interface on each Edge (the example in Figure 13-10 uses the auxiliary port) must be configured with an IP address on the same subnet. The ESXi server is configured with the VIP address for the NFS file server of the Edge HA pair.

In their active-standby roles, it is the active Edge that responds via the VIP to NFS read/write requests from the ESXi server. Just as with a block storage HA deployment, it is also the active Edge that communicates with the attached Core to send and receive data across the WAN link. The standby Edge takes no part in any communication other than to send/receive heartbeat messages and synchronous blockstore updates from the active Edge. In the event that the active Edge fails, the standby Edge takes over and begins responding to the ESXi server via the same VIP on its interface. For more details about the communication between Edge HA peers, see “SteelFusion Edge HA peer communication” on page 101.


This figure shows another basic Edge HA design, but in this case, the ESXi server is located in the hypervisor node of the Edge. Remember that in an NFS/file deployment with Edge, communication between the NFS file server in the SteelFusion Edge RiOS node and the ESXi server in the hypervisor node is performed externally via suitable network interfaces.

Figure 13-11. Ports used in Edge HA NFS/file deployment - ESXi server in hypervisor node

In this example, the NFS file server is accessible via the VIP address configured on the auxiliary (Aux) port and connects to the hypervisor node using the gbe0_0 network interface port. With Edge NFS/file deployments, you can also configure the VSP on both Edge appliances to provide an active-passive capability for any virtual machines (VMs) hosted on the hypervisor node. For more details about how to configure Edge HA, see the SteelFusion Edge Management Console User’s Guide.

Snapshots and backup

With SteelFusion version 4.6, the features related to snapshot and backup were enhanced from a LUN-based approach to provide server-level data protection. With NFS/file deployments using SteelFusion version 5.0 and later, this style of data protection supports branch ESXi servers, including both the ESXi server in the hypervisor node of the Edge and external ESXi servers installed on the branch office LAN that use datastores mounted via NFS from the Edge.

The feature is able to automatically detect virtual machines (VMs) that reside only on vSphere datastores remotely mounted from the Edge via NFS. This means that any VM residing on a local ESXi datastore, or on local storage provided by the Edge (not projected by the Core), is filtered out and automatically excluded from any Core backup policy. Server-level backup policies are dynamically updated when VMs are added to, or removed from, ESXi servers that are included in the policy. By default, the policies are designed to protect the VMs in a nonquiesced state. If backups are required on VMs that are quiesced, you can enable this by editing the policy using a CLI command on the Core.

storage backup group modify <group_name> quiesce-vm-list <VM1>,<VM2>


In this command, <group_name> is the name of the policy protection group identifier, and <VM1>,<VM2> are the VMs to be quiesced. For more details on this command, see the SteelFusion Command-Line Interface Reference Manual.
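
For example, as an illustration only (the group and VM names are hypothetical): if a backup policy group named branch01_policy protects VMs named sql01 and file01, quiescing could be enabled with a command along these lines:

storage backup group modify branch01_policy quiesce-vm-list sql01,file01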

Note: Although the server-level backup feature is designed to protect physical Windows servers, this is only supported with SteelFusion deployments in block storage mode.

As part of the overall SteelFusion data protection feature, snapshots performed under the control of a server-level backup policy configured on the Core are protected by mounting the snapshot on a proxy server and backing it up using a supported backup application. This action is also performed automatically, as defined by the schedule configured in the backup policy.

Remember that with a SteelFusion NFS/file deployment, backups are performed at the server level and cover only the individual VMs residing in the fileshares exported from the Edge. Within an exported fileshare there may be a number of directories and subdirectories as part of the ESXi datastore, which may contain data other than the VMs themselves. This data is not included within the server-level backup policy defined on the Core.

It is possible to manually trigger a snapshot of the entire fileshare export using the SteelFusion Edge Management Console. This causes a crash-consistent snapshot of the relevant fileshare to be taken on the backend NFS file server located in the data center. To achieve this, the Core must already be configured with the relevant settings for the backend NFS file server so that it can complete the snapshot operation requested by the Edge. To specify these settings, in the SteelFusion Core Management Console choose Configure > NFS: Storage Arrays and select the Snapshot Configuration tab. Figure 13-12 shows an example of the required details for the NFS file server.


In the scenario where a manually triggered snapshot of the entire exported fileshare is taken on the backend NFS file server in the data center, because it is not part of any SteelFusion server-level backup policy, it is not automatically backed up. Therefore, additional third-party configuration, data management applications, scripts, or tools external to the Core may be required to ensure a snapshot of the entire NFS-exported fileshare is successfully backed up. However, it is highly likely that such processes and applications are already available within the data center.

Figure 13-12. Snapshot configuration on the Core - NFS/file deployment

For a list of storage arrays supported in Core version 5.0, see “Existing SteelFusion features available with NFS/file deployments” on page 181.

If you manually trigger snapshots by clicking Take Snapshot in the SteelFusion Edge Management Console and the procedure fails to complete, check the log entries on the Edge. This is especially relevant if the backend NFS file server is a NetApp. If there are error messages that include the text “Missing aggr-list. The aggregate must be assigned to the VServer for snapshots to work,” see this Knowledge Base article on the Riverbed Support site: https://supportkb.riverbed.com/support/index?page=content&id=S28732. This article guides you through some configuration settings that are needed on the NFS file server. For more details, see the SteelFusion NetApp Solution Guide on the Riverbed Splash site.

See the SteelFusion Core Management Console User’s Guide for more details on snapshot and backup configuration.

Best practices

In general, you can apply the concepts in Chapter 12, “Deployment Best Practices,” to SteelFusion deployments with NFS. There are some specific areas of guidance related to NFS, and these are covered in the following sections:


 “Core with NFS - best practices” on page 197

 “Edge with NFS - best practices” on page 199

Core with NFS - best practices

This section describes best practices related to SteelFusion Core. It includes the following topics:

 “Network path redundancy” on page 197

 “Interface selection” on page 197

 “Editing files on exported fileshares” on page 197

 “Size of exported fileshares” on page 197

 “Resizing exported fileshares” on page 198

 “Replacing a Core” on page 198

 “Replacing a Core in an HA deployment” on page 198

Network path redundancy

The Core does not support multipath input/output (MPIO). For redundancy between the Core and the backend NFS file server, consider using a virtual IP (VIP) capability on the NFS file server interfaces if it is supported by the vendor.

Interface selection

In a deployment where multiple network interfaces are configured on the Core to connect to backend NFS file servers, there is no interface selection for the NFS traffic. Any interface that is able to find a route to the NFS file server could be used. If you must use specific interfaces on the Core to connect to specific NFS file server ports, then consider using static routes on the Core to reach the specific NFS file server IP addresses.

Editing files on exported fileshares

Do not edit files on the exported fileshares by accessing them directly, unless the fileshare has already been unmounted from the Core. Where possible, use controls to restrict direct access to exported files on the file server. Also ensure that only the Core has write permission on the exported fileshares. This can be achieved by using IP- or hostname-based access control in the NFS file server configuration settings for the exports. If necessary, exported fileshares can be mounted read-only by other hosts.
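
As a hedged illustration of this type of access control: on a generic Linux NFS server, the restriction could be expressed in /etc/exports as shown below, granting read-write access only to the Core and read-only access to a backup proxy host. Enterprise NFS arrays such as NetApp or Dell EMC Isilon implement the same idea through their own export policies or access zones; the path and hostnames here are hypothetical.

/exports/branch01   core01.example.com(rw,sync)   backup-proxy.example.com(ro,sync)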

Size of exported fileshares

Exported fileshares up to 16 TiB (per export) are supported on the Core. However, the number of mounted exports that are supported will depend on the model of Core.

Note: EMC Isilon only - if you are provisioning and exporting NFS storage on an Isilon storage system, we recommend that you set a storage quota for the directory so the export does not exceed 16 TiB. For details, see “Deploying SteelFusion with Dell EMC Isilon for NFS” on the Riverbed Splash site.


Resizing exported fileshares

Exported fileshares can be expanded on the backend NFS file server without first unmounting them from the Core. The resized export will be detected by the Core automatically within approximately one minute. Reduction of exported fileshares is not supported.

Replacing a Core

Replacing a Core as part of an RMA operation will require an extra configuration task if the Core is part of an NFS/file deployment. By default, a replacement Core will be shipped with a standard software image that would normally operate in block storage mode. As part of the procedure to replace a Core, if necessary, the Core software should first be updated to a version supporting NFS (minimum version 5.0). If the Core needs to be upgraded to a version that supports NFS, make sure both image partitions are updated so the Core isn’t accidentally rebooted to a version that doesn’t support NFS/file mode.

Once the required version of software is installed, the Core needs to be instructed to operate in NFS/file mode. This is achieved by entering the following two commands using the Core command-line interface:

service reset set-mode-file

service reset set-mode-file confirm

The first command will automatically respond with a request for confirmation, at which point you can enter the second command.

Note: This operation will clear any storage configuration on the Core - therefore care must be taken to ensure the command is performed on the correct Core appliance.

Once the Core is configured for NFS operations, it can be configured with the relevant storage settings.

Replacing a Core in an HA deployment

If a Core that is part of an HA deployment needs to be replaced, use the following sequence of steps as a guideline to help with replacement of the Core:

 Consider opening a ticket with Riverbed Support. If you are not familiar with the replacement process, Riverbed Support can guide you safely through the required tasks.

 Stop the service on the Core to be replaced. This will cause the peer Core to transition to ActiveSolo state and take control of fileshares that had been serviced by the Core to be replaced. The ActiveSolo Core will change all the fileshares to read-only mode for the Edges.

 If the replacement Core is expected to take an extended period of time (days) before it is installed, contact Riverbed Support for guidance on forcing the filesystems back to a read-write mode.

 Once the replacement Core is installed and configured to operate in NFS/file mode, it can be updated with the relevant storage configuration settings by sending them from the ActiveSolo Core. This is achieved by performing the following command on the ActiveSolo Core:

device-failover peer <ip-address> set local-if <interface> force

In this command, <ip-address> is the IP address of the replacement Core appliance and <interface> is the local interface of the ActiveSolo Core used to connect to the replacement Core IP address. For more details on this command, see the SteelFusion Command-Line Interface Reference Manual.
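
For example, as an illustration only (both values are hypothetical): if the replacement Core has the IP address 10.10.1.2 and the ActiveSolo Core reaches it through its eth1_2 interface, the command entered on the ActiveSolo Core would look like this:

device-failover peer 10.10.1.2 set local-if eth1_2 force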

Note: Ensure that you enter commands on the correct Core appliance.


Edge with NFS - best practices

This section describes best practices related to SteelFusion Edge. It includes the following topics:

 “General purpose file server” on page 199

 “Alternative ports for VIP address” on page 199

 “Unmounting ESXi datastores” on page 199

 “Sparse files” on page 199

 “Changing Edge VIP address” on page 199

 “Replacing an Edge” on page 200

General purpose file server

SteelFusion Edge with NFS is not designed to be a general purpose NFS file server and is not supported in this role.

Alternative ports for VIP address

If additional nonbypass ports are available in the RiOS node due to the installation of an optional network interface card (NIC), the VIP address can be assigned to those ports.

Unmounting ESXi datastores

Before unmapping or taking offline any NFS exports on the Edge, ensure that the ESXi datastore that corresponds to the exported fileshare has been unmounted.

Sparse files

Exported fileshares that contain sparse files can be mounted by the Core and projected to the Edge. However, SteelFusion does not maintain the sparseness and, therefore, pinning the exported fileshare may not be possible. If a sparse file is created through the Edge, it will be correctly synchronized back to the backend NFS file server. However, if the export that contains the sparse file is taken offline or unmounted, the sparseness is not maintained following a subsequent online or mount operation.

Changing Edge VIP address

If you need to change the VIP address on the Edge appliance, you may need to remount the datastores. In this scenario, a VIP address change will also affect the UUID generated by the ESXi server. For more details on the additional administration steps required, see the Knowledge Base article on the Riverbed Support site: https://supportkb.riverbed.com/support/index?page=content&id=S30235.
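
As a hedged sketch only (the datastore name, export path, and new VIP address are hypothetical, and the Knowledge Base article above remains the authoritative procedure): after powering off or unregistering any VMs that use the datastore, remounting it from the ESXi command line with the new VIP might look like this:

esxcli storage nfs remove -v branch01-ds
esxcli storage nfs add -H 192.168.1.20 -s /exports/branch01 -v branch01-ds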


Replacing an Edge

Replacing an Edge as part of an RMA operation will require an extra configuration task if the Edge is part of an NFS/file deployment. By default, a replacement Edge will be shipped with a standard software image that would normally operate in block storage mode. As part of the procedure to replace an Edge, if necessary, the Edge software should first be updated to a version supporting NFS (minimum version 5.0). If the Edge needs to be upgraded to a version that supports NFS, make sure both image partitions are updated so the Edge isn’t accidentally rebooted to a version that doesn’t support NFS/file mode.

Once the required version of software is installed, the Edge needs to be configured to operate in NFS/file mode. To carry out this task, you will require the guidance of Riverbed Support; ensure that you open a case with Riverbed. Once the Edge is configured for NFS operations, it can be configured with the relevant storage settings.

Related information

 SteelFusion Core Management Console User’s Guide

 SteelFusion Edge Management Console User’s Guide

 SteelFusion Core Installation and Configuration Guide

 SteelFusion Edge Installation and Configuration Guide

 SteelFusion Command-Line Interface Reference Manual

 Riverbed Splash site at https://splash.riverbed.com/community/product-lines/steelfusion


SteelFusion Appliance Sizing

Every deployment of the SteelFusion product family differs due to variations in specific customer needs and types and sizes of IT infrastructure. The following information is intended to guide you to achieving optimal performance. However, these guidelines are general; for detailed worksheets for proper sizing, contact your Riverbed representative. This chapter includes the following sections:

 “General sizing considerations” on page 201

 “Core sizing guidelines” on page 201

 “Edge sizing guidelines” on page 203

General sizing considerations

Accurate sizing typically requires a discussion between Riverbed representatives and your server, storage, and application administrators. General considerations include but are not limited to:

 Storage capacity used by branch offices - How much capacity is currently used, or expected to be used, by the branch office. The total capacity might include the amount of used and free space.

 Input/output operations per second (IOPS) - What are the number and types of drives being used? This value should be determined early so that the SteelFusion-enabled SteelHead can provide the same or higher level of performance.

 Daily rate of change - How much data is the Edge expected to write back to the storage array through the Core? This value can be determined by studying backup logs.

 Branch applications - Which and how many applications are required to continue running during a WAN outage? This answer can impact disk capacity calculations.

Core sizing guidelines

The main considerations for sizing your Core deployment are as follows:

 Total data set size - The total space used across LUNs (not the size of LUNs).

 Total number of LUNs - Each LUN adds five optimized connections to the SteelHead. Also, each branch office in which you have deployed Edge represents at least one LUN in the storage array.

 RAM requirements - You should have at least 700 MB of RAM per terabyte (TB) of used space in the data set (a worked example follows the sizing table below). There is no specific setting on the Core to allocate memory on this basis, but in general this amount is how much the Core uses under normal circumstances if the memory is available. Each Core model is shipped with a fixed capacity of memory (see the SteelFusion and SteelHead specification sheets for details). If the metric falls below the recommended value, the performance of the Core can be affected, as can its ability to efficiently perform prediction and prefetch operations. Other potentially decisive factors include:

 Number of files and directories

 Type of file system, such as NTFS or VMFS

 File fragmentation

 Active working set of LUNs

 Number of misses seen from Edge

 Response time of the storage array

This table summarizes sizing recommendations for Core appliances based on the number of branches and data set sizes.

Model    Number of LUNs    Number of branches    Data set size    RAM

1000U    10                5                     2 TB             VM guidelines
1000L    20                10                    5 TB             VM guidelines
1000M    40                20                    10 TB            VM guidelines
1500L    60                30                    20 TB            VM guidelines
1500M    60                30                    35 TB            VM guidelines
2000L    20                10                    5 TB             24 GB
2000M    40                20                    10 TB            24 GB
2000H    80                40                    20 TB            24 GB
2000VH   160               80                    35 TB            24 GB
3000L    200               100                   50 TB            128 GB
3000M    250               125                   75 TB            128 GB
3000H    300               150                   100 TB           128 GB
3500C1   150               75                    25 TB            256 GB
3500C2   200               100                   50 TB            256 GB
3500C3   300               150                   100 TB           256 GB

Note: Core models 1000 and 1500 are virtual appliances. For minimum memory requirements, see the SteelFusion Core Installation and Configuration Guide.

The above table assumes 2 x LUNs per branch; however, in SteelFusion 4.6 and earlier, there is no enforced limit for the number of LUNs per branch or number of branches, so long as the recommended number of LUNs and data set sizes are within limits.
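
As a worked example of the RAM guideline (illustration only): a Core serving a total used data set of 20 TB across its branches should have roughly 20 x 700 MB = 14,000 MB, or about 14 GB, of RAM available for this purpose. The 2000H row in the table above (20 TB data set, 24 GB RAM) is consistent with this guideline.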


SteelFusion Core version 5.0 and later includes support for enforcement of Core specifications. This enforcement is designed to ensure the SteelFusion deployment performs to its optimum. When upgrading to Core version 5.0 or later from a 4.x release, the Core will raise an alarm warning the administrator if the specifications have already been exceeded by a configuration that was applied prior to upgrading. However, the SteelFusion deployment will continue to operate normally. In Core version 5.0 or later, if you are applying configuration changes where additional LUNs or Edges are in the process of being added, the Core will prevent the operation from being performed if the specification limits would be exceeded. The administrator will receive a warning message describing the problem. Figure 14-1 shows two example messages: one for exceeding the maximum number of LUNs and one for exceeding the number of Edges.

Figure 14-1. Example warning messages

The specification enforcement does not apply to local LUNs. If an existing LUN is resized to a capacity that will exceed the specification limit for the Core, the resize will be allowed, but an alarm will be raised.

When a failover occurs in a Core HA deployment, the ActiveSolo Core will take control of serving the failed Core's LUNs to the Edges it was connected to. This is normal behavior. Depending on the configuration in this scenario, if the Cores are running software version 5.0 or later, the ActiveSolo Core could be in a situation where the total number of LUNs and/or Edges exceeds the specification. This is due to it taking over from the failed Core. If the specification limits are exceeded, an alarm will be raised on the ActiveSolo Core but operations will continue. The alarm will clear automatically once the failed Core is recovered and back online.

There is currently no enforcement of specifications for Core appliances deployed in NFS/file mode, and currently no enforcement for Edge devices deployed in block storage mode or NFS/file mode.

Edge sizing guidelines

The main considerations for sizing your Edge deployment are as follows:

 Disk size - What is the expected capacity of the Edge blockstore?

– Your calculations can be affected depending on whether LUNs are pinned, unpinned, or local.

– During WAN outages, when the Edge cannot synchronize write operations back through the Core to the LUNs in the data center, the Edge uses a write reserve area on the blockstore to store the data. As described in “Pin the LUN and prepopulate the blockstore” on page 162, this area is 10 percent of the blockstore capacity (see the worked example at the end of this section).

 Input/output operations per second (IOPS) - If you are replacing existing storage in the branch office, you can calculate this value from the number and types of drives in the devices you want to replace. Remember that the drives might not have been operating at their full performance capacity. So, if an accurate figure is required, consider using performance monitoring tools that might be included within the server OS: for example, perfmon.exe in Windows. Other potentially decisive factors:

 HA requirements (PSU, disk, network redundancy)

 VSP CPU and memory requirements

 WAN optimization requirements (bandwidth and connection count)

See the SteelFusion Edge specification sheets for the capacity and capabilities of each model. When sizing the Edge appliance, you have the additional flexibility of choosing the optimized WAN bandwidth (W0-W3) needed after you have completed sizing for disk size, IOPS, and CPU. The W0 model comes with 0 optimized WAN bandwidth for external traffic, but it still optimizes SteelFusion traffic. For more information, see “Rdisk traffic routing options” on page 165.

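As a worked example of the write reserve calculation (illustration only, with a hypothetical rate of change): an Edge configured with a 2 TB blockstore reserves approximately 10 percent, or about 200 GB, as the write reserve area. If the branch writes roughly 40 GB of changed data per day, that reserve would absorb around five days of new writes during a WAN outage, which is one way to sanity-check whether a given Edge model's blockstore is large enough for the branch.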

Edge Network Reference Architecture

This appendix includes the following topic:

 “Edge network card interfaces” on page 205

Edge network card interfaces

For deployments with Edge appliances, the first PCIe slot is always populated with a 4-port bypass network interface card (NIC) as standard. This NIC is used by the RiOS node as two in-path interfaces (inpath1_0 and inpath1_1) for WAN optimization. Depending on the Edge appliance model, there can be a further 1 to 5 PCIe slots available for additional multiple port NICs. Any additional multiple port NICs that you purchase and install must be specified for bypass or data at the time of ordering. The following table summarizes the deployment options for Edge models.

Edge model: number of PCIe expansion slots / Slot 1 (always populated prior to shipment) / Slot 2 (RiOS node) / Slots 4 and 5 (RiOS node) / Slots 5 and 6 (hypervisor node)

SFED2100: 2 / in-path / in-path or storage / — / —

SFED2200: 2 / in-path / in-path or storage / — / —

SFED3100: 6 / in-path / in-path or storage / in-path or storage / data

SFED3200: 6 / in-path / in-path or storage / in-path or storage / data

SFED5100: 6 / in-path / in-path or storage / in-path or storage / data

In which:

 in-path - bypass NIC used by the RiOS node for WAN optimization traffic

 storage - multiple port data NIC used by the RiOS node, enabling iSCSI access for storage traffic to and from servers external to the Edge, or as additional heartbeat interfaces with Edge HA deployment

 data - multiple port data NIC used by the hypervisor node, enabling LAN access for hypervisor node traffic

Note: You can use a multiple port data NIC for storage or data. It just depends on the slot the NIC is installed in.

