EBR APPLIANCE: RESTORE FROM CLONE

Alessio Casagrande
System Engineer
Lutech S.P.A

Table of Contents

Preface
Physical to Virtual
EMC Backup and Recovery Appliance
Backup and DR Environment
Restore a clone saveset
    Test EMC Backup and Recovery appliance restore
    Test 1
    Test 2
    Workaround for this bug
Conclusion
Bibliography

Disclaimer: The views, processes or methodologies published in this article are those of the author. They do not necessarily reflect EMC Corporation’s views, processes or methodologies.

2015 EMC Proven Professional Knowledge Sharing 2

Preface

This article explains a workaround for a problem in the EMC Backup and Recovery appliance, found while implementing the backup and recovery environments for an enterprise customer. For security reasons, I have changed all IP addresses and hostnames of every component at the primary and disaster recovery sites.

Physical to Virtual

Today we are experiencing significant data growth. As the number of applications increases, servers must become more flexible to manage the growth in data. In response, many companies have introduced virtual environments to reduce hardware cost. Virtualization lessens hardware use, lowers management cost, and reduces the complexity of the environment by consolidating more servers onto fewer physical machines.

A consolidated environment means a smaller network and application infrastructure. Consequently, businesses need less hardware, including costly items like servers, routers, and other supplies. Decreasing the number of servers likely means standardizing on a few critical software applications, allowing companies to reduce operations costs.

However, the most considerable cost-reducing benefit of consolidation is the decreased burden on IT and operations personnel. With less activity at remote locations, management and communications requirements for those sites drop dramatically.

Additionally, the virtual environment introduced new approaches to backup. EMC NetWorker® supports virtual server backup, which reduces backup windows. To allow a full image backup of a virtual machine, EMC developed the EMC Backup and Recovery appliance (EBR Appliance). [1]

Figure 1 Difference between physical and virtual environment

EMC Backup and Recovery Appliance

NetWorker® provides VMware protection through an EMC Backup and Recovery appliance that, when deployed and configured, allows you to set up backup policies in NetWorker Management Console (NMC), and then assign VMs to those backup policies by using the EMC Backup and Recovery plug-in within the vSphere Web Client. [2]

Figure 2 EMC Backup and Recovery plug-in [3]

The EMC Backup and Recovery appliance supports backup to EMC Data Domain®. EMC Data Domain Boost and deduplication can further reduce backup windows and improve performance. The appliance is itself a virtual machine, so it runs inside the VMware environment, and it communicates with the vCenter host and the NetWorker server to back up virtual machines. To configure it, you deploy an .OVA in your environment and then use NetWorker Management Console to create VMware policies that perform scheduled or manual backups of virtual machines.

The appliance contains 8 internal virtual proxies, which allow the simultaneous backup of 8 virtual machines. You can add proxies by deploying an EBR Proxies virtual machine and registering it with the EMC Backup and Recovery appliance, which raises the number of sessions up to 25. Each EBR Proxies virtual machine provides 8 proxies. There is a limit of one EMC Backup and Recovery appliance and 3 EBR Proxies virtual machines.

The EMC Backup and Recovery appliance performs backups using hotadd as its primary transport mode. With hotadd, backup-related I/O happens internally through the ESX I/O stack using SCSI hot-add technology. Backup I/O rates improve, but hotadd places more CPU, memory, and I/O load on the ESX server that hosts the EMC Backup and Recovery appliance. Hotadd mode requires that the ESX server hosting the virtual proxy have access to all the data stores where the VMs reside. Data stores can be of different types: SAN, iSCSI, and NFS. If the ESX server where the VADP proxy resides is separate from the ESX server where the VMs are hosted, then:

• In the case of SAN LUNs, the ESX hosting the proxy and the ESX hosting the VMs should be part of the same fabric zones.

• In the case of iSCSI LUNs, the ESX hosting the proxy and the ESX hosting the VMs should be configured for the same iSCSI-based storage targets.

• In the case of NFS data stores, the ESX hosting the proxy and the ESX hosting the VMs should be configured for the same NFS mount points.
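The three conditions above reduce to one check: every data store that holds a protected VM must also be visible to the ESX server hosting the proxy. A minimal Python sketch of that check follows. The host, VM, and data store names are hypothetical, and in practice the inventories would come from vCenter rather than from literals.

```python
def missing_datastores(proxy_host_datastores, vm_datastores):
    """Return, per VM, the data stores that the proxy's ESX host cannot see."""
    visible = set(proxy_host_datastores)
    gaps = {}
    for vm, stores in vm_datastores.items():
        unseen = sorted(set(stores) - visible)
        if unseen:
            gaps[vm] = unseen
    return gaps


# Hypothetical inventory: data stores mounted on the ESX host that runs the proxy.
proxy_host = ["san_lun01", "iscsi_lun01", "nfs_share01"]

# Hypothetical inventory: data stores used by each protected VM.
vms = {
    "machine01": ["san_lun01"],
    "machine02": ["nfs_share01", "nfs_share02"],
}

gaps = missing_datastores(proxy_host, vms)
for vm, unseen in gaps.items():
    print(f"{vm}: hotadd not possible, proxy host cannot see {unseen}")
```

Any VM reported here would need the missing zoning, iSCSI target, or NFS mount added on the proxy's ESX server before hotadd can be used for it.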

If hotadd mode is not possible, the appliance falls back to NBD mode. In this mode, the CPU, memory, and I/O load is placed directly on the ESX server hosting the production VMs, since the backup data has to move through that same ESX server and reach the proxy over the network.

The EMC Backup and Recovery appliance supports cloning of backups. From NetWorker Management Console you can configure a clone session that copies your backup to another device. In our environment, we used two EMC Data Domain systems. A periodic clone session can be scheduled to copy each backup to secondary storage, so that the same backup is held on two backup storage systems in different locations.

The appliance allows a full image restore of a virtual machine, including to a different data store inside your VMware environment, and it allows the restore from a clone saveset. It also enables file-level restore for Linux and Windows virtual machines: you can restore a directory or file into the same virtual machine or into a different one.

Backup and DR Environment

The backup environment that I configured spans two sites. The primary site consists of a single vCenter with two clusters. Each cluster has four ESX servers; the production cluster hosts more than 200 virtual machines and the management cluster hosts 20.

The following figure shows the configuration of the primary site, which uses a virtual NetWorker server with the specifications listed after the figure:

Figure 3: Primary site Backup environment

OS                  Red Hat Enterprise Linux 6.5 (Santiago)
vCPU                4
Memory              24 GB
Disk for OS         20 GB
Disk for data       100 GB
NetWorker version   8.1.1 SP1

The NetWorker server has two NICs: one on the production network for agent-based backup of all server machines, and one on the management network for backup through the EMC Backup and Recovery appliance. Backups are performed with Data Domain Boost to a Data Domain DD4200. At the primary site we deployed one EMC Backup and Recovery appliance and one EBR Proxies machine, giving 16 backup sessions, and we back up 40 TB of data.

From NetWorker Management Console, we configured more than 10 VMware protection policies. Each policy has about 20 virtual machines assigned, because a policy is limited to 25 clients. For 6 of the VMware protection policies, we configured a clone session that copies data to a secondary Data Domain DD2500 located at the secondary site. The EMC Backup and Recovery appliance and the NetWorker server can see the savesets on both Data Domain systems, and users can restore from either the primary backup or the clone.
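The policy sizing described above can be sketched as a simple grouping step. This is illustrative Python only: the VM names are hypothetical, the count of 220 is an assumed round figure based on the cluster sizes given earlier, and the 25-client limit is the one stated in the text.

```python
MAX_CLIENTS_PER_POLICY = 25  # limit per VMware protection policy, as stated above


def plan_policies(vm_names, per_policy=20):
    """Split a list of VM names into groups of at most `per_policy` VMs."""
    if not 1 <= per_policy <= MAX_CLIENTS_PER_POLICY:
        raise ValueError("per_policy must be between 1 and the policy limit")
    return [vm_names[i:i + per_policy] for i in range(0, len(vm_names), per_policy)]


# Hypothetical inventory of 220 VMs (production plus management cluster).
vms = [f"machine{n:03d}" for n in range(1, 221)]

policies = plan_policies(vms)
print(f"{len(vms)} VMs -> {len(policies)} policies of up to 20 VMs each")
```

Keeping each group at about 20 VMs leaves headroom under the limit, so new virtual machines can be added to an existing policy without creating another one.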

In the secondary environment there is only the Data Domain DD2500. In case of a disaster at the primary site, we have a disaster recovery plan to restore all virtual machines with NetWorker and the EMC Backup and Recovery appliance into an identical second VMware environment. The EMC Backup and Recovery appliance version used in our environment is 1.0.1.9.

If the NetWorker server is lost, we can restore the service at the secondary site because the NetWorker server virtual machine is replicated with Site Recovery Manager. For the EMC Backup and Recovery appliance, we redeployed the .OVA at the secondary site and restored the appliance checkpoint that had been cloned there [4]. To make this possible, we scheduled a VMware protection policy in NetWorker Management Console that backs up and clones the checkpoint. If a disaster occurs, we restore all virtual machines using NetWorker and the EMC Backup and Recovery appliance from the Data Domain DD2500 located at the secondary site.

Site Recovery Manager works with two different vCenter servers. In our environment we have a primary vCenter (primaryvcenter01) and, at the DR site, a second vCenter (secondaryvcenter01); to allow correct replication, the two sites run in an active-active configuration. Because the two sites are separate VMware environments, this causes a problem for our appliance: the EMC Backup and Recovery appliance can work with only one vCenter, and if you want to restore virtual machines into a different VMware environment, the appliance must be registered to a vCenter with the same hostname as the primary vCenter. To bypass this problem, the EMC Backup and Recovery appliance at the secondary site must resolve the secondary vCenter under the hostname of the primary. Complete DNS resolution, forward and reverse, is necessary for the registration to succeed.

In the disaster scenario, we restarted the NetWorker services at the DR site, and to solve the vCenter name resolution I configured dnsmasq on the NetWorker server machine, so that the NetWorker server can operate as the DNS server for the EMC Backup and Recovery appliance at the DR site. In /etc/hosts on the NetWorker server we inserted an entry for the DR vCenter with its new IP address but the old hostname. In /etc/dnsmasq.conf we configured reverse resolution; with this simple trick the secondary vCenter resolves correctly in both directions and the EMC Backup and Recovery appliance can register [5].

When we redeployed the .OVA in the secondary environment, we used the NetWorker server as DNS server only for the EMC Backup and Recovery appliance. If you use this method, the appliance must be able to resolve the vCenter, all ESX servers, and the secondary Data Domain. It does not need to resolve the primary Data Domain, because in a disaster recovery situation the primary Data Domain is not reachable. In /etc/hosts on the NetWorker server we added these entries:

172.29.10.15 primaryvcenter01.mydomain.com primaryvcenter01
# 172.29.10.15 is the IP address of secondaryvcenter01
172.29.10.21 dresx01.mydomain.com dresx01
172.29.10.22 dresx02.mydomain.com dresx02
172.29.10.23 dresx03.mydomain.com dresx03
172.29.10.24 dresx04.mydomain.com dresx04
172.29.10.165 ddsecondary01.mydomain.com ddsecondary01
172.29.10.166 ddsecondary01.mydomain.com ddsecondary01
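Because registration depends on both forward and reverse resolution, it can help to list the names a DNS server would have to answer for these entries. The following Python sketch derives the forward (name to address) and reverse (PTR name to hostname) mappings from hosts-style text. It only processes text and does not query dnsmasq; the function names are hypothetical, and the sample is a subset of the entries above.

```python
import ipaddress

HOSTS_TEXT = """\
172.29.10.15 primaryvcenter01.mydomain.com primaryvcenter01
# 172.29.10.15 is the IP address of secondaryvcenter01
172.29.10.21 dresx01.mydomain.com dresx01
172.29.10.165 ddsecondary01.mydomain.com ddsecondary01
172.29.10.166 ddsecondary01.mydomain.com ddsecondary01
"""


def parse_hosts(text):
    """Return (ip, fqdn) pairs from /etc/hosts-style text, skipping comments."""
    pairs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) < 2:
            continue
        ip = str(ipaddress.ip_address(fields[0]))  # raises ValueError on a bad address
        pairs.append((ip, fields[1]))
    return pairs


def resolution_tables(pairs):
    """Build forward (fqdn -> [ip]) and reverse (PTR name -> fqdn) mappings."""
    forward, reverse = {}, {}
    for ip, fqdn in pairs:
        forward.setdefault(fqdn, []).append(ip)
        reverse[ipaddress.ip_address(ip).reverse_pointer] = fqdn
    return forward, reverse


forward, reverse = resolution_tables(parse_hosts(HOSTS_TEXT))
for ptr_name, fqdn in reverse.items():
    print(f"{ptr_name} -> {fqdn}")
```

Each printed PTR name is one the appliance must be able to resolve back to the listed hostname; the first entry shows the DR vCenter address answering with the primary vCenter's name.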

In the following table you can see the network configuration for the EMC Backup and Recovery appliance.

IPv4 static address: 172.29.0.170

Netmask: 255.255.252.0

Gateway: 172.29.0.1

Primary DNS: 172.29.0.163

Hostname: ebrappliance

Domain: mydomain.com

With dnsmasq, we are able to configure the EMC Backup and Recovery appliance in the secondary VMware environment. After the initial configuration, we can locate the checkpoint saveset and restore it with NetWorker:

mminfo -q "client=ebrappliance.mydomain.com,volume=ddsecondaryclone.001,name=cp.xxxxxxx" -xc/ -r "name,ssid,cloneid" -t "today"

recover -s bckserver01.mydomain.com -S "ssid/cloneid"

After the NetWorker restore, we can execute the checkpoint rollback with the script ebr-rollback-util.sh. The rollback allows the EMC Backup and Recovery appliance to see all savesets for every machine.

Restore a clone saveset

The EMC Backup and Recovery appliance allows a restore from a clone saveset. The appliance labels a primary saveset as Primary and a cloned saveset as Replica. To restore from a clone, you select a virtual machine and then select the latest Replica saveset. If you try to restore this saveset, the system displays the following error:

Figure 4: Error message during a restore from clone

I analyzed the network communication of the EMC Backup and Recovery appliance during a restore. With tcpdump, I saw that the appliance attempts to contact the primary Data Domain regardless of the saveset selected. We performed two tests to analyze this bug.

Test EMC Backup and Recovery appliance restore

We performed two tests to analyze this potential bug. In the first test, we tried to restore a clone saveset in the production environment, where the EMC Backup and Recovery appliance can communicate with both the primary (ddprimary01) and the secondary Data Domain (ddsecondary01). In the second test, we shut down the primary Data Domain and tried to restore a clone saveset from the secondary Data Domain.

Figure 5: List of primary and secondary saveset for machine01

Test 1

We executed a restore from the clone located on the secondary Data Domain (ddsecondary01), with the device of the primary Data Domain (ddprimary01) enabled. The IP addresses of the primary Data Domain are 172.29.0.165 and 172.29.0.166. In this test, the EMC Backup and Recovery appliance contacted the primary Data Domain and the restore completed, but the appliance read the saveset from the primary Data Domain and not from the secondary Data Domain.

13:39:30.405814 IP 172.29.0.166.2049 > 172.29.0.170.0: reply ok 0
13:39:30.405822 IP 172.29.0.170.50905 > 172.29.0.166.2049: .ack 171794249 win 23
13:39:30.406124 IP 172.29.0.170.1394923407 > 172.29.0.166.2049: 92 proc-38
13:39:30.406297 IP 172.29.0.166.2049 > 172.29.0.170.50905: .ack 92 win 23
13:39:30.406459 IP 172.29.0.166.2049 > 172.29.0.170.1394923407: reply ok 580 proc-38
13:39:30.406465 IP 172.29.0.170.50905 > 172.29.0.166.2049: .ack 581 win 28
13:39:30.406976 IP 172.29.0.170.1378146191 > 172.29.0.166.2049: 704 proc-51

The tcpdump output above is from this test: the appliance (172.29.0.170) contacts the primary Data Domain (172.29.0.166) and performs the restore from it. The active connections on the primary Data Domain confirm this:

Figure 6: Restore job from primary Data Domain
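A capture like the one from Test 1 can be summarized by counting which peers the appliance exchanges packets with. The sketch below is illustrative Python and not part of the original test procedure. It assumes the capture was saved as text in the tcpdump format shown for Test 1, and the function name is hypothetical.

```python
import re
from collections import Counter

CAPTURE = """\
13:39:30.405814 IP 172.29.0.166.2049 > 172.29.0.170.0: reply ok 0
13:39:30.405822 IP 172.29.0.170.50905 > 172.29.0.166.2049: .ack 171794249 win 23
13:39:30.406124 IP 172.29.0.170.1394923407 > 172.29.0.166.2049: 92 proc-38
13:39:30.406297 IP 172.29.0.166.2049 > 172.29.0.170.50905: .ack 92 win 23
13:39:30.406459 IP 172.29.0.166.2049 > 172.29.0.170.1394923407: reply ok 580 proc-38
13:39:30.406465 IP 172.29.0.170.50905 > 172.29.0.166.2049: .ack 581 win 28
13:39:30.406976 IP 172.29.0.170.1378146191 > 172.29.0.166.2049: 704 proc-51
"""

# timestamp, "IP", source address.port, ">", destination address.port, ":"
LINE_RE = re.compile(
    r"^\S+ IP (\d+\.\d+\.\d+\.\d+)\.\d+ > (\d+\.\d+\.\d+\.\d+)\.\d+:"
)


def peers_of(appliance_ip, capture_text):
    """Count packets exchanged between the appliance and each peer address."""
    counts = Counter()
    for line in capture_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        src, dst = match.groups()
        if src == appliance_ip:
            counts[dst] += 1
        elif dst == appliance_ip:
            counts[src] += 1
    return counts


PRIMARY_DD = {"172.29.0.165", "172.29.0.166"}  # primary Data Domain addresses from Test 1

peers = peers_of("172.29.0.170", CAPTURE)
for peer, packets in peers.items():
    role = "primary Data Domain" if peer in PRIMARY_DD else "other host"
    print(f"{peer}: {packets} packets ({role})")
```

In this excerpt every packet involves a primary Data Domain address, which matches the observation that the restore was served from the primary system even though a clone was selected.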

Test 2

In the second test, we selected the clone saveset located on the secondary Data Domain system and shut down the primary Data Domain. When the restore starts, the EMC Backup and Recovery appliance tries to contact the primary Data Domain system, but the communication fails because the system is down. This causes the restore to fail with the following message:

An unexpected connection error occurred and the cause could not be determined. Please check your EBR configuration screen to troubleshoot, or contact an administrator.

The test shows that the EMC Backup and Recovery appliance cannot perform a restore from the secondary Data Domain while the primary is down.

This behavior is wrong: the appliance should not need to communicate with the primary Data Domain when the restore is from a clone saveset located on the secondary Data Domain.

Workaround for this bug

Why does the EMC Backup and Recovery appliance try to contact the primary Data Domain during a clone restore?

NetWorker identifies every saveset created by a backup or clone job with two IDs: the SSID and the CloneID. The primary saveset and every clone of it share the same SSID; what distinguishes them is the CloneID. In the following example, the primary and the clone saveset have the same SSID (2302247051) and different CloneIDs (1413054609 for the primary and 1413055828 for the clone) [6].

mminfo -q "client=rbback05.dc.repower.com,ssid=2302247051" -xc/ -r ssid,cloneid,volume

The output is similar to the following:

ssid/clone-id/volume

2302247051/1413055828/secondarydd01clone.001

2302247051/1413054609/primarydd01Mgmt.001
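To make the SSID and CloneID relationship concrete, the output above can be grouped by SSID so that each saveset lists all of its copies. This is a small Python sketch that only parses text. It assumes the "/" separator selected by -xc/ and a header line as shown, and the function name is hypothetical.

```python
from collections import defaultdict

MMINFO_OUTPUT = """\
ssid/clone-id/volume
2302247051/1413055828/secondarydd01clone.001
2302247051/1413054609/primarydd01Mgmt.001
"""


def parse_mminfo(text, sep="/"):
    """Group (cloneid, volume) pairs by SSID, skipping the header and blank lines."""
    copies = defaultdict(list)
    for line in text.splitlines():
        fields = line.strip().split(sep)
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        ssid, cloneid, volume = fields
        copies[ssid].append((cloneid, volume))
    return dict(copies)


copies = parse_mminfo(MMINFO_OUTPUT)
for ssid, instances in copies.items():
    print(f"SSID {ssid} has {len(instances)} copies:")
    for cloneid, volume in instances:
        print(f"  {ssid}/{cloneid} on {volume}")
```

One SSID with two entries is exactly the situation in which a tool that looks only at the SSID cannot tell the primary from the clone.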

I found that the EMC Backup and Recovery appliance works only with the SSID and does not consider the CloneID. When the primary Data Domain is down, this bug causes every restore from clone to fail, because the appliance tries to restore the primary saveset (2302247051/1413054609) and not the clone. All versions of the EMC Backup and Recovery appliance have this bug. To bypass the problem, I found a simple and fast workaround. Suppose that we want to restore the clone saveset of a virtual machine, in this example 2302247051/1413055828. With mminfo, we can see all copies of this saveset (primary and clone) [6]:

2302247051/1413055828/secondarydd01clone.001

2302247051/1413054609/primarydd01Mgmt.001

If we delete every primary saveset that shares its SSID with the clone saveset, the clone becomes the only copy the appliance can see for that SSID. On the NetWorker server, we can delete the primary saveset with the nsrmm command. After you delete all primary savesets for a machine and run a cross-check with nsrim, only the clone saveset remains:

nsrmm -v -d -y -S ssid/cloneid

nsrim -X
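For many machines, the deletion step can be prepared programmatically. The Python sketch below is not the author's script; it only builds the nsrmm command lines for review and does not execute anything. It assumes that the set of clone volume names is known in advance (here `clone_volumes`, a hypothetical parameter) and it skips any SSID that has no copy on a clone volume, so the only remaining copy is never selected for deletion.

```python
import shlex


def primary_delete_commands(copies, clone_volumes):
    """Build nsrmm delete commands for primary copies that also have a clone.

    copies: {ssid: [(cloneid, volume), ...]}, for example parsed from mminfo output.
    clone_volumes: volume names that hold clone copies (an assumption of this sketch).
    """
    commands = []
    for ssid, instances in copies.items():
        has_clone = any(volume in clone_volumes for _, volume in instances)
        if not has_clone:
            continue  # no clone exists for this SSID, so keep the primary copy
        for cloneid, volume in instances:
            if volume not in clone_volumes:
                commands.append(["nsrmm", "-v", "-d", "-y", "-S", f"{ssid}/{cloneid}"])
    return commands


copies = {
    "2302247051": [
        ("1413055828", "secondarydd01clone.001"),
        ("1413054609", "primarydd01Mgmt.001"),
    ]
}

commands = primary_delete_commands(copies, {"secondarydd01clone.001"})
for command in commands:
    print(shlex.join(command))  # dry run: print only
print("nsrim -X")  # cross-check to run once after the deletions
```

Printing the commands first lets an operator confirm that only primary copies are listed before anything is removed from the media database.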

After these commands, the EMC Backup and Recovery appliance sees only one saveset for the SSID. If we select it and start the restore, the appliance finds that the SSID is located on the secondary Data Domain and performs the restore from there without a problem.

Figure 7: Restore job in secondary VMware environment

This picture shows the restore job in the vSphere client. At present, this workaround is the only way to perform a correct restore from clone with the EMC Backup and Recovery appliance. In the disaster recovery environment, we restored 200 machines with it. I have seen that the bug is still present in the latest NetWorker 8.2 release.

Conclusion

This article described a bug present in all versions of the EMC Backup and Recovery appliance, which we encountered after installing the appliance in an enterprise environment, together with a workaround. In our environment we have a large number of machines, and the workaround is essential to perform a clone restore. With many machines, however, deleting savesets takes time: each deleted primary copy costs about 10 seconds. We reduced this delay with a script that deletes the last primary copy, so we can delete the primary savesets of all machines for a specific date in less time. The workaround is a simple way to bypass this important bug.

Bibliography

[1] EMC2 – “Backup VMware con EMC NetWorker”

[2] EMC2 – “VMware protection using VBA with NetWorker 8.1”

[3] VMware: https://communities.vmware.com

[4] EMC2 – “EMC NetWorker and VMware Release 8.1 SP1 – Integration Guide”

[5] Red Hat – “Red Hat Enterprise Linux 6 – 6.4 Technical Notes”

[6] EMC2 – “EMC NetWorker Release 8.1 Service Pack 1 – Administration Guide”

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
