BACKUP OPTIMIZATION ‘NETWORKER INSIDE’

Shareef Bassiouny, EMC
Mohamed Sohail, EMC
Giovanni Gobbo, Senior IT Consultant

Table of Contents

Executive summary

Introduction

Part 1

How much Data Storage could be gained? How could it be maximized?

What is the penalty of this gain?

Classic design example

Advantages/disadvantages of the new DD Boost over Fibre Channel (DFC)

Part II

Journey to an optimized backup environment

The Journey

Steps to the solution

NetWorker

Data Domain

Avamar

"Virtualized Environments"

Appendix

Biography

Disclaimer: The views, processes, or methodologies published in this article are those of the authors. They do not necessarily reflect EMC Corporation’s views, processes, or methodologies.

2014 EMC Proven Professional Knowledge Sharing 2

Executive summary
Do you need to speed up your backups by up to 50%? Do you need to reduce your bandwidth usage by up to 99%? Do you want to reduce the backup server workload by up to 40%? Do you want to increase your backup success rate?

The answer? Data Domain® Boost (DD Boost), which enables you to finish backups within backup windows and provides breathing room for data growth. With performance of up to 31 TB/hr, it is 3 times faster than any other solution, enabling you to use your existing network infrastructure more efficiently.

In this Knowledge Sharing article we illustrate how we optimized our backup processes and leveraged current resources by integrating NetWorker® backup management software with the new DD Boost over Fibre Channel feature to enhance backup system performance.

The major component of EMC backup and recovery software solutions, NetWorker is a cornerstone of the backup solutions of large infrastructure customers. This article targets backup administrators, support engineers, and stakeholders interested in the DD Boost over Fibre Channel feature and how to use it to improve backup success rates. The goal of this article is to help you:

 speed up backups
 avoid congestion that slows down large critical backups by reducing bandwidth utilization
 minimize workloads on backup hosts (NetWorker server and storage nodes)


Introduction
In Part 1, we follow a dialogue we had with a customer while promoting Data Domain for his backup environment, which led us to promote NetWorker as one of the products best integrated with Data Domain appliances. Part 1 is a series of questions and answers exploring the why and the how; we concentrate on the basic concepts and leave the details to the referenced documents, primarily the "NetWorker and Data Domain Devices Integration Guide", version 8.1.

Part 2 is the final output of the customer conversation from Part 1, coupled with data from the customer's requirements documents. From these we produced a solution proposal that relied on the concepts built in Part 1, along with details on how those products fit into the customer environment.


Part 1
While deduplicated storage as a backup media target is not a new concept in Backup and Recovery Solutions architecture (BRSa), the technology used for this deduplicated storage is one of the major factors affecting backup performance and success rates.

A well-known example is EMC DL3D, which integrated multiple storage technologies to achieve Backup to Disk (B2D) performance through a Virtual Tape Library interface, coupled with backend storage deduplication. However, since the deduplication process ran offline, appliance performance was known to deteriorate beyond 70-80% disk utilization.

Data Domain emerged as cutting-edge technology for deduplicated storage solutions targeting backup solutions as backup-to-disk storage. Its "in-line" deduplication technology (data is deduplicated before being written to disk, as soon as it reaches the storage host) and high performance made it one of the best-selling products in the EMC Data Protection and Availability Delivery portfolio. Perhaps the main reasons for its market appeal are the sustained performance it delivers (minimal performance degradation even beyond 95% utilization) and the diverse storage connectivity options it provides. Further integration with backup solutions led to DD Boost, one of the most interesting features provided with Data Domain appliances.

DD Boost comprises Distributed Segment Processing (DSP) coupled with the DD API. DSP is a mechanism that enables client-side deduplication to be integrated into virtually any application that wants to dump data to secondary backup storage media. The DD API is the Data Domain programming interface that enables applications and hosts to communicate with the Data Domain Operating System (DD OS), leveraging this integration interface to provide features and facilities that "boost" performance, minimize the backup window and bandwidth utilization, and enhance backup success rates.

Basic concepts mentioned in the following discussion include:

Brief Blueprint on Deduplication Technologies
Deduplication and compression have the same aim: to remove redundancies from data patterns. While the scope of compression is a file or an archive of files, the scope of deduplication is a file system used to store backup data, also called a Storage Unit (SU) in Data Domain jargon. Here, we are not talking about file-level deduplication (which hashes the contents of every file on the file system, detects duplicate content, and replaces the duplicate copies with stub pointers to the original content).


Figure 1: File-based deduplication

We are talking about sub-file deduplication technology, which segments every file into chunks using a segmentation algorithm (variable-length segmentation has been found to be the most efficient). It is those chunks that are identified by their hash fingerprints; if a duplicate chunk is found, it is replaced by a pointer to the original chunk (the first one found to be unique). This is the technology used for Data Domain deduplication, with an added layer of compression applied after new, unique chunks are identified.
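To make the mechanism concrete, here is a toy sketch of content-defined (variable-length) chunking in Python. The mask and chunk-size limits are arbitrary illustration values, not Data Domain's actual algorithm or parameters; real implementations use a true sliding-window rolling hash (e.g. Rabin fingerprinting).

```python
import hashlib

MASK = (1 << 6) - 1          # a boundary every ~64 bytes on average
MIN_CHUNK, MAX_CHUNK = 32, 256

def chunks(data: bytes):
    """Yield content-defined chunks: a boundary is declared where a running
    hash of the bytes since the last boundary matches MASK."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = (h * 31 + byte) % (1 << 32)      # toy polynomial hash
        size = i - start + 1
        boundary = (h & MASK) == MASK and size >= MIN_CHUNK
        if boundary or size >= MAX_CHUNK or i == len(data) - 1:
            yield data[start:i + 1]
            start, h = i + 1, 0

def fingerprints(data: bytes):
    """Hash each chunk; duplicate fingerprints mean store-once chunks."""
    return [hashlib.sha1(c).hexdigest() for c in chunks(data)]
```

Because boundaries depend on content rather than fixed offsets, data that recurs in a stream reproduces earlier chunk fingerprints, which is what lets the store keep a single copy of each unique chunk.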


Figure 2: Sub-file, variable length chunks deduplication

How much Data Storage could be gained? How could it be maximized?
While deduplication efficiency varies according to different factors, a 20x disk space reduction is typical for plain, uncompressed file system data. The main factors that affect deduplication efficiency include:
 Data type or nature: some types of data are highly compressible (text files, spreadsheets, etc.) while others are already compressed in nature (audio/video files, graphics), so recompressing them will not produce a significant benefit. Because deduplication relies on file segmentation and chunk identification, any transformation applied to incoming files (such as compression and/or encryption) will produce new chunk patterns even after minor changes to those files, and thus reduce the gain from deduplication.
 Change rate: storage savings increase with each subsequent backup of the save set because a deduplicating backup writes to disk only those data blocks unique to its catalogue; data with a high change rate will therefore produce a lower gain than data with a lower change rate.


 Data retention: the amount of time data is kept available for recovery affects the size of the data catalogue (imagine a database of hashes representing every stored chunk). If you retain data for a longer period, your catalogue is larger and your deduplication efficiency increases, as there is a higher probability of finding matching chunks. For more information, see page 27 of the "NetWorker and Data Domain Devices Integration Guide".
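As a back-of-the-envelope illustration of how change rate and retention interact (our own simplified model, not an EMC sizing formula), consider weekly full backups where only the changed fraction of the data is new each week:

```python
def dedup_ratio(full_size_tb: float, weekly_change_rate: float,
                weeks_retained: int, local_compression: float = 2.0) -> float:
    """Approximate logical/physical ratio for a weekly full backup cycle.

    The first full stores everything; each later full adds only the changed
    fraction; local compression then shrinks the unique chunks on disk."""
    logical = full_size_tb * weeks_retained
    physical_unique = full_size_tb * (1 + weekly_change_rate * (weeks_retained - 1))
    return logical / (physical_unique / local_compression)
```

With a 10 TB data set, a 5% weekly change rate, and 12 weeks of retention this yields roughly 15x, and the ratio keeps growing as retention lengthens, matching the retention effect described above.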

What is the penalty of this gain?
While not really a penalty, as with any compression algorithm, uncompressing (rehydrating) the data consumes time and effort. However, the DD appliance is engineered to make data rehydration as painless as possible. A DD Boost device supports up to 60 concurrent sessions (multiple recoveries will not impair each other, nor will any application supporting parallel recovery), and each session can easily reach 50 MB/s on a good network infrastructure supporting Gigabit Ethernet. Backup performance becomes very high after the first full backup (how high depends on the change rate), but recovery performance will be comparable to that of the first full backup, because the data must be rebuilt in its plain format before being sent to the recovering host.

Classic design example
When configuring an environment for backup to disk, there are many alternatives for the type of target media (local disk, SAN-connected, or even NAS-attached).

One could settle for the simplest approach and export a NAS (Network Attached Storage) file system (CIFS or NFS, per your client platform preference) as a backup-to-disk target that can be mounted on a backup server, any of its storage nodes (SNs), or even a Dedicated Storage Node (DSN). A DSN is a client application host used as an SN only for its own data: the data goes from the application host directly to the backup storage instead of passing through a generic storage node. This optimization avoids having the backup data flow (client to SN, then SN to NAS appliance) traverse the LAN twice, and is needed when network access is a bottleneck.


A Data Domain host can be configured to export a NAS file system. Once your target disk is ready, configure your disk device; in NetWorker, the Advanced File Type Device (AFTD) is the type most commonly used.

Even without backup software, this NAS file system can still be used as a target for an Oracle RMAN backup script or an MSSQL backup script, and the Data Domain host will deduplicate the resulting backup files. DD Boost demonstrates its added value once you discover that backup performance is limited by the network bandwidth.

To tackle network congestion at the target Data Domain host, the appliance can be configured for NIC aggregation. Link aggregation on the Data Domain side will certainly help; still, even with link aggregation deployed without any problems, there are physical limitations that no LAN can bypass.

Different aggregation protocols and hashing methods exist among the Data Domain configuration options. It is important to mention that link aggregation is a point-to-point mechanism, not end-to-end: it aggregates the switch ports facing the Data Domain NICs into a single virtual interface, but the clients are not aware of this mechanism. Details are available in the Data Domain OS administration guide.

If you do not favor backup to disk over the LAN for any reason, e.g. it saturates your LAN links or strains your LAN infrastructure, you can use your hosts' SAN connectivity to connect to the DD virtual tape library (VTL). This allows data to travel on the SAN over Fibre Channel (FC) connectivity without any data-transport overhead on the LAN, which in this case carries only metadata traffic to the backup server.

What if the above options are not enough? What if we have a tighter backup window and need more optimization? DD Boost is the answer. As the volume of data targeting the Data Domain host, sent over the wire as plain data, scales up, the load on your LAN and the pressure on your backup system rise, especially as more backup-to-disk clients and storage nodes are added to your data center.

In such situations, client-side deduplication or, in Data Domain parlance, Distributed Segment Processing is your solution, as it enables identification of file chunks to take place on the deduplication client side (the host that sends data to the Data Domain appliance). Thus, there is

no need to send all plain data on the network; only new chunks need to be sent to the Data Domain host.

In other words, DD Boost, with its DSP feature working through the Data Domain API, ensures that the host sending data to the Data Domain appliance does not send redundant data over the network. A DD Boost-enabled application computes the hashes of the chunks it wants to send to the Data Domain host for storage, then asks the Data Domain host: do you already have these fingerprints (one identifier per file chunk) in your catalogue? If so, the Data Domain host does not need to receive the redundant data; it simply creates a pointer. If not (the chunk is new), the chunk is compressed and then sent to the Data Domain host for storage.
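The exchange just described can be sketched as follows. This is our own illustrative model of the fingerprint negotiation, not the DD Boost API; all class and method names are invented for the example.

```python
import hashlib
import zlib

def fp(chunk: bytes) -> str:
    """Fingerprint of a chunk (SHA-1 here, purely for illustration)."""
    return hashlib.sha1(chunk).hexdigest()

class DataDomainHost:
    """Toy stand-in for the appliance side of the conversation."""
    def __init__(self):
        self.store = {}                      # fingerprint -> compressed chunk

    def filter_unknown(self, fingerprints):
        """Return only the fingerprints this host has never seen."""
        return [f for f in fingerprints if f not in self.store]

    def write(self, fingerprint, compressed_chunk):
        self.store[fingerprint] = compressed_chunk

class BoostClient:
    """Toy stand-in for a DSP-enabled backup client."""
    def __init__(self, host):
        self.host = host
        self.bytes_sent = 0

    def backup(self, chunks):
        fps = [fp(c) for c in chunks]
        needed = set(self.host.filter_unknown(fps))
        for f, c in zip(fps, chunks):
            if f in needed:
                payload = zlib.compress(c)   # new chunk: compress, then send
                self.bytes_sent += len(payload)
                self.host.write(f, payload)
                needed.discard(f)            # send each unique chunk once
        return fps                           # recipe to rebuild the save set
```

On a second backup of unchanged data, `filter_unknown` returns nothing, so no chunk data crosses the wire at all; only fingerprints and pointers are exchanged.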

Figure 3

In this way, data redundancy checking becomes a mutual effort between the deduplication client (the host sending data using DD Boost functionality) and the Data Domain appliance, which optimizes network usage in exchange for a minimal CPU and memory penalty on the client side.

Projecting the above concept onto NetWorker operations, we can see that all that is needed is to convert NetWorker backup-to-disk devices into DD Boost-enabled devices. Thus, we do not have to worry about which NAS protocol to use for network file access (DD Boost handles that part through its native NFS). Even device directory creation is managed through DD Boost, as NetWorker talks directly to the Data Domain OS through the DD Boost API. Details on DD Boost device creation are found in the "NetWorker and Data Domain Devices Integration Guide"; migration from old tape devices and backup-to-disk devices to DD Boost-optimized devices is discussed in Chapter 3 of the same document.


The bandwidth and time gains are quite astonishing as NetWorker adds more uses of the DD Boost API. The "Client Direct" configuration option (added in NetWorker 8.0) enables backup clients to send their data directly to the Data Domain host instead of passing through their configured storage node. This optimizes network usage and accelerates backup execution, since clients no longer send plain data to their SNs over the wire. Though the SN is still used for metadata processing, it is relieved of the data storage effort, which increases the likelihood of backup success.

Figure 4

This is not the only gain from buying DD Boost, but it is how we chose to introduce an example of its utility. Two great gains arise from the fact that the backup application can talk to the Data Domain OS and see the deduplication catalogue:

1. Clone Controlled Replication
2. Virtual Synthetic Full

How does DD Boost enhance cloning? Does that include cloning to all types of media?
Cloning is copying a saveset from one storage medium to another. A common example is cloning savesets from disk devices to tape devices for long-term retention. Thus, the cloning operation

includes reading the saveset (similar to recovering it) then writing it back to another medium (similar to a backup).

The scope of DD Boost cloning does not extend to storage media other than Data Domain: if you are cloning savesets between storage media that include anything other than Data Domain hosts, you will be running conventional cloning (recover the saveset from media A, then write it to media B). However, if you are cloning a saveset between two Data Domain hosts, this is your chance to leverage DD Boost Clone Controlled Replication (CCR).

Figure 5

How does it work? When both source and target storage pools are Data Domain devices, DD Boost saves backup system resources (CPU, memory, and network bandwidth) through the Managed File Replication (MFR) feature. How this happens is an interesting story. The cloning operation would normally read the saveset it wants to clone from DD Boost device A on Data Domain host A (like a recovery), then write it to DD Boost device B on Data Domain host B, since the saveset is stored as one or more files. Why not instead tell Data Domain host A to replicate those files to Data Domain host B? This saves the effort of reading (rehydrating) the whole file and writing it back (dehydrating) on a different host. The bandwidth that would be used to read the plain-data saveset is also preserved, because Data Domain

replication copies only the chunks missing (to reconstruct the file) from source to destination. This makes CCR a great candidate for cloning to the Disaster Recovery (DR) site, satisfying legislative requirements to archive backups offsite for financial auditing, corporate internal auditing, DR planning, and contingencies, without the need to clone to tape.
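The chunk-level economics of CCR can be sketched in a few lines. This is a conceptual model of Managed File Replication, not the actual wire protocol; the function name and store layout are our own.

```python
def ccr_clone(saveset_recipe, source_store, dest_store):
    """Replicate a saveset, expressed as a list of chunk fingerprints,
    from one chunk store to another. Only chunks the destination lacks
    are transferred, and nothing is rehydrated on either side.
    Returns the number of chunks actually sent."""
    missing = [f for f in dict.fromkeys(saveset_recipe)   # dedupe, keep order
               if f not in dest_store]
    for f in missing:
        dest_store[f] = source_store[f]                   # only missing chunks move
    return len(missing)
```

A second clone of the same saveset, or of any saveset sharing its chunks, transfers nothing, which is why replicating to the DR site over a modest WAN link becomes practical.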

Figure 6

What advantage does CCR have over conventional Data Domain replication?
Quite a few. Data Domain conventional replication has three limitations:

1. The backup server is not aware that cloning took place, so manual intervention is needed to create and mount the required device if recovery from the DR site is needed.
2. You should not use conventional replication with DD Boost devices (the replicated devices cannot be used as a source for further replication). For more information, see the Data Domain native replication considerations in the "NetWorker and Data Domain Devices Integration Guide".
3. There is no way to force a different retention on cloned savesets, as the backup server is not aware of the replication. Consequently, this cannot be used for long-term archiving.


Any other cloning enhancements?
It is important to mention that NetWorker 8.1 added a new cloning enhancement called "immediate cloning". This enables a saveset to be cloned as soon as its backup is done, as opposed to group cloning, which runs a clone process for all savesets backed up during the group run, and scheduled cloning, which runs the clone process on a schedule separate from the backup run.

For more information on how to configure and run CCR clones, refer to the "NetWorker and Data Domain Devices Integration Guide".

What is the Virtual Synthetic Full feature added in NetWorker version 8.1? How does it leverage DD Boost to further optimize backup operations?
First, let's define what a Synthetic Full (SF) backup is. Suppose you need a full backup before rolling out a critical system patch or cumulative update, but your backup window does not have enough time for one. The choice is either to cut into production time (typically not an option) or to postpone the critical update. This is when SF comes to the rescue.

SF runs an incremental backup, then uses that incremental and the earlier incrementals back to the last full backup to construct a new full backup without actually running one; hence the name, Synthetic Full. Introduced in NetWorker version 8, SF is not supported for NDMP backups. For a list of SF requirements, consult the KB article at https://support.emc.com/kb/169411. More details can be found in the NetWorker Administration Guide.
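The merge logic behind a synthetic full can be illustrated with a toy model (ours, not NetWorker's implementation), in which each backup is a mapping from file path to the file version it captured:

```python
def synthesize_full(last_full: dict, *incrementals: dict) -> dict:
    """Build a new full backup from the last full plus the incrementals
    taken since it: for each path, the newest captured version wins."""
    synthetic = dict(last_full)
    for increment in incrementals:       # apply in chronological order
        synthetic.update(increment)
    return synthetic
```

The real feature must also handle deletions, attributes, and save-set metadata; this sketch only shows why no new full read of the client is needed.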

Figure 7


Virtual Synthetic Full (VSF) backup is a new feature introduced in NetWorker 8.1 (it requires DD Boost 2.6 and DD OS 5.3 or higher). It is the same as a synthetic full backup, except that it is performed on a single Data Domain system (all full and incremental backups must reside on the same Data Domain host). Like Synthetic Full, VSF uses full and partial backups to create a new full backup. However, since the backups reside on a Data Domain system and use the new DD Boost APIs, the operation does not require saveset data to be sent over the wire (there is no need to read the savesets), resulting in improved performance over synthetic full and traditional backups.

What actually happens is that since NetWorker is constructing a synthetic full from savesets stored on the Data Domain host, and since DD Boost allows NetWorker to see the file-chunk catalogue, NetWorker does not have to read the savesets off the Data Domain host. Instead, it can use that catalogue to construct the new full (or VSF, in this case) without reading all the savesets off the Data Domain host and writing a new saveset. For more details on VSF backup execution, refer to the NetWorker Administration Guide, page 88.
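The difference from a plain synthetic full can be made visible with a small sketch (illustrative names, not the DD Boost API): the new full is assembled purely from catalogue references, so chunk data is never read.

```python
class ChunkCatalogue:
    """Stand-in for the Data Domain chunk catalogue, counting data reads."""
    def __init__(self, chunks):
        self.chunks = chunks             # fingerprint -> chunk bytes
        self.data_reads = 0

    def read(self, fingerprint):
        self.data_reads += 1             # a conventional SF would hit this
        return self.chunks[fingerprint]

def virtual_synthetic_full(catalogue, full_recipe, *incremental_recipes):
    """Merge per-file chunk recipes by reference; the newest file version
    wins. No call to catalogue.read() is ever made."""
    merged = dict(full_recipe)
    for recipe in incremental_recipes:
        merged.update(recipe)            # copies fingerprints, not data
    return merged
```

The resulting saveset is just a new arrangement of fingerprints already on the appliance, which is why VSF avoids both the read (rehydration) and the rewrite of a conventional synthetic full.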

Do DD Boost and Client Direct work for module backups as well as for file system backups?
Yes. Consult your module documentation to confirm that your version has the proper support. Of course, your client must have direct network access to the Data Domain host.

How many DD Boost devices can I configure on a SN?
You can configure as many as you need. Keep in mind that a single device can accommodate 60 sessions, so there should be no problem sharing the same device across multiple SNs as long as the backups directed to the device target the same pool.
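As a quick rule of thumb derived from the 60-sessions-per-device figure above (our own arithmetic, not an official sizing formula), the minimum device count for a target parallelism is a ceiling division:

```python
import math

def devices_needed(target_sessions: int, sessions_per_device: int = 60) -> int:
    """Minimum DD Boost devices to accommodate the requested parallelism."""
    return math.ceil(target_sessions / sessions_per_device)
```

For example, a storage node expected to run 200 concurrent sessions would need at least 4 devices at 60 sessions each.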

Advantages/disadvantages of the new DD Boost over Fibre Channel (DFC)
The prerequisites are DD OS 5.3 or above, coupled with NetWorker 8.1 or above (the version that includes DD Boost 2.6). DFC-enabled clients and SNs must be zoned to the Data Domain host HBAs whose target LUNs represent the DD Boost devices. For the complete deployment procedure, refer to the DD OS Administration Guide and the NetWorker 8.1 "NetWorker and Data Domain Devices Integration Guide".

DFC is another way to work around an overloaded LAN during the backup window. It enables DD Boost-enabled clients and SNs to access DD Boost devices through the SAN, minimizing bandwidth pressure on the LAN path to the Data Domain host during the busy backup

window. SAN-connected clients and SNs send their data to the DD Boost devices through the SAN, partially relieving the LAN for other tasks. The Client Direct configuration still applies with DFC devices; clients are configurable for the required connectivity type (either IP or FC).

The only disadvantage is a minor performance penalty: when sending data over the SAN, performance has been found to be up to 20% lower than LAN DD Boost connectivity in the worst cases. It is worth mentioning that the client HBA settings (specifically the HBA queue depth) have an impact on performance, as mentioned in this Data Domain article: https://my.datadomain.com/download/kb/all/boostfc_client_qdepth.htm

DFC is currently available for Windows hosts only, but further platform support is on the way.

Part II
This is the solution proposal that we developed in answer to the backup requirements of an anonymized customer. Some figures and concepts below repeat material from Part 1, but we decided to present the document in its full length to preserve its integrity.

Journey to an optimized backup environment
EMC takes real pride in helping design and implement enterprise-class backup and recovery solutions for its customers, based on powerful, sustainable, world-class products. EMC invests in key infrastructure-related initiatives that deliver on a strategic long-term vision. With this vision in mind, we recently implemented a consolidation project for a large customer to leverage their current infrastructure and consolidate their data center with EMC solutions.

The design put forth aimed to solve these business challenges:

 Cost Competitiveness
 Highest Levels of Reliability
 Ease of Management
 High Performance
 Compatibility


Cost Competitiveness
Being cost competitive is paramount in order to build and maintain business. Whether facing economic turmoil or boom times, we must ensure that the solutions we offer fit our customers' budgets.

Highest Levels of Reliability
Offering cost-effective solutions is meaningless if the solution is plagued by outages. Backup infrastructures must be capable of performing even after suffering multiple component failures. Customer loyalty will be lost if we fail to meet our availability obligations. We strive to offer products and services that remain operational 24 x Forever.

An added benefit of highly reliable systems is the cost savings realized the longer the systems remain operational. Systems capable of remaining in production 7 or more years can yield significant long-term savings and/or profit over those capable of running production workloads for 3 to 5 years.

Ease of Management
Hand-in-hand with being cost-effective and reliable, systems need to be as automated and easy to use as possible: doing more with less. The more complex the solution, the more resources it takes to maintain and operate over its lifecycle, driving overall cost up while driving reliability down. Ensuring staffing levels remain stable in the face of unabated growth is essential to cost containment and is the main reason ease of management remains a key requirement.

High Performance
The overall solution must be capable of delivering during periods of high usage and must be designed to eliminate congestion points. Delivering solutions that suffer from poor performance frustrates customers and wastes precious time and resources in tracking down and resolving performance-related issues.

Compatibility
Gone are the days of implementing independent computing silos. It's expensive and difficult to maintain solutions designed in isolation. To meet aggressive growth objectives, we need to ensure all of the systems being deployed are compatible and work with one another. Everything needs to work together and scale in order to keep the overall solution as simple and manageable as possible.


The Journey
The solution was designed to meet and exceed our customer's expectations. We emphasized leveraging their current infrastructure and enhanced the potential for future upgrades by relying on scalable solutions and powerful products that can support all aspects of the business.

Where we were
The customer has a complex environment; the data center uses many topologies for performing backups.

Current infrastructure
An analysis of the infrastructure uncovered points of possible improvement and the importance of having a single backup tool to improve management and reduce administration time.

Existing infrastructure
 4 backup servers (data zones) running 3 different backup products:
o HP Data Protector: main backup infrastructure
o Dell NetVault: NDMP backup, DMZ and Fernord
o Symantec Backup Exec: Trenord site
o Symantec Backup Exec: Iseo infrastructure site
 Backup repository: Fujitsu CS800 S2 (backup-to-disk storage)

The analysis is based on data collected by EMC staff for customer ABC.


Legend of the table (which is written in Italian):

 Ambiente = environment
 Ambito aziendale = company name
 Tipo di backup = backup type
 Mezzo trasmissivo attuale = current transmission medium
 TB giorno = TB per day; Mese = month; Anno = year

Points of improvement for the future infrastructure
 Shorten the Exchange infrastructure backup via LAN-free backup mode.
 Shorten the NetApp storage backup by increasing the number of drives used for backup.
 Shorten the Oracle database backup through the use of LAN-free backup mode.
 Possibly reduce the backup window of the SAP infrastructure by increasing the number of drives used by the server.
 Provide a single backup tool: a single point of management and a unified management methodology. NetWorker employed.
 Increase performance and disk space on the VTL as a longer-term project, adequate to support the performance improvements above. Data Domain is a candidate.


ABC backup architecture 1

Figure 8: ABC backup architecture 1 (City 1), showing NetWorker 8.1 servers with vStorage API proxy servers for VMware (38 VMs), SAP DWH, an Oracle cluster, mail, the DMZ and Iseo/Fernord/Trenord LANs, NetApp FAS 2040/3140 arrays, 5 transactional SAP servers on AIX, 10 GbE links, and a Data Domain system at City 1 with DD Boost.


ABC backup architecture 2

Figure 9: ABC backup architecture 2 (City 2), showing NetWorker 8.1 servers, TSM/EE agents, Exchange (Fernord, Trenord, and test), SQL Server, Oracle, SAP development, 38 VMware VMs, Windows and Linux servers, LAN-free backup, and 10 GbE links across the Fernord and Trenord LANs.


ABC backup architecture 3

Figure 10: ABC backup architecture 3 (City 2 to City 3), showing a NetWorker 8.1 server, FAS2030 storage, an 8x FC SAN at Novate, 10 GbE links, DD Boost, and bidirectional 1 GbE IP replication to City 3.


Customer's challenges vs. solutions
The customer had many challenges in his environment, including:

 Load on the LAN
 Inefficient backups
 Low-level integration with the virtual environment
 Inability to perform a tech refresh on the fibre network
 A distributed management system for the backup environment

Steps to the solution

NetWorker
The first phase was implementing NetWorker as management software to centralize the customer's backup, recovery, and archiving environment. NetWorker's integration features enabled us to integrate it with the major components of the backup and recovery environment (database and application servers).

New features we were able to use after implementing NetWorker 8.1 as a central platform for backup and recovery included:

 greater backup efficiency, spanning integration with EMC array snapshot management, further integration with Data Domain, and new support for block-based backup for Windows systems
 optimized support for VMware backup and recovery with a new underlying VMware backup solution
 enhanced NetWorker management on several fronts, with expanded support for enterprise applications and new features that maximize efficiency

Snapshot management
The customer wished to simplify snapshot management and remove overhead components by integrating the solutions together. We used the integrated snapshot management feature, which eliminated the need for a separate proxy server to move the snapshots. The administrator can now use the NetWorker storage node to act as the proxy in the workflow.


Use of snapshots as part of an overall data protection strategy not only enables fast operational backup and recovery, but also allows backup to disk or tape to happen offline without impact to the mission critical application server. This process is often referred to as “Live Backup”. Tapes can be created and sent offsite for disaster recovery purposes. At any time, recovery can be accomplished from a snapshot or from disk or tape as needed.

NetWorker Snapshot Management will catalog all snapshot activities, enabling quick search and recovery for restore purposes. NetWorker software provides lifecycle policies for snapshot save sets. Snapshot policies specify the following:

• Time interval between snapshots
• Maximum number of snapshots retained, above which the oldest snapshots are recycled
• Which snapshots will be backed up to traditional storage
• The type of snapshot that will be created
• Expiration policy of the snapshot
• Number of active snapshots retained on the storage array
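As an illustration, the retention part of such a policy can be modeled in a few lines of Python. This is a toy sketch with hypothetical names, not NetWorker's actual implementation: it keeps at most a fixed number of snapshots and recycles the oldest ones above that limit.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class SnapshotPolicy:
    """Toy model of a snapshot lifecycle policy (illustrative only)."""
    interval_hours: int          # time between snapshots
    max_retained: int            # oldest snapshots are recycled above this
    rollover_to_storage: bool    # whether snapshots are backed up to disk/tape
    snapshots: deque = field(default_factory=deque)

    def take_snapshot(self, snap_id: str) -> list:
        """Record a new snapshot; return any snapshots recycled to stay
        within the retention limit."""
        self.snapshots.append(snap_id)
        recycled = []
        while len(self.snapshots) > self.max_retained:
            recycled.append(self.snapshots.popleft())  # recycle the oldest
        return recycled

policy = SnapshotPolicy(interval_hours=4, max_retained=3, rollover_to_storage=True)
for snap in ["snap-1", "snap-2", "snap-3", "snap-4"]:
    dropped = policy.take_snapshot(snap)
print(list(policy.snapshots))  # ['snap-2', 'snap-3', 'snap-4']
print(dropped)                 # ['snap-1']
```

The real product expresses these settings as resource attributes rather than code; the sketch only shows how the retention and recycling rules interact.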

Snapshots for DB2, Oracle, and SAP are also managed via the NetWorker Snapshot Management feature. Configuration Wizard support for these applications will be added in a later release.

NetWorker Snapshot Management operations for each NetWorker client can be monitored through NMC reporting features. Monitored operations cover snapshots that are successfully created or in progress, as well as snapshots that are mounted, in the process of being rolled over, and deleted. Reports include details of licensed capacities consumed. NMC also provides a detailed log of snapshot operations.

Snapshot Management is included with a NetWorker capacity-based license.

The Client Configuration Wizard for the NetWorker Snapshot Management feature enables automatic discovery of the environment that has been configured for snapshots by the Storage Administrator. The Wizard accommodates the common NetWorker Snapshot Management workflows associated with snapshot and rollover configurations.


Snapshot validation will verify whether a backup as configured by the Wizard is likely to be successful.

Simplify the process
No scripting is required. The Configuration Wizard ensures that the proper commands are executed for the associated snapshot operations, that the LUNs are paired appropriately, and that all NetWorker resources are properly assigned. In short, the Wizard configures the client's snapshot/rollover policy end to end.

Data Domain

DD Boost inside
Thanks to its ease of management and its deeper integration with NetWorker, Data Domain, through the new Data Domain Boost over Fibre Channel feature, enabled us to eliminate the need to tape out.

Support for the Fibre Channel protocol has now been added to DD Boost, and NetWorker 8.1 leverages it for customers who have standardized on Fibre Channel as their backup protocol of choice. This support not only optimizes the customers' existing investment in their Fibre Channel infrastructure; with DD Boost client-side deduplication, the customer can now enjoy 50% faster backups than with their traditional VTL-based model, and 2.5x faster recovery.

Data Domain systems reduce the bandwidth required on the network, as well as the disk capacity required. Since this support combines client-side deduplication with the Fibre Channel protocol in a backup-to-disk workflow, the old VTL tape-based management can be eliminated, resulting in greater reliability and less complexity. It also brings to Fibre Channel all the features that Data Domain and DD Boost offer, including virtual synthetic full backups, clone-controlled replication, global deduplication, and more.

DD Boost over Fibre Channel is supported for Windows and Linux environments.

Previously, when performing full backups, all data had to be sent from the backup server to the Data Domain system. With DD Boost, only unique data is sent from the backup server or the client to the Data Domain system. This means up to 99% less data moved across the already loaded network, even for full backups.
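The principle of client-side deduplication can be illustrated with a toy sketch. This uses fixed-size chunks and SHA-256 fingerprints for simplicity; DD Boost's actual variable-length segmentation and wire protocol are proprietary and differ, so treat this only as a model of why repeated full backups move so little data.

```python
import hashlib

CHUNK = 4  # toy chunk size; real systems use much larger, variable-length segments

def chunks(data: bytes):
    """Split the backup stream into fixed-size chunks."""
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

def dedup_send(data: bytes, server_index: dict) -> int:
    """Send only chunks the server has not seen; return bytes transferred."""
    sent = 0
    for chunk in chunks(data):
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in server_index:       # unique chunk: must be transferred
            server_index[fp] = chunk
            sent += len(chunk)
        # duplicate chunk: only its fingerprint is referenced, no data moves
    return sent

index = {}
first_full = dedup_send(b"AAAABBBBCCCCDDDD", index)   # every chunk is new
second_full = dedup_send(b"AAAABBBBCCCCEEEE", index)  # only one chunk changed
print(first_full, second_full)  # 16 4
```

Even though the second backup is logically a full, only the changed chunk crosses the network, which is the effect described above.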

This enabled us to use the current LAN/SAN infrastructure resources more efficiently. Moreover, when DD Boost can be leveraged at the client level (EMC NetWorker, Avamar®, and Oracle RMAN), this bandwidth advantage spans the entire backup path, all the way from the client to the Data Domain system.


Figure 11

In our backup environment, we were experiencing bandwidth choking during full backups. DD Boost provided significant performance improvements here and helped us avoid infrastructure upgrades.

Figure 12

Reduce the workload on the backup servers
We were restricted from adding new components to the current environment, and the solution we proposed needed to use some components of the backup servers already in use.

Though we thought that moving some of the deduplication work from the Data Domain system to the backup server would negatively impact the backup server, the good news was that this was not the case. It might seem counterintuitive but, as it turns out, sending less data significantly reduces the load on the server. In other words, it takes fewer CPU cycles to assist with two steps of the deduplication process than it takes to push full backups over Ethernet.

Virtual Synthetics

Figure 13

Virtual Synthetic Full backups are an out-of-the-box integration with NetWorker, making it ‘self-aware.’ Because our customer now uses a Data Domain system as their backup target, NetWorker uses Virtual Synthetic Full backups as the default workflow whenever a synthetic full backup is scheduled, optimizing incremental backups for file systems.

Virtual synthetics reduce the processing overhead associated with traditional synthetic full backups by using metadata on the Data Domain system to synthesize a full backup without moving data across the network. Unlike other vendors' solutions, no Storage Node/media server is required, and there is no rehydration during recovery.

In this workflow, an initial full backup is sent to Data Domain, taking full advantage of Data Domain value-add features, namely DD Boost. Incremental backups then run daily as usual; at the end of the cycle, instead of initiating a new full backup, another incremental is run, followed by a Virtual Full.

In a Virtual Synthetic Full backup, NetWorker tells the Data Domain system which regions are required to create a full backup, but no data is transferred over the network. Instead, the regions of the full backup are synthesized, using pointers, from the previous full and the incrementals already on the system. This process eliminates the data that needs to be gathered from the file server, reducing system overhead, time to complete the process, and required network bandwidth.
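The region-and-pointer idea can be sketched in a few lines of Python. This is a toy model with hypothetical region maps, not NetWorker's or Data Domain's actual on-disk format: a "full" is just a map from region to a pointer at data that already lives on the system, and later incrementals override the regions they changed.

```python
def synthesize_full(previous_full: dict, incrementals: list) -> dict:
    """Build a new 'full' as a region -> pointer map, without moving data.
    Untouched regions keep pointing at the previous full; changed regions
    point at the incremental that last wrote them."""
    synthetic = dict(previous_full)      # start from the last full's regions
    for incr in incrementals:            # apply incrementals oldest-first
        synthetic.update(incr)           # changed regions override pointers
    return synthetic

# region id -> location of already-stored data on the system (toy pointers)
full = {0: "full.r0", 1: "full.r1", 2: "full.r2"}
incr_mon = {1: "mon.r1"}
incr_tue = {2: "tue.r2", 3: "tue.r3"}

new_full = synthesize_full(full, [incr_mon, incr_tue])
print(new_full)  # {0: 'full.r0', 1: 'mon.r1', 2: 'tue.r2', 3: 'tue.r3'}
```

Note that building `new_full` copies only pointers, never region data, which is why no backup data crosses the network during synthesis.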

This workflow is repeated over the following weeks, with a new traditional full backup recommended only after every 8-10 Virtual Full backups have been completed. Therefore, the use of Virtual Synthetic Full backups also reduces the number of traditional full backups from 52 to 6 per year – up to 90% reduction in full backups annually.
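The arithmetic behind that reduction can be checked directly, assuming weekly fulls and one traditional full after every 8 Virtual Fulls (the low end of the recommended 8-10 range):

```python
weekly_fulls_per_year = 52
virtual_between = 8                 # Virtual Fulls between traditional fulls
cycle = virtual_between + 1         # backups per traditional full
traditional = round(weekly_fulls_per_year / cycle)   # traditional fulls/year
reduction = 1 - traditional / weekly_fulls_per_year
print(traditional)            # 6
print(f"{reduction:.0%}")     # 88%
```

That is roughly the "up to 90%" annual reduction in traditional full backups quoted above; taking the high end of the range pushes the figure slightly higher.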

Avamar
As illustrated in the initial diagram of the customer's environment, the customer needed to add two remote sites to support the business, and thus also needed to back up these sites. We considered which solutions could best support this new structure without requiring major changes to the network design.

We suggested integrating the new, powerful capabilities of DD Boost under the umbrella of NetWorker.

Here is the design we proposed to support all the data center activities.

Figure 14

Our proposal offers the maximum benefits of the marriage between Avamar and Data Domain, where Avamar clients send data directly to the Data Domain system. Specifically, this integration provides the Data Domain system's scalability and performance advantages for the most challenging backup workloads, including VMware image backups, NDMP, file systems, and enterprise applications such as Oracle, DB2, MS Exchange, MS SQL databases, and MS SharePoint. This greatly optimizes LAN bandwidth and multiplies the advantage of distributing some of the deduplication effort to hundreds of clients, improving the performance of the overall backup cycle.

Additionally, DD Boost supports Avamar instant access to the virtual machines stored on the Data Domain system, which is of great benefit during system restores.

Backing up Oracle and SAP databases
As referenced at the beginning of the article, the customer runs Oracle databases and SAP applications. While building the design, we remained sensitive to these applications' requirements. We asked the DBAs how they preferred to back up their data and found that they preferred a full backup every day, and sometimes more than one per day, depending on the criticality of the database.

The challenge here is: can the system and the current infrastructure support such a workload? Under normal conditions, the backup window for a full backup can exceed 12 hours, and the data keeps growing over time.

In testing how much DD Boost could reduce this issue, we found that the window for full backups could be cut to 8 hours. DD Boost also gives DBAs the ability to administer everything through RMAN, eliminating the need to rely on the backup administrators, and enables them to keep a full RMAN catalog of both the local and DR sites.


Figure 15

“Virtualized Environments”

VMware

Optimizing VMware backup and recovery
Integration with the industry-leading Avamar technology for backup of VMware environments is a major feature of EMC NetWorker. VMware chose Avamar technology to power its recently announced vSphere Data Protection (VDP) and vSphere Data Protection Advanced (VDP-A) offerings. Now that same technology has been leveraged in NetWorker, enabling Changed Block Tracking for both backup and recovery of data, as well as a multi-streaming centralized proxy that load balances jobs between proxy servers for increased VM backup performance, among many other features. Since each backup includes all the changed blocks, every backup is essentially a full backup.
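Changed Block Tracking can be illustrated with a toy model. This is a hypothetical sketch, not the vSphere API: the idea is simply that the hypervisor records which blocks changed since the last backup, so each backup moves only those blocks, yet the stored image always remains a complete copy of the disk.

```python
class ToyCBTDisk:
    """Toy virtual disk that tracks blocks changed since the last backup."""
    def __init__(self, nblocks: int):
        self.blocks = {i: b"\x00" for i in range(nblocks)}
        self.changed = set(range(nblocks))   # everything is 'new' at first

    def write(self, block: int, data: bytes):
        self.blocks[block] = data
        self.changed.add(block)              # CBT marks the dirty block

    def backup(self, image: dict) -> int:
        """Copy only changed blocks into the backup image, then reset
        tracking. Returns how many blocks were transferred."""
        moved = len(self.changed)
        for block in self.changed:
            image[block] = self.blocks[block]
        self.changed.clear()
        return moved

disk = ToyCBTDisk(nblocks=8)
image = {}
print(disk.backup(image))    # 8 -> the first backup moves every block
disk.write(3, b"\x01")
print(disk.backup(image))    # 1 -> later backups move only changed blocks
print(image == disk.blocks)  # True -> the image is still a full copy
```

The last line is the key property: although only one block was transferred, the backup image equals the live disk, so every backup is effectively a full.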

NetWorker uses a software-based VMware Backup Appliance (VBA). The VBA stores the metadata and sends changed blocks during the backup workflow to a Data Domain system target. This support is specific to, and optimized by, Data Domain, so the customer enjoys all the features and value of a Data Domain solution, including DD Boost support, clone to tape for retention and compliance, and global deduplication, to name a few. Each VBA is capable of protecting hundreds of virtual machines, ensuring protection for the largest virtual environments.


Now our customer has the option to clone from Data Domain to tape or other external media for extended retention and compliance purposes.

In-guest protection is enabled by NetWorker Modules for application consistency. The new VMware engine can co-exist with the support available in NetWorker 8.0 and earlier, primarily for customers who still need to back up directly to tape using physical proxies.

Managing VMware backup and recovery
Through direct integration with VMware vCenter, we offered a collaborative approach to backup management that empowers the VMware Administrator to manage their own backups, while the Backup (NetWorker) Administrator maintains visibility and control of corporate SLAs through policy-setting, monitoring, and reporting.

Both VMware and Backup Administrators are empowered with visibility and control of the environment. Protection is based on policies, as defined by the Backup Administrator, and selected for each virtual machine, or group of virtual machines, by the VMware Administrator. Virtual machines are auto-discovered and automatically protected based on the policies assigned to the group where they are created.

Both image and file level recovery are supported. Since this feature support is enabled by integration with VMware vCenter, management is virtual-centric, with information on the VMware environment presented as VM groups and folders. File level recovery is supported for both Windows and Linux.


Enhanced Management and Enterprise Applications Support

Figure 16

Finally, at no additional cost, we introduced the new EMC Backup and Recovery Manager plugin to give our customer a panoramic view of the backup environment. It is a new, intuitive management interface for monitoring and reporting on NetWorker and Avamar through a single pane of glass. While primarily used for NetWorker and Avamar, it also supports monitoring of Data Domain systems from the backup administrator's perspective. Operators and administrators can monitor alerts, activities, and systems. It also monitors events: informational messages useful for troubleshooting and auditing. Reporting features enable customers to confirm that client systems are being properly protected and to track system usage and capacity. A dashboard presents all key information on a single screen, including alerts and warnings. Other key usability features include filters, grouping, search, and color-coded tracking of system capacity.

New core NetWorker features focus on management simplicity and usability, including an integrated, wizard-based recovery graphical user interface available directly from the NetWorker Management Console. This GUI walks the administrator through every step of the recovery process, including recovery of snapshots, file systems, and the new Block-Based Backups. It allows recovery operations to be scheduled and can perform multiple recovery operations at once.


The NetWorker server DR process has been simplified, replacing the previous manual multi-step process used when a NetWorker server goes down. Features include self-awareness: if the bootstrap server ID is unknown, the system initiates a scanner process. The Backup Administrator is now stepped through the recovery process without having to pull out a complicated manual to follow.

The command line wizard program automates the recovery of the NetWorker server’s media database, resource files, and client file indexes. The administrator can choose to recover just the media database, the resource files, the client file indexes – or all of the above.

Appendix
1. http://nsrd.moab.be/2013/07/12/networker-8-1-countdown-2/
2. “Why EMC DD series”, EMC document number h11755
3. “V to the MAX”, John Bowling, EMC Knowledge Sharing article, 2012


Biography

Mohamed Sohail
Mohamed has over 9 years of IT experience in operations, implementation, and support, 4 of them with EMC. Mohamed previously worked as a Technical Support Engineer at Oracle Egypt and was a Technical Trainer at Microsoft. Mohamed holds a B.Sc. in Computer Science from Sapienza University of Rome, Italy, and a B.A. in Italian from Ain Shams University, Egypt.

Mohamed holds EMC Proven Professional Backup Recovery certification.

Shareef Bassiouny
A Backup Recovery NetWorker Specialist, Shareef is a Technical Support Engineer in the GTS organization at EMC. Shareef has over 12 years of experience in IT operations, implementation, and support, more than 3 of those spent with EMC NetWorker support. Shareef holds a B.Sc. in Telecommunication Engineering from Cairo University. His previous role was leading a dedicated IT customer support desk handling data center operations and change management at Orange Business Services.

Giovanni Gobbo
With 20 years of experience in the IT field spanning Microsoft, Linux, VMware, storage, and backup environments, Giovanni has solid hands-on experience implementing, managing, and planning physical and virtualized computer infrastructure.

Giovanni has worked for Atlantica, Terasystem, Getronics, and Olivetti.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
