Backup Optimization 'Networker Inside'
Shareef Bassiouny, EMC
Mohamed Sohail, EMC
Giovanni Gobbo, Senior IT Consultant

Table of Contents

Executive summary
Introduction
Part 1
  How much Data Storage could be gained? How could it be maximized?
  What is the penalty of this gain?
  Classic design example
  Advantages/disadvantages of the new DD Boost over Fibre Channel (DFC)
Part II
  Journey to an optimized backup environment
  The Journey
Steps to the solution
  NetWorker
  Data Domain
  Avamar
  “Virtualized Environments”
Appendix
Biography

Disclaimer: The views, processes, or methodologies published in this article are those of the authors. They do not necessarily reflect EMC Corporation’s views, processes, or methodologies.

2014 EMC Proven Professional Knowledge Sharing

Executive summary

Do you need to speed up your backups by up to 50%? Do you need to reduce bandwidth use by up to 99%? Do you want to reduce the backup server workload by up to 40%? Do you want to increase your backup success rate?

The answer is Data Domain® Boost (DD Boost), which enables you to finish backups within backup windows and provides breathing room for data growth. With performance of up to 31 TB/hr, it is 3 times faster than any other solution, enabling you to use your existing network infrastructure more efficiently.

In this Knowledge Sharing article we illustrate how we optimized our backup processes and leveraged current resources by integrating NetWorker® backup management software with the new DD Boost over Fibre Channel feature to enhance backup system performance. NetWorker, the major component of EMC backup and recovery software solutions, is a cornerstone of the backup solutions of large infrastructure customers.

This article targets backup administrators, support engineers, and stakeholders interested in the DD Boost over Fibre Channel feature and in how to use it to improve backup success rates.
The goal of this article is to help you:

- speed up backups
- avoid the congestion that slows down large, critical backups by reducing bandwidth utilization
- minimize workloads on backup hosts (NetWorker server and storage nodes)

Introduction

In Part 1, we follow a dialogue we had with a customer while promoting Data Domain for his backup environment, which led us to promote NetWorker as one of the products best integrated with Data Domain appliances. Part 1 is a series of questions and answers that explore the why and the how; we concentrate on the basic concepts and leave the details to the referenced documents, primarily the “NetWorker and Data Domain Devices Integration Guide”, version 8.1.

Part 2 is the outcome of the customer conversation in Part 1, combined with the data from the customer requirements documents. From these we produced a solution proposal that relies on the concepts built in Part 1 and details how the products fit into the customer environment.

Part 1

While deduplicated storage as a backup target is not a new concept in Backup and Recovery Solutions architecture (BRSa), the technology behind the deduplicated storage is one of the major factors affecting backup performance and success rates. A well-known example is EMC DL3D, which integrated multiple storage technologies to achieve Backup to Disk (B2D) performance through a Virtual Tape Library interface, coupled with back-end storage deduplication. However, because the deduplication process ran offline, appliance performance was known to deteriorate beyond 70-80% disk utilization.

Data Domain emerged as a cutting-edge deduplicated storage technology, used as backup-to-disk storage in backup solutions.
Its “in-line” deduplication technology (data is deduplicated as soon as it reaches the storage host, before being written to disk) and high performance made it one of the best-selling products in the EMC Data Protection and Availability Delivery portfolio. Perhaps the main reasons for its market appeal are the sustainable performance it delivers (minimal performance degradation even beyond 95% utilization) and the diverse storage connectivity options it provides.

Further integration with backup solutions led to DD Boost, one of the most interesting features provided with Data Domain appliances. DD Boost comprises Distributed Segment Processing (DSP) coupled with the DD API. DSP is a mechanism that enables client-side deduplication to be integrated into virtually any application that writes data to secondary storage backup media. The DD API is the Data Domain programming interface that lets applications and hosts communicate with the DD Operating System (DD OS), leveraging this integration to provide more features and facilities that “boost” performance, minimize the backup window and bandwidth utilization, and enhance backup success rates.

Basic concepts mentioned in the following discussion include:

Brief Blueprint on Deduplication Technologies

Deduplication and compression have the same aim: to remove redundancies from data patterns. While the scope of compression is a file or an archive of files, the scope of deduplication is the file system used to store backup data, also called a Storage Unit (SU) in Data Domain jargon.

Here, we are not talking about file-level deduplication, which hashes the contents of every file on the file system, detects duplicate content, and removes the duplicate copies, replacing them with stub pointers to the original content.
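To make the file-level approach concrete, the following is a minimal sketch, not taken from any EMC product. It assumes an in-memory mapping of file names to contents and uses SHA-256 as the content hash; the function name `dedupe_files` and the sample file names are hypothetical.

```python
import hashlib


def dedupe_files(files):
    """File-level deduplication over an in-memory {name: bytes} mapping.

    Returns (store, stubs): `store` keeps one copy of each distinct content,
    keyed by the first file name that carried it; `stubs` maps every
    duplicate file name to the name of that original.
    """
    first_seen = {}   # content hash -> name of the first file with that content
    store = {}        # original name -> content (stored once)
    stubs = {}        # duplicate name -> original name (the "stub pointer")
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in first_seen:
            stubs[name] = first_seen[digest]
        else:
            first_seen[digest] = name
            store[name] = content
    return store, stubs


# Hypothetical sample data, for illustration only.
files = {
    "report_v1.txt": b"quarterly numbers",
    "report_copy.txt": b"quarterly numbers",   # exact duplicate
    "report_v2.txt": b"quarterly numbers!",    # one byte added: a new file
}
store, stubs = dedupe_files(files)
print(sorted(store))   # ['report_v1.txt', 'report_v2.txt']
print(stubs)           # {'report_copy.txt': 'report_v1.txt'}
```

Note that `report_v2.txt` differs from the original by a single byte yet is stored in full. That is the limitation of hashing whole files.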
Figure 1: File-based deduplication

We are talking about sub-file deduplication, which splits every file into chunks using a segmentation algorithm; variable-length segmentation has been found to be the most efficient. Each chunk is identified by its hash fingerprint, so a duplicate chunk is replaced by a pointer to the original chunk (the first one found to be unique). This is the technology used for Data Domain deduplication, with an added layer of compression applied after new (unique) chunks are identified.

Figure 2: Sub-file, variable-length chunk deduplication

How much Data Storage could be gained? How could it be maximized?

While deduplication efficiency varies with several factors, a 20x reduction in disk space is typical for plain, uncompressed file system data. The main factors that affect deduplication efficiency include:

Data type or nature: Some types of data are highly compressible (text files, spreadsheets, etc.), while others are already compressed by nature (audio/video files, graphics), so recompressing them produces no significant benefit. Because deduplication relies on segmenting files and identifying chunks, any transformation applied to incoming files (such as compression and/or encryption) produces new chunk patterns, even after minor changes to those files, and thus reduces the gain from deduplication.

Change rate: Storage savings increase with each subsequent backup of the save set because a deduplication backup writes to disk only those data blocks not already in its catalogue; thus, data with a high change rate yields a lower gain than data with a low change rate.
Data Retention: The amount of time data is intended to be kept available for recovery affects the size of the data catalogue (imagine that there is a database of hashes that represent every stored chunk). As such, if you retain the data for longer