Are You Ready for the ‘AVA’ Generation?

Randeep Singh

EMC Proven Professional Knowledge Sharing 2010

Randeep Singh
Solution Architect, HCL Comnet

Table of Contents

Figures
Summary
Abstract
Rise of De-Dupe Concept
Terms and Concepts of Avamar
Introduction of RAIN Technology
Active/Passive Replication
Protection of Data in VMware Environment
Tuning Client Cache to Optimize Backup Performance
Integration with Traditional Backup Civilization
Protection of Remote Offices
Architecting Avamar to Ensure Maximum System Availability
Pro-active Steps to Manage Health and Capacity
Daily Server Maintenance
Tape Integration
What's New
Why EMC Avamar

Disclaimer: The views, processes, or methodologies published in this article are those of the author. They do not necessarily reflect EMC Corporation’s views, processes, or methodologies.


LIST OF FIGURES

Source-based Data De-Duplication
Global De-Dupe Concept
Traditional versus Avamar Daily Full Backups
Avamar Client Plug-ins
RAIN Architecture
Active/Passive Replication
Avamar Server Deployed as a Virtual Appliance
Backup Solutions for VMware
Processes Used to Filter Out Previously Backed Up Files
Effectiveness of File Cache for Backing up File Servers versus Databases
Avamar and NetWorker Integration
Avamar and NetWorker Integration (NetWorker Console)
Protection of Remote Offices
Avamar Data Transport Components


SUMMARY

Once, when I was having a cup of coffee with my students, I realized that every creature on this earth, whether a man, woman, or child, produces data. In fact, by 2010 many database sizes had crossed into double digits in terms of petabytes. But data growth is not our most significant challenge; the main challenge arises when this data needs to be backed up with "on-demand" 24x7 accessibility. And, of course, how can we forget about the IT investment?

My students ask, "Is there any way to minimize the backup window?" To this, I reply, "Well, now it's time to step into the new AVA generation." Traditional backup solutions store data repeatedly, and the total storage under management expands by 5-10 times. Consider the case of government regulations; the risk of shipping tapes is one of the greatest security concerns in today's IT infrastructure. These traditional backup solutions also require a rotational schedule of full and incremental backups. Due to unnecessary data movement, enterprises are often faced with backup windows that roll into production hours, network constraints, and too much storage to manage. Tape drives, media, and libraries are prone to mechanical failure, and transport of tapes can be unreliable. While disk storage has been used to meet these challenges, it addresses only a fraction of the data protection challenges organizations face. Consider the case of remote offices, where the risk of data loss or exposure is extremely high. The answer to these challenges is Avamar®.

ABSTRACT

EMC Avamar enables fast and reliable backup and recovery across the enterprise, including LAN/SAN servers in the data center, VMware environments, remote offices, and more. To achieve this, Avamar uses patented global data deduplication technology to identify redundant sub-file data segments at the source, before the data is transferred across the network and stored to disk. Only new, unique, variable-length sub-file data segments are moved, reducing daily network bandwidth and storage requirements by up to 500x. Moreover, for security, data can be encrypted both at rest and in flight. For remote offices, Avamar's centralized management protects hundreds of sites easily and efficiently. Total back-end storage is also reduced by nearly 50x, as Avamar stores only a single instance of each sub-file data segment. Avamar also uses patented RAIN technology to provide fault tolerance across nodes and eliminate single points of failure, and its scalable grid architecture allows it to grow with demand. It can also work with existing traditional tape solutions such as NetWorker®.

RISE OF DE-DUPE CONCEPT

Looking at the data of enterprises today, we find that it is highly redundant, with many identical files and data segments stored within and across systems. Performing backup using traditional backup software stores this redundant data again and again. This is where the deduplication concept that Avamar uses comes into play: global, source-based data deduplication that eliminates redundancy at both the file and sub-file data segment level. The challenge of redundancy is solved at the source, before backup data is transferred over the network, whether LAN or WAN. To ensure that each unique data segment is backed up only once across the enterprise, Avamar agents are deployed on the systems to be protected; they identify and filter out repeated data segments stored across the enterprise over time. As a result, only a small amount of incremental backup data is generated, whether it is file system or database data.

EMC Avamar uses an intelligent, variable-length method for determining segment size that examines the data itself to find logical boundary points. This is the opposite of the fixed-block or fixed-length segment method that traditional backup products use, where even a small change in a data set can shift all of its fixed-length segments. Avamar's algorithm analyzes the binary structure of a data set, i.e. the 0s and 1s that make it up, to determine segment boundaries, so Avamar client agents can identify exactly the same segments in any data set. A variable-length segment averages about 24 KB and is then compressed to an average of 12 KB. For each segment, Avamar generates a unique 20-byte ID using the SHA-1 hash algorithm. This unique ID determines whether or not a data segment has been previously stored.
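The idea of content-defined boundaries plus SHA-1 segment IDs can be sketched in a few lines. This is a toy illustration only: Avamar's actual boundary-detection algorithm is proprietary, and the rolling-style checksum and size limits below are invented for the example. What it does show is the key property the text describes: boundaries come from the data itself, and identical segments always map to identical 20-byte IDs.

```python
import hashlib

def chunk_boundaries(data: bytes, divisor=4096, min_size=1024, max_size=65536):
    """Split data into variable-length chunks by picking boundaries from the
    content itself, so an insertion early in the stream shifts at most one
    chunk rather than every fixed-length block after it."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = (h * 31 + b) % (1 << 32)          # toy rolling-style checksum
        size = i - start + 1
        if (size >= min_size and h % divisor == 0) or size >= max_size:
            chunks.append(data[start:i + 1])  # content-defined boundary hit
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])           # trailing partial chunk
    return chunks

def segment_ids(data: bytes):
    """Map each variable-length segment to its 20-byte SHA-1 ID (shown here
    as a 40-character hex digest); identical segments collapse to one entry."""
    return {hashlib.sha1(c).hexdigest(): c for c in chunk_boundaries(data)}
```

Because the IDs depend only on segment content, a client can decide whether a segment was "previously stored" by comparing IDs, never the data itself.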

FIGURE 1 SOURCE-BASED DATA DEDUPLICATION

Figure 1 shows the deduplication concept. Figure 2 shows how it actually occurs: deduplication happens before data is transferred across the network. Avamar identifies the variable-length sub-file data segments at the source, breaks the data into atoms, and, using deduplication, sends each atom only once, thereby reducing the data to be backed up by up to 500 times.

FIGURE 2 GLOBAL DE-DUPE CONCEPT

Assuming there is 5 TB of data to be backed up, the data reduction factor will be 100-150x for a mixture of file system and database data, leaving only 35-50 GB to be backed up at the source. For Windows and UNIX file data, the reduction is nearly 300x, and the data to be backed up shrinks to approximately 20 GB at the source. Not only this, it also results in an 85% reduction in total client CPU utilization.
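The arithmetic behind these figures is straightforward to check; a small helper makes the relationship between primary data, reduction factor, and daily transfer explicit (the 100-150x and 300x factors are the ones quoted above):

```python
def daily_backup_size_gb(primary_tb: float, reduction_factor: float) -> float:
    """Data actually sent per daily full backup, given a deduplication
    reduction factor (per the text: 100-150x for mixed file-system and
    database data, ~300x for Windows/UNIX file data)."""
    return primary_tb * 1024 / reduction_factor

# 5 TB of mixed data at 100-150x leaves roughly 34-51 GB to move:
low  = daily_backup_size_gb(5, 150)
high = daily_backup_size_gb(5, 100)
```

At 300x, the same 5 TB of Windows/UNIX file data drops to about 17 GB, consistent with the "approximately 20 GB" figure above.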

Figure 3 shows the comparison of traditional backup versus Avamar daily full backups.

FIGURE 3 TRADITIONAL VS AVAMAR DAILY FULL BACKUPS


Below are some of the factors that impact data deduplication ratios:

Type of data
• Duplication in user-generated data is greater than in data from natural sources
• Encrypted and compressed data are not ideal candidates for deduplication
• More user-created content = higher deduplication ratio

Data change rate
• Small data change rates = more duplicate data in subsequent backups
• Less change = higher deduplication ratio

Retention policy
• Longer retention increases the chance that data will be repeatedly backed up
• Longer retention policy = higher deduplication ratio

Ratio of full backups to incremental backups
• More full backups increase the amount of data being repeatedly backed up
• More full backups = higher deduplication ratio

TERMS AND CONCEPTS OF AVAMAR

Let us begin by learning some terms:

Avamar Server – logical grouping of one or more nodes used to store and manage client backups.

Node – self-contained, rack-mountable, network-addressable computer that runs Avamar server software on the Linux operating system. It is the primary building block in any Avamar server.

Nodes can be any of the following types:

Utility Node – dedicated to scheduling and managing background Avamar server jobs. In scalable multi-node Avamar servers, a single utility node provides essential internal services for the server, such as the Management Console Server (MCS), cron jobs, external authentication, Network Time Protocol (NTP), and Web access.

Storage Nodes – store the actual backup data. Multiple storage nodes are configured in multi-node Avamar servers based upon performance and capacity requirements, and storage nodes can be added to an Avamar server over time to expand performance with no downtime required. Avamar clients connect directly with Avamar storage nodes; client connections and data are load balanced across storage nodes.

NDMP Accelerator Node – specialized node that uses NDMP to provide data protection for NAS filers (for example, EMC Celerra®, Network Appliance) and Novell NetWare servers.

Access Nodes – used to run the Avamar File System.

Single-Node Server – combines all the features and functions of utility and storage nodes in a single node.

Hard Disk Storage – disks used for storage.

Stripes – unit of disk drive space managed by Avamar to ensure fault tolerance.

Object – single instance of deduplicated data. Each Avamar object inherently has a unique ID. Objects are stored and managed within stripes on the Avamar server.

Avamar Administrator – graphical management console software application used to remotely administer an Avamar system from a supported Windows client computer.

Avamar Enterprise Manager – Web-based management interface that provides the ability to monitor and manage a distributed Avamar deployment.

Avamar Clients – filter out redundant data before sending data over networks, making it possible to protect systems even over congested LANs or WANs.

FIGURE 4 AVAMAR CLIENT PLUG-INS

Agents – platform-specific software processes running on the client that communicate with the management console server (MCS) and with any plug-ins installed on that client. Two types of plug-ins are supported:

File System Plug-ins – used to browse, back up, and restore files or directories on a specific client file system, which can be HP-UX, IBM AIX, Linux, Mac OS X, Microsoft Windows (2000, 2003, 2008, XP, and Vista), Sun Solaris, Novell NetWare, or VMware.

Application Plug-ins – support backup and restore of databases or other special applications. Supported application plug-ins include DB2, NDMP for NAS filers, Microsoft Exchange, Microsoft SQL Server, and Oracle databases.

Avamar Replicator – enables efficient, encrypted, asynchronous replication of data stored in one Avamar server to another Avamar server deployed in a remote location, without the need to ship tapes.

INTRODUCTION OF RAIN TECHNOLOGY

Assume that your traditional backup solution fails. What will you do? Avamar uses RAIN (Redundant Array of Independent Nodes) technology, which provides failover and fault tolerance across the nodes in an Avamar server grid. If a server node fails or becomes unavailable, the data stored on that node can be reconstructed from the remaining nodes, providing reliable data protection and access. In addition to providing failsafe redundancy, RAIN can be used to rebalance capacity across nodes after the Avamar server has been expanded with new nodes. Through this, we can manage the capacity of the system as the amount of data added to it continues to grow. Figure 5 shows the RAIN architecture.

FIGURE 5 RAIN ARCHITECTURE

Operational best practices for using RAIN technology in Avamar:

• Always enable RAIN for any configuration other than single-node servers and 1x2s (two active data nodes). The minimum RAIN configuration is a 1x4 (four active data nodes plus a utility node and a spare node). Double-disk failures on a node or a complete RAID controller failure can occur and corrupt the data on a node. Without RAIN, the only recourse is to reinitialize the entire system and replicate the data back from the replication target.

• When deploying non-RAIN servers, replicate the data on these servers to ensure it is protected. Non-RAIN servers have no data redundancy, so any loss of data can have significant and widespread impact on the backups stored on that Avamar server.

• When deploying a RAIN configuration, always include an active spare node in the module. The best way to minimize system downtime is to have a spare node readily available.

• For performance reasons, Avamar recommends limiting the number of nodes to 16 active data nodes (a 1x16 configuration). Limit starting configurations to 12 to 14 active data nodes so that nodes can be added later if needed to recover from high-capacity-utilization situations.

ACTIVE/PASSIVE REPLICATION

Replication transfers data from a source Avamar server to a destination Avamar server. The data replicated to the destination server can be restored directly back to primary storage without having to be staged through the source Avamar server. Replication is accomplished by way of highly efficient, asynchronous Internet Protocol (IP) data transfers, which can be scheduled during off-peak hours to optimize use of network bandwidth. Replication also uses data deduplication technology to find and eliminate redundant sequences of data before they are sent to the destination server, reducing network traffic and promoting efficient use of hard disk storage.

Using replication, an enterprise can centrally protect and manage multiple remote branch offices that use individual single-node servers for local backup and restore. The centralized multi-node server can then be used for disaster recovery in the event of catastrophic data loss at any remote branch office. Replication can also copy data stored in a multi-node server to any other multi-node server in the enterprise, enabling multi-node servers to provide peer-to-peer disaster recovery for each other.

If the primary server and the replication target are configured as an active/passive pair, then with one change to a DNS entry and a few commands, all clients can continue daily backups to the replication target, as shown in Figure 6. In this failover example, clients still back up to the host name BACKUP_SRVR, but in the DNS server, BACKUP_SRVR is now associated with IP address 192.168.200.1, the utility node of REPL_DST.
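The failover mechanism described above can be modeled in a few lines: clients are configured with one stable host name, and failover is a single record change on the DNS side. The active server's address (192.168.100.1) is invented for this sketch; 192.168.200.1 and the names BACKUP_SRVR and REPL_DST come from the example above.

```python
# Toy model of DNS-based active/passive failover. Clients never change
# configuration; they always resolve the same host name.
dns = {
    "BACKUP_SRVR": "192.168.100.1",   # assumed address of the active server
    "REPL_DST":    "192.168.200.1",   # utility node of the replication target
}

def client_backup_target() -> str:
    """Every client resolves BACKUP_SRVR before each backup."""
    return dns["BACKUP_SRVR"]

assert client_backup_target() == "192.168.100.1"

# Failover: repoint the single DNS entry at the replication target's
# utility node. Every client's next backup lands on REPL_DST.
dns["BACKUP_SRVR"] = dns["REPL_DST"]
assert client_backup_target() == "192.168.200.1"
```

The design point is that the fan-out problem (reconfiguring hundreds of clients) collapses into one authoritative change, which is why a single DNS edit plus a few server-side commands is enough.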


Replication is useful for more than recovering from the traditional disaster associated with a site failure. Replication is the most reliable form of redundancy that the system can offer because it creates a logical copy of the data from the replication source to the destination. It does not create a physical copy of the blocks of data, and therefore, any corruptions, whether due to hardware or software, are far less likely to be propagated from one Avamar server to another. Also during replication, multiple checks of the data occur to ensure that only uncorrupted data is replicated to the replication target.

Figure 6 ACTIVE/PASSIVE REPLICATION

There are two kinds of replication:
• Normal replication
• Full root-to-root replication

In normal replication, user data on the source Avamar server is replicated to a destination Avamar server. A special "REPLICATE" domain on the destination server is created during the first replication operation. This domain contains a mirrored representation of the entire source

server client tree on the destination server. All data within the REPLICATE domain is read-only. The only operations allowed on these backups are:
• Redirected restores to other clients not within the REPLICATE domain
• Changing a backup expiration date
• Validating backups on other clients not within the REPLICATE domain
• Viewing backup statistics
• Deleting a backup

Full root-to-root replication creates a complete logical copy of an entire source server on the destination server. Replicated data is not copied to the REPLICATE domain; it is added directly to the root domain, just as if the source clients had registered with the destination server, and the replicated source server data is fully modifiable on the destination server.

To fully replicate an Avamar server, all of the following data must be copied from the source server to the destination server during each replication operation:
• Client backups
• Domains, clients, and users
• Groups, datasets, schedules, and retention policies
• Server state, such as the contents of the activity monitor and server monitor databases at the time of the last MCS backup or flush

LIMITATIONS OF REPLICATION
• Only static data is replicated, i.e. data that is quiescent on the source server. Thus, any write operation to the source server that has not yet completed will not be replicated to the destination server during the current replication operation; it will, however, be replicated during the next replication operation.
• Avamar Administrator can only manage one server at a time. In an environment with more than one server, the Avamar Enterprise Manager multi-system management console is used.

BEST PRACTICES FOR REPLICATION
• Protect the data on the Avamar server by replicating it to another Avamar server.
• Set up active/passive replication pairs to take advantage of failover capability: one Avamar server (active) is dedicated to backing up clients, and the other (passive) is the dedicated replication target for the backup source server. If the active

server fails, then by changing just one DNS entry, all clients will begin backing up to the passive server, as described in the previous image.
• Ensure that available bandwidth is sufficient to replicate all of the daily changed data within a 4-hour window, so that the system can accommodate peaks of up to 8 hours per day.
• If using daily replication, avoid the --include flag: every time a new client is added to the active Avamar server, its data is not replicated until the repl_cron.cfg file is edited to add a new --include option for that client. Also, a system that specifies the --include option might not include the MC_BACKUPS data, in which case none of the policies are replicated.
• For best results, ensure that the target server is running the same or a later version of the Avamar software than the source server.
• Use a large time-out setting for the initial replication. Recent backups might not get replicated because replication processes can time out before all backups are successfully replicated; replication always processes backups alphabetically by client name, and earlier backups before later ones.
• Normal source server maintenance tasks such as hfs check and garbage collection should not run during a replication session.
• Schedule replication during periods of low backup activity, as this ensures that the greatest number of client backups will be replicated during each session.

PROTECTION OF DATA IN VMWARE ENVIRONMENT

VMware View offers many benefits to IT organizations, but as the amount of data stored on virtual machines increases, traditional backup struggles to protect such environments. EMC Avamar delivers fast and efficient protection for these environments by deduplicating data at the source, sending only unique, variable-length sub-file data segments during daily full backups. Traditional backup solutions move approximately 200% of primary data on a weekly basis; Avamar, using deduplication, moves approximately 2%. The results are:
• Up to 95% reduction in data moved
• Up to 90% reduction in backup times
• Up to 50% reduction in disk impact
• Up to 95% reduction in NIC usage
• Up to 80% reduction in CPU usage
• Up to 50% reduction in memory usage


EMC Avamar Virtual Edition for VMware is the industry’s first deduplication virtual appliance for backup, recovery, and disaster recovery. Avamar Virtual Edition enables users to deploy Avamar’s deduplication technology easily, effectively, and in a repeatable fashion on VMware ESX Server hosts. Each virtual appliance supports up to 2 TB of deduplicated backup capacity compared to a traditional backup schedule that would require approximately 70 TB of tape or disk storage and can leverage the existing VMware shared server and storage infrastructure to lower costs and simplify IT management. Avamar Virtual Edition supports VMotion for deployment flexibility, and up to two Avamar Virtual Edition virtual appliances per ESX server to provide scalability. Replication between Avamar virtual appliances or from Avamar virtual appliances to physical Avamar servers eliminates reliance on offsite tape shipments and the risk of losing unencrypted data. Figure 7 shows the Avamar Server software deployed as a virtual appliance.

FIGURE 7 Avamar Server Deployed As a Virtual Appliance

Key components that need to be protected are:
• Microsoft Active Directory, which provides authentication and user configuration to the VMware View solution
• View Manager, which interacts with vCenter to manage virtual desktop instances
• Virtual desktop templates
• vCenter, the centralized management application for the virtual infrastructure
• ESX Server, which provides the virtualization of the desktop instances
• The user data disk, the location of unique user data that is mapped to the virtual desktop instance upon user login and where users store their individual data

There are two data protection strategies for a VMware environment.

The first strategy uses a combination of hardware and software to make a simultaneous copy of the LUNs where the virtual machine desktops are stored, along with the View Manager application and configuration information. This allows a total recovery of the entire environment. One requirement of this strategy is making a duplicate copy of the VMware View environment for presentation to the backup solution.

The second strategy involves protecting the key components of VMware view infrastructure independently using Avamar client software agents. This will avoid the infrastructure cost we would have incurred if using the first strategy.

Figure 8 shows the Avamar client backup solutions for VMware.

Figure 8 Backup Solutions for VMware

In the first solution, i.e. at the guest level, the Avamar agent resides inside each virtual machine and deduplicates data within the virtual machines as if they were physical servers, moving minimal data to reduce resource contention, accelerate backup, and provide file-level restore for Windows, Linux, and Solaris.

Advantages and considerations for this method:

Advantages
• Highest level of data deduplication, resulting in maximum storage efficiency
• Guest-level backup easily fits into most existing backup schemes; day-to-day backup procedures do not change
• Support for fast partial restores of individual directories (folders) or files
• Optional application-level support for DB2, Exchange, Oracle, and SQL Server databases
• No advanced scripting or VMware knowledge required

Considerations
The only significant consideration for this approach is that although file and directory restores are a simple one-step process, full system recovery is a two-step procedure: first load a known-good operating system image inside the virtual machine, then restore the unique data from the guest-level backups stored on the Avamar server.

In the second solution, i.e. VCB, the Avamar agent resides on the VCB proxy server, deduplicates within and across the VMDK files, and supports both VCB file-level (Windows only) and image-level backup. Avamar replication provides disaster recovery for the backed-up VMDK files.

Advantages and considerations for this method:

Advantages
• Backup resources are offloaded to the VCB proxy server
• Full-image backups of running virtual machines
• File-level backups for Windows virtual machines
• Virtual disk images are compressed
• Consolidation ratios optimized for VMware

Considerations
• Restores are a multi-step process
• VCB does not support iSCSI or NAS/NFS
• VCB can only back up a virtual machine whose disk image is stored on a device the proxy can access
• VCB cannot back up virtual disks that are RDMs in physical compatibility mode
• VCB can only back up a virtual machine that has an IP address or domain name server (DNS) name associated with it
• VCB can only perform a file-level backup of a virtual machine running Microsoft Windows NT 4.0, Windows 2000, Windows XP, Windows XP Professional, or Windows 2003
• All backups and restores must be initiated from the Avamar Administrator graphical management console; it is not possible to initiate them directly from the virtual machine interface

In the third solution, i.e. at the console level, the Avamar agent resides on the ESX console operating system, deduplicates within and across the VMDK files, and provides VMDK image-level (no file-level) backup without VCB. No proxy server or SAN is required for this configuration.

Advantages and considerations for this method:

Advantages
• Virtual machine image-level backups without VCB
• No proxy or SAN required

Considerations
• No file-level backup
• Two-step restore
• Advanced configuration (without proper scheduling and setup, backup jobs can impact system resources and running virtual machines)
• To back up a virtual machine directly, it must be powered down or suspended before the backup begins
• Virtual disk images are not compressed (empty space is processed)

TUNING CLIENT CACHE TO OPTIMIZE BACKUP PERFORMANCE

Whenever a backup begins, the avtar process loads two cache files into memory from the var directory in the Avamar installation path. The first is the file cache (f_cache.dat). It stores 20-byte SHA-1 hashes of file attributes and is used to quickly identify which files have previously been backed up to the Avamar server. When backing up file servers, the file cache screens out approximately 98% of the files. When backing up databases, however, the file cache is not effective, since all the files in a database appear to be modified every day.

The second cache is the hash cache (p_cache.dat). It stores the hashes of the chunks and composites that have been sent to the Avamar server. The hash cache is used to quickly identify which chunks or composites have previously been backed up to the Avamar server. The hash cache is very important when backing up databases. These client caches are used to: • Reduce the amount of time required to perform a backup • Reduce the load on the Avamar client • Reduce the load on the Avamar server

Figure 9 shows the process that avtar uses to filter out previously backed up files and chunks, along with the values that are incremented and reported by the avtar advanced statistics option. First, avtar computes the SHA-1 hash of the file's attributes and checks whether that hash exists in the file cache; if it does, the file is skipped. Otherwise, the file is broken into chunks. If a chunk's hash exists in the hash cache, avtar processes the next chunk. If it does not, a request is sent to the server to check whether the chunk hash exists on the

Avamar server. If it does, avtar processes the next chunk; otherwise, it sends the chunk and its hash to the Avamar server.
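The filtering order just described (file cache, then hash cache, then a server-side hash lookup, and only as a last resort the chunk itself) can be sketched as below. This is a simplification, not avtar's implementation: chunking is reduced to one whole-file chunk, the server's hash index is modeled as a set, and the attribute hash is a stand-in computed from name and size.

```python
import hashlib

def sha1(b: bytes) -> str:
    return hashlib.sha1(b).hexdigest()

def backup(files, file_cache, hash_cache, server):
    """Sketch of avtar's two-level cache filter. Returns the list of chunk
    hashes actually sent over the network during this backup."""
    sent = []
    for name, data in files.items():
        # Stand-in for the hash of file attributes (name, size, mtime, ...).
        attr_hash = sha1(name.encode() + str(len(data)).encode())
        if attr_hash in file_cache:
            continue                     # file unchanged since a prior backup
        for chunk in (data,):            # real avtar: variable-length chunks
            h = sha1(chunk)
            if h in hash_cache:
                continue                 # chunk already sent by this client
            if h not in server:
                server.add(h)            # chunk unknown anywhere: send it
                sent.append(h)
            hash_cache.add(h)
        file_cache.add(attr_hash)
    return sent
```

A short run shows both screening levels at work: identical content in two files is sent once, and a repeat backup sends nothing.

```python
file_cache, hash_cache, server = set(), set(), set()
first = backup({"a.txt": b"hello", "b.txt": b"hello"},
               file_cache, hash_cache, server)   # one chunk crosses the wire
second = backup({"a.txt": b"hello", "b.txt": b"hello"},
                file_cache, hash_cache, server)  # everything screened out
```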

FIGURE 9 Processes Used to Filter Out Previously Backed Up Files

Figure 10 shows the effectiveness of the file cache when backing up file servers in contrast to databases. In this example, we consider 1 TB of file system data and 1 TB of database data.


Figure 10 Effectiveness of file cache, backing up file servers vs. databases

As Figure 10 shows, for file servers the file cache provides a 98% reduction in client load: approximately 980 GB of data is screened out during backup, and only around 3 GB is sent to the Avamar server. For databases, by contrast, the file cache is 0% effective in reducing client load, and around 30 GB of data is sent to the Avamar server.

By default, the file cache consumes up to 1/8th of the physical RAM on the Avamar client. For example, if the client has 4 GB of RAM, the file cache will be limited to 4 GB / 8, or 512 MB maximum. The file cache doubles in size each time it needs to grow. The possible file cache sizes are 5.5 MB, 11 MB, 22 MB, 44 MB, 88 MB, 176 MB, 352 MB, 704 MB, and 1,408 MB.

The file cache stores two 20-byte SHA-1 hash entries per file:
• the hash of the file attributes, which include file name, file path, modification time, file size, owner, group, and permissions
• the hash of the actual file content, independent of the file attributes

The file cache rule is: if the client has N million files, the file cache must be at least N million files x 40 MB per million files. Example: if a client has four million files, the file cache must be at least 160 MB (4 million files x 40 MB/million files). In other words, the file cache must be allowed to grow to 176 MB.
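The rule and the doubling sequence combine into a simple sizing calculation; this sketch just encodes the figures stated above (40 MB per million files, and the fixed sequence of cache sizes):

```python
FILE_CACHE_SIZES_MB = [5.5, 11, 22, 44, 88, 176, 352, 704, 1408]

def min_file_cache_mb(million_files: float) -> float:
    """File cache rule from the text: two 20-byte hash entries per file,
    i.e. 40 MB of cache per million files."""
    return million_files * 40

def actual_file_cache_mb(million_files: float) -> float:
    """Because the cache doubles as it grows, the size it must be allowed
    to reach is the first step in the doubling sequence that holds the
    minimum required amount."""
    need = min_file_cache_mb(million_files)
    for size in FILE_CACHE_SIZES_MB:
        if size >= need:
            return size
    return FILE_CACHE_SIZES_MB[-1]
```

For the four-million-file example above, the minimum is 160 MB and the cache must therefore be allowed to grow to the 176 MB step.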

The hash cache consumes up to 1/16th of the physical RAM on the Avamar client. Example: for clients with 4 GB of RAM, the hash cache will be limited to 4 GB / 16, or 256 MB maximum. The hash cache also doubles in size each time it needs to grow. The possible hash cache sizes are 24 MB, 48 MB, 96 MB, 192 MB, 384 MB, 768 MB, and so forth.

So, in this example where a client has 4 GB of RAM, the maximum size the hash cache will actually reach is 192 MB. The hash cache includes only one SHA-1 hash per chunk: the hash of the contents of the chunk.

The hash cache rule is: if the client has Y GB of database data, the hash cache must be at least Y GB / average chunk size x 20 MB per million chunks. Use 16 kB as the average chunk size for streaming Exchange and Oracle database backups. Use 24 kB as the average chunk size for all other backups, including backups of database dump files.

Example: if an Exchange database client has 250 GB of database data, the hash cache must accommodate 16 million chunks (250 GB / 16 kB). At 20 MB per million chunks, this means the hash cache must be at least 320 MB.
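The hash cache rule is equally mechanical; the helper below encodes it directly from the figures above (20 MB per million chunks, with the chunk size chosen by workload), so the 250 GB Exchange example can be reproduced:

```python
def min_hash_cache_mb(db_gb: float, avg_chunk_kb: float = 24) -> float:
    """Hash cache rule from the text: one 20-byte hash per chunk, i.e.
    20 MB of cache per million chunks. Use avg_chunk_kb=16 for streaming
    Exchange/Oracle backups, 24 for everything else."""
    million_chunks = db_gb * 1024 * 1024 / avg_chunk_kb / 1e6
    return million_chunks * 20

# 250 GB of Exchange data at 16 kB average chunks: roughly 16.4 million
# chunks, so the hash cache must be at least ~328 MB (the text rounds the
# chunk count down to 16 million and quotes 320 MB).
exchange_need = min_hash_cache_mb(250, 16)
```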

The amount of memory consumed by the avtar process itself is generally in the range of 20 to 30 MB. The exact amount depends on the operating system the client is running, and fluctuates during the backup depending on the structure of the files being backed up. Since the file cache and hash cache can grow to maximum sizes of 1/8th and 1/16th of total RAM, respectively, on any client with more than about 1/2 GB of RAM the two caches contribute more to overall memory use than the rest of the avtar process. This is because both caches are read completely into memory at the start of the avtar backup. By default, then, the overall memory that the client caches use is limited to approximately 3/16th of the physical RAM on the Avamar client.

Changing Client Cache Size
We can override the default limits on the size of the file and hash caches by using the following two flags:
--filecachemax=VALUE
--hashcachemax=VALUE
where VALUE is an amount in MB, or a fraction of RAM when negative. As described previously, the default values are --filecachemax=-8 and --hashcachemax=-16.

For example, the following option limits the file cache to 100 MB. Because the file cache doubles in size each time it needs to grow, the cache will actually top out at 88 MB, the largest doubling size that does not exceed the cap: --filecachemax=100
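The interaction between an explicit cap and the doubling sequence can be checked with a few lines. This is an illustrative sketch of the behavior described above, not EMC code.

```python
def actual_cache_limit_mb(cap_mb, start=5.5):
    """Largest doubling-sequence size that does not exceed an explicit cap.

    The cache only ever takes sizes from the doubling sequence
    (5.5, 11, 22, 44, 88, ... for the file cache), so a cap such as
    --filecachemax=100 settles at the step just below it.
    """
    size = start
    while size * 2 <= cap_mb:
        size *= 2
    return size

# --filecachemax=100 effectively yields an 88 MB file cache
limit = actual_cache_limit_mb(100)
```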

To optimize performance, sometimes we need to increase the cache sizes from the default values. These situations arise if there are either millions of small files or large databases.

Millions of Small Files
If the client has millions of small files, the file cache may need to grow beyond its default size. The general rule is that the client requires 512 MB of physical RAM for every one million files on the Avamar client. If a client has one million files, a minimum of 40 MB is required just to store all the file hashes for a single backup, since there are two 20-byte entries per file; because the file hashes must be stored for several backups, somewhat more than this is required. As discussed previously, the file cache doubles in size each time it needs to grow (5.5 MB, 11 MB, 22 MB, 44 MB, 88 MB, 176 MB, 352 MB, 704 MB, and 1,408 MB), so in practice the cache grows to about 44 MB. Since the default limit is 1/8th of the 512 MB of physical RAM, the cache can grow to 64 MB, which means that the default value of 1/8th of RAM for the file cache is adequate.

Large Databases
If the client has a few large files, the default hash cache limit of 1/16th of RAM is usually insufficient. For example, a 240 GB database requires a cache of up to 10 million hashes. Since each hash is 20 bytes, a hash cache of at least 200 MB is required, and that only stores the hashes for a single backup. The hash cache also doubles in size each time it needs to grow (24 MB, 48 MB, 96 MB, 192 MB, 384 MB, 768 MB, and 1,536 MB), so the next increment available is 384 MB. Therefore, if this client has 4 GB of RAM, the hash cache must be allowed to grow to 1/8th of the RAM. If the default of 1/16th of RAM is used, the hash cache is limited to 192 MB, resulting in an undersized hash cache. In the case of databases, very few files are backed up, so the file cache stays considerably smaller and net memory use is still about 1/8th to 3/16th of RAM.

Operational Best Practice: Never allow the total combined cache sizes to exceed 1/4 of the total available physical RAM.

INTEGRATION WITH TRADITIONAL BACKUP CIVILIZATION A growing number of organizations are considering the benefits of data deduplication to reduce the cost of disk-based backup and recovery. A recent ESG Research Survey indicates that 11% of respondents are currently using data deduplication and 32% plan on doing so. Until recently, the only way to introduce data deduplication into an existing backup and recovery environment was to swap tape or disk-based recovery hardware for an appliance that supports data deduplication. While these hardware-based solutions can deduplicate backup data stored on disk, they do nothing to reduce the amount of data flowing over the network and can create islands of deduplication. ESG Lab has confirmed that EMC NetWorker can be easily deployed, leveraging Avamar deduplication technology to create a global pool of deduplicated data that

drastically reduces disk and network requirements while preserving the familiar look and feel of NetWorker.

The latest release of Avamar has been integrated with the NetWorker backup product. As a result, NetWorker users can now take advantage of Avamar data deduplication using familiar NetWorker interfaces, workflows, and backup policies. The integration also provides centralized control of traditional and next-generation backup.

NetWorker clients can be configured to take advantage of Avamar data deduplication to dramatically decrease the amount of time, network bandwidth, and disk capacity required to perform backup jobs. This is accomplished by installing a version 7.4 or later NetWorker agent with built-in Avamar support on an application server. The agent performs source-side global data deduplication before forwarding backup data to a NetWorker server and an Avamar Data Store. Figure 11 illustrates this data flow.

Figure 11 Avamar and Networker Integration

Configuring NetWorker to deduplicate data can be performed in three simple steps:
• Select Avamar as a NetWorker managed application
• Create a deduplication node
• Discover the Avamar deduplication node
Then select the deduplication backup checkbox as shown in Figure 12.


Figure 12 Avamar and Networker Integration (Networker Console)

ESG Lab used NetWorker with Avamar-enabled deduplication to back up a Windows virtual server. The results of the test:
• The backup took 5 minutes 31 seconds to back up 15 GB of data using an existing NetWorker backup policy.
• Restore of a 10 MB log file took two seconds.
• The NetWorker deduplication backup summary report showed that the latest full backup of the Windows virtual server protected 15,261 MB of data, yet only 127 MB of data was transferred.
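A back-of-the-envelope calculation on those last two figures shows the effective data reduction for that backup (our arithmetic on the reported numbers, not part of the ESG report):

```python
# Figures from the ESG Lab deduplication summary report above
protected_mb = 15_261     # data protected by the latest full backup
transferred_mb = 127      # data actually sent over the network

# Effective reduction ratio for this (non-initial) full backup
reduction = protected_mb / transferred_mb   # roughly 120:1
```

Ratios this high are typical only after the first full backup has seeded the server; the initial backup must still transfer all unique data.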

PROTECTION OF REMOTE OFFICES Information critical to the success and efficiency of an organization is not just found in corporate data centers; it also resides at remote and branch offices. Remote and branch offices (ROBOs) contain a large amount of enterprise data, yet most still rely upon traditional tape-based backup methods manned by non-IT staff. Tape backup, snapshots, mirroring, staging, and archiving all significantly increase the amount of storage under management. Non-IT staff handling backups and tapes increases the cost of data protection and the risk of data loss. In a distributed environment, it is impractical to centralize backup operations with traditional backup architectures and existing network capacity. Moving even daily incremental backup data sets requires so much network bandwidth and time that even this simple process can become prohibitively expensive and inefficient.

Avamar addresses this issue by improving backup processes in several ways. Replicating data over a WAN instead of shipping tapes has traditionally been cost-prohibitive because of the significant amount of data that must be transferred across the network. Avamar overcomes this with capacity reduction technology and global data deduplication across sites and servers, using a lightweight agent running on the client systems it protects.

Avamar requires only a one-time full backup in which redundant data is eliminated; after that, only new, unique sub-file data segments are backed up daily across the WAN to the data center or DR site. The software identifies and filters repeated variable-length data segments stored in files within a single system and across multiple systems over time, capturing only net-new segments. In addition, algorithms are applied to eliminate empty space and redundant file patterns, further shrinking data volumes.

Avamar source-based global data deduplication significantly reduces the amount of data sent over the WAN, enabling fast, daily full backups for remote office data. Backups are centrally managed from a corporate data center using the intuitive Avamar Management console and data can be encrypted for security.

Figure 13 illustrates how remote and branch offices using Avamar can reduce the amount of data transferred during backup (labeled "Bandwidth Efficient" in the figure) by sending only unique blocks of data to a centralized data center. At smaller remote offices, only Avamar software agents are deployed on the systems, while larger remote offices and data centers typically deploy a local Avamar server to improve recovery performance.

Figure 13 Protection of Remote Offices

Avamar also supports disaster recovery and offsite archival by replicating backup data over a wide area network to a remote data center. Data deduplication and replication capabilities ensure that only changes since the last backup, comprised of unique and compressed sub-file variable length data segments, are replicated. This is extremely valuable, reducing the cost of network bandwidth and improving the performance of backups and replication over the WAN.

There are two ways to back up clients in remote offices:
• Back up remote office clients to a small Avamar server located in the remote office (remote Avamar backup server) and replicate data to a large centralized Avamar server (centralized replication destination).
• Back up those clients directly to a large centralized Avamar server (centralized Avamar backup server) and replicate data to another large centralized Avamar server (centralized replication destination).
Which method is best depends upon the following factors:
• RTO (Recovery Time Objective): The first method provides a faster RTO because restores can be performed directly from the local server across the local area network to the client.
• Server administration: The amount of administration and support required is roughly proportional to the number of Avamar servers deployed in an environment. Accordingly, the second method requires less administration. The tradeoff then becomes RTO versus the additional cost of deploying, managing, and supporting multiple Avamar server instances.

The operational best practice: unless restore time objectives cannot otherwise be met, architect the system so that clients back up directly to a large, active, centralized Avamar server, which then replicates data to another large, passive, centralized Avamar server.

Recommended best practice for seeding the initial backup of remote clients:
• Copy the data to a portable storage device (such as a USB hard drive)
• Install the Avamar software on a client local to the Avamar server
• Attach the portable storage device to the local client
• Perform a one-time backup of all the data from the portable storage device
• Start backing up the remote client
This procedure ensures that when the first backup of the remote client runs over the WAN, it only needs to transfer data that has changed or been added since the seeding process occurred.

ARCHITECTING AVAMAR TO ENSURE MAXIMUM SYSTEM AVAILABILITY
In this section, we consider the factors involved in architecting Avamar to ensure maximum system availability: RAIN, RAID, replication, checkpoints, and backup of remote offices. Of these, we have already discussed RAIN, replication, and protection of remote offices; we discuss RAID and checkpoints here.

RAID
An individual drive has a certain life expectancy before it fails, as measured by its MTBF (Mean Time Between Failures). Since there are potentially hundreds or even thousands of drives in a storage array, the probability of a drive failure increases significantly. As an example, if the MTBF of a drive is 750,000 hours and there are 100 drives in the array, the MTBF of the array becomes 750,000/100, or 7,500 hours. RAID (Redundant Array of Independent Disks) was introduced to mitigate this problem.
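The scaling in that example follows directly from dividing the per-drive MTBF by the number of drives. A minimal sketch of the arithmetic (illustrative only; real failure modeling is more involved than this first-order approximation):

```python
def array_mtbf_hours(drive_mtbf_hours, n_drives):
    """First-order estimate: expected time to the FIRST drive failure in an
    array of n independent drives scales as 1/n of a single drive's MTBF."""
    return drive_mtbf_hours / n_drives

# The example from the text: 750,000-hour drives, 100 of them in the array
mtbf = array_mtbf_hours(750_000, 100)   # 7,500 hours, i.e. under a year
```

This is why RAID redundancy, rather than drive reliability alone, is what keeps a multi-node Avamar server available.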

RAID combines two or more disk drives into a RAID set or RAID group, which appears to the host as a single disk drive. Properly implemented RAID sets provide:
• Higher data availability
• Improved I/O performance
• Streamlined management of storage devices
All standard Avamar server node configurations use RAID to protect the system from disk failures. Avamar server nodes are typically configured with SCSI hard drives in a RAID 5 configuration or SATA hard drives in a RAID 1 configuration.

Failed drives degrade I/O performance, and with it the performance and reliability of the Avamar server. RAID rebuilds can further reduce I/O performance significantly while they run, with the same adverse impact.

Operational best practices for implementing RAID:
• Protect the data on the Avamar server by using RAID to guard against individual hard drive failures.
• Set the RAID rebuild to a relatively low priority.
• Always set up log scanning to monitor and report hardware issues; the email home feature can be configured for this.
• For hardware purchased separately, the customer must monitor regularly and address hardware issues promptly.


CHECKPOINTS
A checkpoint is a snapshot of the Avamar server taken for the express purpose of facilitating server rollbacks. If backups are running, the server suspends them, makes the system read-only, takes the checkpoint, makes the system read-write again, and then resumes the backups. Checkpoints assist with disaster recovery and provide redundancy across time: they are consistent snapshots of the entire Avamar system that can be verified for integrity. If an integrity check fails due to an inconsistency that cannot be fixed, the system can be quickly rolled back to a prior checkpoint, allowing recovery from operational issues and certain kinds of corruption. Like all other forms of redundancy, however, checkpoints require disk space; the more checkpoints retained, the larger the checkpoint overhead. The operational best practice is to leave the checkpoint retention policy at the default values, typically set to retain the last two checkpoints and the last validated checkpoint.

PROACTIVE STEPS TO MANAGE HEALTH AND CAPACITY
Capacity is one of the most important aspects of administering an Avamar server. Two main components define Avamar server capacity:
• Storage Subsystem (GSAN) Capacity
• Operating System Capacity
GSAN capacity is the total amount of commonality-factored data and RAIN parity data (net after garbage collection) on each data partition of the server node, as measured and reported by the GSAN process.

Operating System Capacity is the total amount of data in each data partition, as measured by the operating system, which not only includes the size of all the stripes in the current working directory, but also the checkpoint overhead and the amount of data stored on the data partition by other activities.

Now we describe how the Avamar server behaves as it crosses various consumed-storage thresholds.
100% — The "Server Read-Only Limit." When server utilization reaches 100% of total storage capacity, the server automatically becomes read-only. This protects the integrity of the data already stored on the server.
95% — The "Health Check Limit." This is the amount of storage capacity that can be consumed while still having a "healthy" server.


Although an Avamar server could be allowed to consume 100% of available storage capacity, it is not good practice to let that occur. Consuming all available storage can prevent certain server maintenance activities from running, activities that might otherwise free additional storage capacity for backups. For this reason, a second limit is established, called the "health check limit," derived by subtracting some percentage of server storage capacity from the server read-only limit. The default health check limit is 95%. This setting can be customized for a specific site, but setting it higher than 95% is not recommended. When server utilization reaches the health check limit, existing backups are allowed to complete, but all new backup activity is suspended; a pop-up alert appears at each login to Avamar Administrator, and a system event must be acknowledged before any future backup activity can resume.
80% — Capacity Warning Issued. When server utilization reaches 80%, a pop-up notification reports that the server has consumed 80% of its available storage capacity, and the Avamar Enterprise Manager capacity state icons turn yellow.
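The three thresholds can be summarized in a small state check. This is an illustrative sketch of the behavior described above (function and state names are our own, not Avamar APIs):

```python
def capacity_state(utilization_pct, health_check_limit=95.0):
    """Map server storage utilization (%) to the Avamar capacity states above."""
    if utilization_pct >= 100.0:
        return "read-only"            # server read-only limit: all writes refused
    if utilization_pct >= health_check_limit:
        return "backups-suspended"    # health check limit: running backups finish,
                                      # new backups suspended until acknowledged
    if utilization_pct >= 80.0:
        return "warning"              # capacity warning: pop-up, yellow icons
    return "healthy"

state = capacity_state(83.0)          # a server at 83% has crossed the warning line
```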

Icons that communicate capacity levels

The most significant daily operational impact of capacity is on backup performance and hfscheck. Operational best practice recommendations:
• Understand how to monitor and manage the storage capacity of the Avamar server on a daily basis.
• Limit storage capacity usage to 80% of the available "server utilization" capacity.

Proactive steps to manage capacity: when GSAN capacity exceeds 80% of the final read-only threshold, perform the following operational best practices:
• Stop adding new clients to the system.
• Adjust retention policies to decrease the retention period of backups where possible, reducing capacity use.

• Check whether backups are preventing a garbage collect operation from starting. If so, an error message such as "MSG_ERR_BACKUPSINPROGRESS" or "garbage collection skipped because backup is in progress" is logged in the /usr/local/avamar/var/cron/gc.log file.
• If possible, decrease GSAN capacity by deleting or expiring backups and running garbage collection. Note that deleting backups does not free space until garbage collection has run several times; garbage collection finds and deletes the unique data associated with these backups.
• Temporarily replicate the data to another server, then replicate it back after reinitializing the server. Because replication creates a logical copy of the data, this compacts all the data onto fewer stripes.
• For multi-node Avamar servers utilizing RAIN, add nodes and rebalance the capacity.

DAILY SERVER MAINTENANCE
Daily Avamar server maintenance comprises three essential activities:
• Checkpoint (discussed above)
• Garbage collection
• HFS check

Garbage Collection Garbage collection is the process of recovering unused space from backups that have expired. The garbage collection process requires a quiet system in order to run — garbage collection will not initiate if any backups are running.

HFS Check
A Hash File System (HFS) check is an internal operation that validates the integrity of a specific checkpoint. Once a checkpoint has passed an HFS check, it can be considered reliable enough to be used for a system rollback. For an HFS check to start successfully, there must be two or fewer currently running backups per storage node.

Avamar servers define two daily “maintenance windows,” during which some or all of the essential server maintenance activities are performed. The default start times for these maintenance windows are: • 6:00 A.M. (morning) • 6:00 P.M. (evening)

The following table lists the essential server maintenance activities (checkpoint, garbage collection, and HFS check) and summarizes each activity's impact on normal system operation.

                              CHECKPOINT                GARBAGE COLLECTION   HFS CHECK
Runs in morning?              Yes                       Yes                  Yes
Runs in evening?              No                        No                   No
Typical run time              5 minutes - 2 hours max   Several hours        1 hour
Starts if backups running?    Yes                       No                   Only if two or fewer backups are running on each node
Effect on running backups     Temporarily suspended     Not applicable       Unaffected
Effect on new backups         Can start                 Cannot start         Can start
System state during activity  Read-only                 Read-only            Read-write

TAPE INTEGRATION
The Avamar Tape Output option provides the ability to archive backup data stored within an Avamar server to tape via a disk cache. This option utilizes a disk staging area (for example, a media server with a disk pool, or a file server) to temporarily stage Avamar backup data in un-deduplicated format; any standard third-party tape-based backup software solution then retrieves the staged data and writes it to tape. The Avamar Tape Output option can be configured to send either all or a subset of the backup data stored within the Avamar server to tape, and can also be tuned based on the size of the disk staging area. Restores are performed directly from tape to the appropriate client(s). The Avamar Tape Output option provides the following advantages:
• A restore can be performed at the granularity of a single file.
• A restore can be performed directly from the tape software to a client without having to restore via the Avamar server.
• ACLs and alternate data streams are maintained on tape if the operating system of the disk staging area system matches the operating system of the original client that was backed up.


Avamar has introduced Avamar Data Transport, an Avamar server that runs as a virtual machine in a VMware ESX Server (3.x) environment. Avamar Data Transport is a hardware and software system that transports deduplicated data from an Avamar server to tape. Transport nodes are sized at 1.0 TB. A transport node serves as the target of replication from a primary Avamar server; after replication completes, the replicated data is moved to tape. Transport nodes can also be recovered from tape, allowing backed-up data to be accessed by administrators or users. Avamar Data Transport 1.0 supports a maximum of four transport nodes. The following terms are useful for understanding how it works:

Avamar Data Transport Framework
The framework is a service-oriented architecture that provides applications with common services such as communication, security, and event logging.

Control Node
The control node is a 64-bit Linux host where the Avamar Data Transport application and the Avamar Data Transport Framework run. The Avamar Data Transport user interface is accessed from a web server installed on this host. In addition, the database of all transported files is often located on the control node. There is only one control node in an Avamar Data Transport system.

Tape Backup Server
Avamar Data Transport supports the NetWorker and Symantec NetBackup tape backup products. Both of these products have a server component that drives the tape archiving process.

Transport Node
Transport nodes are Avamar Virtual Edition servers that have been modified specifically for use in the Avamar Data Transport system. They are the targets of replication from Avamar servers, and are used as backup and restore targets by the tape backup server.

The system consists of the following physical components: • Physical Avamar server • ESX server for hosting control and transport nodes • Tape backup server configured with NetWorker or Symantec NetBackup • Tape library, tape drives, and tapes


FIGURE 14 Avamar Data Transport Components

Before installing Avamar Data Transport, the components mentioned above must be running in your network environment. After the pre-installation checks have been made, install and configure the Avamar Data Transport components (as shown in Figure 14) on the appropriate hosts in the system:
• Avamar Data Transport System Service (on the Avamar servers)
• Control node (on the ESX server)
• Transport nodes (on the ESX server)
• Transport node services (on the transport nodes)
• Avamar Data Transport groups or policies (on the tape backup server)

Installation and configuration of these components occurs in the following order, as shown in Figure 14:
1. Control node
2. Avamar Data Transport System Service on the Avamar server (or servers)
3. Transport nodes
4. Tape backup server and tape backup clients
5. Transport node services
After installing these components, Avamar Data Transport is configured to transport deduplicated data to tape. In this way, the Avamar Data Transport node serves as the tape-out mechanism, shipping backed-up data to tape.

WHAT'S NEW
Avamar Desktop/Laptop introduced in Avamar 5.0
EMC recently introduced Avamar 5.0, designed for backing up desktops and laptops. The Avamar Desktop/Laptop features enhance the functionality of the Avamar client for Windows and Mac desktops and laptops. Avamar Desktop/Laptop is based on the Avamar client software and adds enhanced features for an enterprise's desktop and laptop computers, including:
• A web browser UI that provides manual backup, restore by search, restore by browse, and backup history.
• UI available in ten languages.
• Domain-oriented data security. Users are authenticated through the enterprise's Active Directory or OpenLDAP-compliant directory service; users can also be authenticated using Avamar authentication.
• User-initiated backups. Users can manually initiate a backup from the client.
• Restore of folders and files directly to the original location.
• Creation of a restore set by search or directory-tree browse. Users can search or browse a backup directory tree to create the set of folders and files to restore.
• Backup history. Users can view the folders and files backed up in their last 10 successful backups.
• Deployment of Avamar clients using common systems management tools. Avamar Desktop/Laptop can be push-installed on Windows and Mac desktop and laptop computers using tools such as Microsoft Systems Management Server 2003 (SMS).

WHY AVAMAR?
• Solves the challenges associated with traditional backup by using patented global, source-based data deduplication technology to identify redundant data segments at the source before transfer across the network.
• Enables fast, reliable backup and recovery for VMware environments, remote offices, and LAN/NAS servers across existing networks and infrastructure.
• Encrypts data in flight and at rest, adding security.
• RAID protection from disk failures.
• Supports SAN or internal disk storage.
• Scalable grid architecture.

• Fault tolerance across nodes using RAIN technology, eliminating single points of failure and providing high availability.
• Eliminates manual recovery drill checks by running checks daily to ensure data integrity.
• New design and features for the desktop and laptop solution.
• Centralized web-based management provides ease of management.
• Eliminates LAN backup bandwidth bottlenecks; reduces daily network bandwidth by up to 500x using deduplication.
• Provides an alternative to archaic IT processes (shipping tapes for disaster recovery) with automated, encrypted remote copy over existing WANs, mitigating the risk involved in shipping tapes.
• Integration with existing backup infrastructure and software.
• Simple one-step recovery.
• Reduction of total back-end storage by up to 50x for cost-effective, long-term, disk-based recovery.
