Are You Ready for the ‘AVA’ Generation?

Randeep Singh

EMC Proven Professional Knowledge Sharing 2010

Randeep Singh
Solution Architect, HCL Comnet

Table of Contents

Figures
Summary
Abstract
Rise of De-Dupe Concept
Terms and Concepts of Avamar
Introduction of RAIN Technology
Active/Passive Replication
Protection of Data in VMware Environment
Tuning Client Cache to Optimize Backup Performance
Integration with Traditional Backup Civilization
Protection of Remote Offices
Architecting Avamar to Ensure Maximum System Availability
Pro-active Steps to Manage Health and Capacity
Daily Server Maintenance
Tape Integration
What's New
Why EMC Avamar

Disclaimer: The views, processes, or methodologies published in this article are those of the author. They do not necessarily reflect EMC Corporation’s views, processes, or methodologies.


LIST OF FIGURES

Source-based Data De-Duplication
Global De-Dupe Concept
Traditional versus Avamar Daily Full Backups
Avamar Client Plug-ins
RAIN Architecture
Active/Passive Replication
Avamar Server Deployed as a Virtual Appliance
Backup Solutions for VMware
Processes Used to Filter Out Previously Backed Up Files
Effectiveness of File Cache for Backing up File Servers versus Databases
Avamar and NetWorker Integration
Avamar and NetWorker Integration (NetWorker Console)
Protection of Remote Offices
Avamar Data Transport Components


SUMMARY

Once, when I was having a cup of coffee with my students, I realized that every creature on this earth, whether a man, woman, or child, produces data. In fact, by 2010 many database sizes had crossed into double digits in terms of petabytes. But data growth is not our most significant challenge; the main challenge arises when this data needs to be backed up with "on-demand" 24x7 accessibility. And, of course, how can we forget about the IT investment?

My students ask, "Is there any way to minimize the backup window?" To this, I reply, "Well, now it's time to step into the new AVA generation." Traditional backup solutions store data repeatedly, and the total storage under management expands by 5-10 times. Consider the case of government regulations; the risk of shipping tapes is one of the greatest security concerns in today's IT infrastructure. These traditional backup solutions also require a rotational schedule of full and incremental backups. Due to unnecessary data movement, enterprises are often faced with backup windows that roll into production hours, network constraints, and too much storage to manage. Tape drives, media, and libraries are prone to mechanical failure, and transport of tapes can be unreliable. While disk storage has been used to meet these challenges, it addresses only a fraction of the data protection challenges organizations face. Consider the case of remote offices, where the risk of data loss or exposure is extremely high. The answer to these challenges is Avamar®.

ABSTRACT

EMC Avamar enables fast and reliable backup and recovery across the enterprise, including LAN/SAN servers in the data center, VMware environments, remote offices, and more. To achieve this, Avamar uses patented global data deduplication technology to identify redundant sub-file data segments at the source, before the data is transferred across the network and stored to disk. Only new, unique, variable-length sub-file data segments are moved, reducing daily network bandwidth and storage requirements by up to 500x. Moreover, for security, data can be encrypted both at rest and in flight. For remote offices, Avamar's centralized management protects hundreds of sites easily and efficiently. Total back-end storage is also reduced by nearly 50x, as Avamar stores only a single instance of each sub-file data segment. Avamar also uses patented RAIN technology to provide fault tolerance across nodes and eliminate single points of failure, and its scalable grid architecture allows it to grow with demand. It can also work with existing traditional tape solutions such as NetWorker®.

RISE OF DE-DUPE CONCEPT

Looking at the data of enterprises today, we find that it is highly redundant, with many identical files and data segments stored within and across systems. Performing backup using traditional backup software stores this redundant data again and again. This is where the deduplication concept that Avamar uses comes into play: global, source-based data deduplication that eliminates redundancy at both the file and sub-file data segment level. The challenge of redundancy is solved at the source, before backup data is transferred over the network, whether LAN or WAN. To ensure that each unique data segment is backed up only once across the enterprise, Avamar agents are deployed on the systems to be protected; they identify and filter out repeated data segments stored across the enterprise over time. As a result, only a small amount of incremental backup data is generated, whether it is file system or database data.

EMC Avamar uses an intelligent, variable-length method for determining segment size that examines the data itself to find logical boundary points. This is the opposite of the fixed-block or fixed-length segment method that traditional backup products use, where even a small change in a data set can shift all of its fixed-length segments. Avamar's algorithm analyzes the binary structure of a data set, i.e. the 0s and 1s that make it up, to determine segment boundaries, so Avamar client agents can identify exactly the same segments in any data set. A variable-length segment averages about 24 KB and is then compressed to an average of 12 KB. For each segment, Avamar generates a unique 20-byte ID using the SHA-1 hash algorithm. This unique ID determines whether or not a data segment has been previously stored.
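The idea of content-defined boundaries plus SHA-1 segment IDs can be sketched in a few lines. This is a toy illustration only: Avamar's actual boundary-detection algorithm is proprietary, and the rolling-style checksum and size limits below are invented for the example. What it does show is the key property the text describes: boundaries come from the data itself, and identical segments always map to identical 20-byte IDs.

```python
import hashlib

def chunk_boundaries(data: bytes, divisor=4096, min_size=1024, max_size=65536):
    """Split data into variable-length chunks by picking boundaries from the
    content itself, so an insertion early in the stream shifts at most one
    chunk rather than every fixed-length block after it."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = (h * 31 + b) % (1 << 32)          # toy rolling-style checksum
        size = i - start + 1
        if (size >= min_size and h % divisor == 0) or size >= max_size:
            chunks.append(data[start:i + 1])  # content-defined boundary hit
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])           # trailing partial chunk
    return chunks

def segment_ids(data: bytes):
    """Map each variable-length segment to its 20-byte SHA-1 ID (shown here
    as a 40-character hex digest); identical segments collapse to one entry."""
    return {hashlib.sha1(c).hexdigest(): c for c in chunk_boundaries(data)}
```

Because the IDs depend only on segment content, a client can decide whether a segment was "previously stored" by comparing IDs, never the data itself.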

FIGURE 1 SOURCE-BASED DATA DEDUPLICATION

Figure 1 shows the deduplication concept. Figure 2 shows how it actually occurs: deduplication happens before data is transferred across the network. Avamar identifies the variable-length sub-file data segments at the source, breaks the data into atoms, and, using deduplication, sends each atom only once, thereby reducing the data to be backed up by up to 500 times.

FIGURE 2 GLOBAL DE-DUPE CONCEPT

Assuming there is 5 TB of data to be backed up, the data reduction factor will be 100-150x for a mixture of file system and database data, leaving only 35-50 GB to be backed up at the source. For Windows and UNIX file data, the reduction is nearly 300x, and the data to be backed up shrinks to approximately 20 GB at the source. Not only this, it also results in an 85% reduction in total client CPU utilization.
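The arithmetic behind these figures is straightforward to check; a small helper makes the relationship between primary data, reduction factor, and daily transfer explicit (the 100-150x and 300x factors are the ones quoted above):

```python
def daily_backup_size_gb(primary_tb: float, reduction_factor: float) -> float:
    """Data actually sent per daily full backup, given a deduplication
    reduction factor (per the text: 100-150x for mixed file-system and
    database data, ~300x for Windows/UNIX file data)."""
    return primary_tb * 1024 / reduction_factor

# 5 TB of mixed data at 100-150x leaves roughly 34-51 GB to move:
low  = daily_backup_size_gb(5, 150)
high = daily_backup_size_gb(5, 100)
```

At 300x, the same 5 TB of Windows/UNIX file data drops to about 17 GB, consistent with the "approximately 20 GB" figure above.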

Figure 3 shows the comparison of traditional backup versus Avamar daily full backups.

FIGURE 3 TRADITIONAL VS AVAMAR DAILY FULL BACKUPS


Below are some of the factors that impact data deduplication ratios:

Type of data
• Duplication in user-generated data is greater than in data from natural sources
• Encrypted and compressed data are not ideal candidates for deduplication
• More user-created content = higher deduplication ratio

Data change rate
• Small data change rates = more duplicate data in subsequent backups
• Less change = higher deduplication ratio

Retention policy
• Longer retention increases the chance that data will be repeatedly backed up
• Longer retention policy = higher deduplication ratio

Ratio of full backups to incremental backups
• More full backups increase the amount of data being repeatedly backed up
• More full backups = higher deduplication ratio

TERMS AND CONCEPTS OF AVAMAR

Let us begin by learning some terms:

Avamar Server – logical grouping of one or more nodes used to store and manage client backups.

Node – self-contained, rack-mountable, network-addressable computer that runs Avamar server software on the Linux operating system. It is the primary building block in any Avamar server.

Nodes can be any of the following types:

Utility Node – dedicated to scheduling and managing background Avamar server jobs. In scalable multi-node Avamar servers, a single utility node provides essential internal services for the server, such as the Management Console Server (MCS), cron jobs, external authentication, Network Time Protocol (NTP), and Web access.

Storage Nodes – store the actual backup data. Multiple storage nodes are configured in multi-node Avamar servers based upon performance and capacity requirements, and storage nodes can be added to an Avamar server over time to expand performance with no downtime required. Avamar clients connect directly with Avamar storage nodes; client connections and data are load balanced across storage nodes.

NDMP Accelerator Node – specialized node that uses NDMP to provide data protection for NAS filers (for example, EMC Celerra®, Network Appliance) and Novell NetWare servers.

Access Nodes – used to run the Avamar File System.

Single-Node Server – combines all the features and functions of utility and storage nodes in a single node.

Hard Disk Storage – disks used for storage.

Stripes – unit of disk drive space managed by Avamar to ensure fault tolerance.

Object – single instance of deduplicated data. Each Avamar object inherently has a unique ID. Objects are stored and managed within stripes on the Avamar server.

Avamar Administrator – graphical management console software application used to remotely administer an Avamar system from a supported Windows client computer.

Avamar Enterprise Manager – Web-based management interface that provides the ability to monitor and manage a distributed Avamar deployment.

Avamar Clients – filter out redundant data before sending data over networks, making it possible to protect systems even over congested LANs or WANs.

FIGURE 4 AVAMAR CLIENT PLUG-INS

Agents – platform-specific software processes running on the client that communicate with the management console server (MCS) and with any plug-ins installed on that client. Two types of plug-ins are supported:

File System Plug-ins – used to browse, back up, and restore files or directories on a specific client file system, which can be HP-UX, IBM AIX, Linux, Mac OS X, Microsoft Windows (2000, 2003, 2008, XP, and Vista), Sun Solaris, Novell NetWare, or VMware.

Application Plug-ins – support backup and restore of databases or other special applications. Supported application plug-ins include DB2, NDMP for NAS filers, Microsoft Exchange, Microsoft SQL Server, and Oracle databases.

Avamar Replicator – enables efficient, encrypted, asynchronous replication of data stored in one Avamar server to another Avamar server deployed in a remote location, without the need to ship tapes.

INTRODUCTION OF RAIN TECHNOLOGY

Assume that your traditional backup solution fails. What will you do? Avamar uses RAIN (Redundant Array of Independent Nodes) technology, which provides failover and fault tolerance across the nodes in an Avamar server grid. If a server node fails or becomes unavailable, the data stored on that node can be reconstructed from the remaining nodes, providing reliable data protection and access. In addition to providing failsafe redundancy, RAIN can be used to rebalance capacity across nodes after the Avamar server has been expanded with new nodes. Through this, we can manage the capacity of the system as the amount of data added to it continues to grow. Figure 5 shows the RAIN architecture.

FIGURE 5 RAIN ARCHITECTURE

Operational best practices for using RAIN technology in Avamar:

• Always enable RAIN for any configuration other than single-node servers and 1x2s (two active data nodes). The minimum RAIN configuration is a 1x4 (four active data nodes plus a utility node and a spare node). Double-disk failures on a node or a complete RAID controller failure can occur and corrupt the data on a node. Without RAIN, the only recourse is to reinitialize the entire system and replicate the data back from the replication target.

• When deploying non-RAIN servers, replicate the data on these servers to ensure it is protected. Non-RAIN servers have no data redundancy, so any loss of data can have significant and widespread impact on the backups stored on that Avamar server.

• When deploying a RAIN configuration, always include an active spare node in the module. The best way to minimize system downtime is to have a spare node readily available.

• For performance reasons, Avamar recommends limiting the number of nodes to 16 active data nodes (a 1x16 configuration). Limit starting configurations to 12 to 14 active data nodes so that nodes can be added later if needed to recover from high-capacity-utilization situations.

ACTIVE/PASSIVE REPLICATION

Replication transfers data from a source Avamar server to a destination Avamar server. The data replicated to the destination server can be restored directly back to primary storage without having to be staged through the source Avamar server. Replication is accomplished by way of highly efficient, asynchronous Internet Protocol (IP) data transfers, which can be scheduled during off-peak hours to optimize use of network bandwidth. Replication also uses data deduplication technology to find and eliminate redundant sequences of data before they are sent to the destination server, reducing network traffic and promoting efficient use of hard disk storage.

Using replication, an enterprise can centrally protect and manage multiple remote branch offices that use individual single-node servers for local backup and restore. The centralized multi-node server can then be used for disaster recovery in the event of catastrophic data loss at any remote branch office. Replication can also copy data stored in a multi-node server to any other multi-node server in the enterprise, enabling multi-node servers to provide peer-to-peer disaster recovery for each other.

If the primary server and the replication target are configured as an active/passive pair, then with one change to a DNS entry and a few commands, all clients can continue daily backups to the replication target, as shown in Figure 6. In this failover example, clients still back up to the host name BACKUP_SRVR, but in the DNS server, BACKUP_SRVR is now associated with IP address 192.168.200.1, the utility node of REPL_DST.
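The failover mechanism described above can be modeled in a few lines: clients are configured with one stable host name, and failover is a single record change on the DNS side. The active server's address (192.168.100.1) is invented for this sketch; 192.168.200.1 and the names BACKUP_SRVR and REPL_DST come from the example above.

```python
# Toy model of DNS-based active/passive failover. Clients never change
# configuration; they always resolve the same host name.
dns = {
    "BACKUP_SRVR": "192.168.100.1",   # assumed address of the active server
    "REPL_DST":    "192.168.200.1",   # utility node of the replication target
}

def client_backup_target() -> str:
    """Every client resolves BACKUP_SRVR before each backup."""
    return dns["BACKUP_SRVR"]

assert client_backup_target() == "192.168.100.1"

# Failover: repoint the single DNS entry at the replication target's
# utility node. Every client's next backup lands on REPL_DST.
dns["BACKUP_SRVR"] = dns["REPL_DST"]
assert client_backup_target() == "192.168.200.1"
```

The design point is that the fan-out problem (reconfiguring hundreds of clients) collapses into one authoritative change, which is why a single DNS edit plus a few server-side commands is enough.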


Replication is useful for more than recovering from the traditional disaster associated with a site failure. Replication is the most reliable form of redundancy that the system can offer because it creates a logical copy of the data from the replication source to the destination. It does not create a physical copy of the blocks of data, and therefore, any corruptions, whether due to hardware or software, are far less likely to be propagated from one Avamar server to another. Also during replication, multiple checks of the data occur to ensure that only uncorrupted data is replicated to the replication target.

Figure 6 ACTIVE/PASSIVE REPLICATION

There are two kinds of replication:
• Normal replication
• Full root-to-root replication

In normal replication, user data on the source Avamar server is replicated to a destination Avamar server. A special "REPLICATE" domain on the destination server is created during the first replication operation. This domain contains a mirrored representation of the entire source

server client tree on the destination server. All data within the REPLICATE domain is read-only. The only operations allowed on these backups are:
• Redirected restores to other clients not within the REPLICATE domain
• Changing a backup expiration date
• Validating backups on other clients not within the REPLICATE domain
• Viewing backup statistics
• Deleting a backup

Full root-to-root replication creates a complete logical copy of an entire source server on the destination server. Replicated data is not copied to the REPLICATE domain; it is added directly to the root domain, just as if the source clients had registered with the destination server, and the replicated source server data is fully modifiable on the destination server.

To fully replicate an Avamar server, all of the following data must be copied from the source server to the destination server during each replication operation:
• Client backups
• Domains, clients, and users
• Groups, datasets, schedules, and retention policies
• Server state, such as the contents of the activity monitor and server monitor databases at the time of the last MCS backup or flush

LIMITATIONS OF REPLICATION
• Only static data is replicated, i.e. data that is quiescent on the source server. Thus, any write operation to the source server that has not yet completed will not be replicated to the destination server during the current replication operation; it will, however, be replicated during the next replication operation.
• Avamar Administrator can only manage one server at a time. In an environment with more than one server, the Avamar Enterprise Manager multi-system management console is used.

BEST PRACTICES FOR REPLICATION
• Protect the data on the Avamar server by replicating it to another Avamar server.
• Set up active/passive replication pairs to take advantage of failover capability: one Avamar server (active) is dedicated to backing up clients, and the other (passive) is the dedicated replication target for the backup source server. If the active

server fails, then by changing just one DNS entry, all clients will begin backing up to the passive server, as described in the previous image.
• Ensure that available bandwidth is sufficient to replicate all of the daily changed data within a 4-hour window, so that the system can accommodate peaks of up to 8 hours per day.
• If using daily replication, avoid the --include flag: every time a new client is added to the active Avamar server, its data is not replicated until the repl_cron.cfg file is edited to add a new --include option for that client. Also, a system that specifies the --include option might not include the MC_BACKUPS data, in which case none of the policies are replicated.
• For best results, ensure that the target server is running the same or a later version of the Avamar software than the source server.
• Use a large time-out setting for the initial replication. Recent backups might not get replicated because replication processes can time out before all backups are successfully replicated; replication always processes backups alphabetically by client name, and earlier backups before later ones.
• Normal source server maintenance tasks such as hfs check and garbage collection should not run during a replication session.
• Schedule replication during periods of low backup activity, as this ensures that the greatest number of client backups will be replicated during each session.

PROTECTION OF DATA IN VMWARE ENVIRONMENT

VMware View offers many benefits to IT organizations, but as the amount of data stored on virtual machines increases, traditional backup struggles to protect such environments. EMC Avamar delivers fast and efficient protection for these environments by deduplicating data at the source, sending only unique, variable-length sub-file data segments during daily full backups. Traditional backup solutions move approximately 200% of primary data on a weekly basis; Avamar, using deduplication, moves approximately 2%. The results are:
• Up to 95% reduction in data moved
• Up to 90% reduction in backup times
• Up to 50% reduction in disk impact
• Up to 95% reduction in NIC usage
• Up to 80% reduction in CPU usage
• Up to 50% reduction in memory usage


EMC Avamar Virtual Edition for VMware is the industry’s first deduplication virtual appliance for backup, recovery, and disaster recovery. Avamar Virtual Edition enables users to deploy Avamar’s deduplication technology easily, effectively, and in a repeatable fashion on VMware ESX Server hosts. Each virtual appliance supports up to 2 TB of deduplicated backup capacity compared to a traditional backup schedule that would require approximately 70 TB of tape or disk storage and can leverage the existing VMware shared server and storage infrastructure to lower costs and simplify IT management. Avamar Virtual Edition supports VMotion for deployment flexibility, and up to two Avamar Virtual Edition virtual appliances per ESX server to provide scalability. Replication between Avamar virtual appliances or from Avamar virtual appliances to physical Avamar servers eliminates reliance on offsite tape shipments and the risk of losing unencrypted data. Figure 7 shows the Avamar Server software deployed as a virtual appliance.

FIGURE 7 Avamar Server Deployed As a Virtual Appliance

Key components that need to be protected are:
• Microsoft Active Directory, which provides authentication and user configuration to the VMware View solution
• View Manager, which interacts with vCenter to manage virtual desktop instances
• Virtual desktop templates
• vCenter, the centralized management application for the virtual infrastructure
• ESX Server, which provides the virtualization of the desktop instances
• The user data disk, the location of unique user data that is mapped to the virtual desktop instance upon user login and where users store their individual data

There are two data protection strategies for a VMware environment.

The first strategy uses a combination of hardware and software to make a simultaneous copy of the LUNs where the virtual machine desktops are stored, along with the View Manager application and configuration information. This allows a total recovery of the entire environment. One requirement of this strategy is making a duplicate copy of the VMware View environment for presentation to the backup solution.

The second strategy involves protecting the key components of VMware view infrastructure independently using Avamar client software agents. This will avoid the infrastructure cost we would have incurred if using the first strategy.

Figure 8 shows the Avamar client backup solutions for VMware.

Figure 8 Backup Solutions for VMware

In the first solution, i.e. at the guest level, the Avamar agent resides inside each virtual machine and deduplicates data within the virtual machines as if they were physical servers, moving minimal data to reduce resource contention, accelerate backup, and provide file-level restore for Windows, Linux, and Solaris.

Advantages and considerations for this method:

Advantages
• Highest level of data deduplication, resulting in maximum storage efficiency
• Guest-level backup easily fits into most existing backup schemes; day-to-day backup procedures do not change
• Support for fast partial restores of individual directories (folders) or files
• Optional application-level support for DB2, Exchange, Oracle, and SQL Server databases
• No advanced scripting or VMware knowledge required

Considerations
The only significant consideration for this approach is that although file and directory restores are a simple one-step process, full system recovery is a two-step procedure: first load a known-good operating system image inside the virtual machine, then restore the unique data from the guest-level backups stored on the Avamar server.

In the second solution, i.e. VCB, the Avamar agent resides on the VCB proxy server, deduplicates within and across the VMDK files, and supports both VCB file-level (Windows only) and image-level backup. Avamar replication provides disaster recovery for the backed-up VMDK files.

Advantages and considerations for this method:

Advantages
• Backup resources are offloaded to the VCB proxy server
• Full-image backups of running virtual machines
• File-level backups for Windows virtual machines
• Virtual disk images are compressed
• Consolidation ratios optimized for VMware

Considerations
• Restores are a multi-step process
• VCB does not support iSCSI or NAS/NFS
• VCB can only back up a virtual machine whose disk image is stored on a device the proxy can access
• VCB cannot back up virtual disks that are RDMs in physical compatibility mode
• VCB can only back up a virtual machine that has an IP address or domain name server (DNS) name associated with it
• VCB can only perform a file-level backup of a virtual machine running Microsoft Windows NT 4.0, Windows 2000, Windows XP, Windows XP Professional, or Windows 2003
• All backups and restores must be initiated from the Avamar Administrator graphical management console; it is not possible to initiate them directly from the virtual machine interface

In the third solution, i.e. at the console level, the Avamar agent resides on the ESX console operating system, deduplicates within and across the VMDK files, and provides VMDK image-level (no file-level) backup without VCB. No proxy server or SAN is required for this configuration.

Advantages and considerations for this method:

Advantages
• Virtual machine image-level backups without VCB
• No proxy or SAN required

Considerations
• No file-level backup
• Two-step restore
• Advanced configuration (without proper scheduling and setup, backup jobs can impact system resources and running virtual machines)
• To back up a virtual machine directly, it must be powered down or suspended before the backup begins
• Virtual disk images are not compressed (empty space is processed)

TUNING CLIENT CACHE TO OPTIMIZE BACKUP PERFORMANCE

Whenever a backup begins, the avtar process loads two cache files into memory from the var directory in the Avamar installation path. The first is the file cache (f_cache.dat). It stores 20-byte SHA-1 hashes of file attributes and is used to quickly identify which files have previously been backed up to the Avamar server. When backing up file servers, the file cache screens out approximately 98% of the files. When backing up databases, however, the file cache is not effective, since all the files in a database appear to be modified every day.

The second cache is the hash cache (p_cache.dat). It stores the hashes of the chunks and composites that have been sent to the Avamar server. The hash cache is used to quickly identify which chunks or composites have previously been backed up to the Avamar server. The hash cache is very important when backing up databases. These client caches are used to: • Reduce the amount of time required to perform a backup • Reduce the load on the Avamar client • Reduce the load on the Avamar server

Figure 9 shows the process that avtar uses to filter out previously backed up files and chunks, along with the values that are incremented and reported by the avtar advanced statistics option. First, avtar computes the SHA-1 hash of the file's attributes and checks whether that hash exists in the file cache; if it does, the file is skipped. Otherwise, the file is broken into chunks. If a chunk's hash exists in the hash cache, avtar processes the next chunk. If it does not, a request is sent to the server to check whether the chunk hash exists on the

Avamar server. If it does, avtar processes the next chunk; otherwise, it sends the chunk and its hash to the Avamar server.
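The filtering order just described (file cache, then hash cache, then a server-side hash lookup, and only as a last resort the chunk itself) can be sketched as below. This is a simplification, not avtar's implementation: chunking is reduced to one whole-file chunk, the server's hash index is modeled as a set, and the attribute hash is a stand-in computed from name and size.

```python
import hashlib

def sha1(b: bytes) -> str:
    return hashlib.sha1(b).hexdigest()

def backup(files, file_cache, hash_cache, server):
    """Sketch of avtar's two-level cache filter. Returns the list of chunk
    hashes actually sent over the network during this backup."""
    sent = []
    for name, data in files.items():
        # Stand-in for the hash of file attributes (name, size, mtime, ...).
        attr_hash = sha1(name.encode() + str(len(data)).encode())
        if attr_hash in file_cache:
            continue                     # file unchanged since a prior backup
        for chunk in (data,):            # real avtar: variable-length chunks
            h = sha1(chunk)
            if h in hash_cache:
                continue                 # chunk already sent by this client
            if h not in server:
                server.add(h)            # chunk unknown anywhere: send it
                sent.append(h)
            hash_cache.add(h)
        file_cache.add(attr_hash)
    return sent
```

A short run shows both screening levels at work: identical content in two files is sent once, and a repeat backup sends nothing.

```python
file_cache, hash_cache, server = set(), set(), set()
first = backup({"a.txt": b"hello", "b.txt": b"hello"},
               file_cache, hash_cache, server)   # one chunk crosses the wire
second = backup({"a.txt": b"hello", "b.txt": b"hello"},
                file_cache, hash_cache, server)  # everything screened out
```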

FIGURE 9 Processes Used to Filter Out Previously Backed Up Files

Figure 10 shows the effectiveness of the file cache when backing up file servers in contrast to databases. In this example, we consider 1 TB of file system data and 1 TB of database data.


Figure 10 Effectiveness of file cache, backing up file servers vs. databases

As Figure 10 shows, for file servers the file cache provides a 98% reduction in client load: approximately 980 GB of data is screened out during backup, and only around 3 GB is sent to the Avamar server. For databases, by contrast, the file cache is 0% effective in reducing client load, and around 30 GB of data is sent to the Avamar server.

By default, the file cache consumes up to 1/8th of the physical RAM on the Avamar client. For example, if the client has 4 GB of RAM, the file cache will be limited to 4 GB / 8, or 512 MB maximum. The file cache doubles in size each time it needs to grow. The possible file cache sizes are 5.5 MB, 11 MB, 22 MB, 44 MB, 88 MB, 176 MB, 352 MB, 704 MB, and 1,408 MB.

The file cache stores two 20-byte SHA-1 hash entries per file:
• the hash of the file attributes, which include file name, file path, modification time, file size, owner, group, and permissions
• the hash of the actual file content, independent of the file attributes

The file cache rule is: if the client has N million files, the file cache must be at least N million files x 40 MB per million files. Example: if a client has four million files, the file cache must be at least 160 MB (4 million files x 40 MB/million files). In other words, the file cache must be allowed to grow to 176 MB.
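The rule and the doubling sequence combine into a simple sizing calculation; this sketch just encodes the figures stated above (40 MB per million files, and the fixed sequence of cache sizes):

```python
FILE_CACHE_SIZES_MB = [5.5, 11, 22, 44, 88, 176, 352, 704, 1408]

def min_file_cache_mb(million_files: float) -> float:
    """File cache rule from the text: two 20-byte hash entries per file,
    i.e. 40 MB of cache per million files."""
    return million_files * 40

def actual_file_cache_mb(million_files: float) -> float:
    """Because the cache doubles as it grows, the size it must be allowed
    to reach is the first step in the doubling sequence that holds the
    minimum required amount."""
    need = min_file_cache_mb(million_files)
    for size in FILE_CACHE_SIZES_MB:
        if size >= need:
            return size
    return FILE_CACHE_SIZES_MB[-1]
```

For the four-million-file example above, the minimum is 160 MB and the cache must therefore be allowed to grow to the 176 MB step.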

The hash cache consumes up to 1/16th of the physical RAM on the Avamar client. Example: for clients with 4 GB of RAM, the hash cache will be limited to 4 GB / 16, or 256 MB maximum. The hash cache also doubles in size each time it needs to grow. The possible hash cache sizes are 24 MB, 48 MB, 96 MB, 192 MB, 384 MB, 768 MB, and so forth.

So, in this example where a client has 4 GB of RAM, the maximum size the hash cache will actually reach is 192 MB. The hash cache includes only one SHA-1 hash per chunk: the hash of the contents of the chunk.

The hash cache rule is: if the client has Y GB of database data, the hash cache must be at least Y GB / average chunk size x 20 MB per million chunks. Use 16 kB as the average chunk size for streaming Exchange and Oracle database backups. Use 24 kB as the average chunk size for all other backups, including backups of database dump files.

Example: if an Exchange database client has 250 GB of database data, the hash cache must accommodate 16 million chunks (250 GB / 16 kB). At 20 MB per million chunks, this means the hash cache must be at least 320 MB.
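The hash cache rule is equally mechanical; the helper below encodes it directly from the figures above (20 MB per million chunks, with the chunk size chosen by workload), so the 250 GB Exchange example can be reproduced:

```python
def min_hash_cache_mb(db_gb: float, avg_chunk_kb: float = 24) -> float:
    """Hash cache rule from the text: one 20-byte hash per chunk, i.e.
    20 MB of cache per million chunks. Use avg_chunk_kb=16 for streaming
    Exchange/Oracle backups, 24 for everything else."""
    million_chunks = db_gb * 1024 * 1024 / avg_chunk_kb / 1e6
    return million_chunks * 20

# 250 GB of Exchange data at 16 kB average chunks: roughly 16.4 million
# chunks, so the hash cache must be at least ~328 MB (the text rounds the
# chunk count down to 16 million and quotes 320 MB).
exchange_need = min_hash_cache_mb(250, 16)
```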

The amount of memory consumed by the avtar process itself is generally in the range of 20 to 30 MB. The exact amount depends on the operating system the client is running, and fluctuates during the backup depending on the structure of the files being backed up. Since the file cache and hash cache can grow to maximum sizes of 1/8th and 1/16th of total RAM, respectively, on any client with more than about 1/2 GB of RAM the two caches contribute more to overall memory use than the rest of the avtar process. This is because both caches are read completely into memory at the start of the avtar backup. By default, then, the overall memory that the client caches use is limited to approximately 3/16th of the physical RAM on the Avamar client.

Changing Client Cache Size
We can override the default limits on the size of the file and hash caches by using the following two flags:
--filecachemax=VALUE
--hashcachemax=VALUE
where VALUE is an amount in MB, or a fraction of RAM when negative. As described previously, the default values are --filecachemax=-8 and --hashcachemax=-16.

For example, the following option limits the file cache to 100 MB. Because the file cache doubles in size each time it needs to grow, the cache will actually top out at 88 MB, the largest doubling size that does not exceed the cap: --filecachemax=100
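The interaction between an explicit cap and the doubling sequence can be checked with a few lines. This is an illustrative sketch of the behavior described above, not EMC code.

```python
def actual_cache_limit_mb(cap_mb, start=5.5):
    """Largest doubling-sequence size that does not exceed an explicit cap.

    The cache only ever takes sizes from the doubling sequence
    (5.5, 11, 22, 44, 88, ... for the file cache), so a cap such as
    --filecachemax=100 settles at the step just below it.
    """
    size = start
    while size * 2 <= cap_mb:
        size *= 2
    return size

# --filecachemax=100 effectively yields an 88 MB file cache
limit = actual_cache_limit_mb(100)
```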

To optimize performance, sometimes we need to increase the cache sizes from the default values. These situations arise if there are either millions of small files or large databases.

Millions of Small Files
If the client has millions of small files, the file cache may need to grow beyond its default size. The general rule is that the client requires 512 MB of physical RAM for every one million files on the Avamar client. If a client has one million files, a minimum of 40 MB is required just to store all the file hashes for a single backup, since there are two 20-byte entries per file; because the file hashes must be stored for several backups, somewhat more than this is required. As discussed previously, the file cache doubles in size each time it needs to grow (5.5 MB, 11 MB, 22 MB, 44 MB, 88 MB, 176 MB, 352 MB, 704 MB, and 1,408 MB), so in practice the cache grows to about 44 MB. Since the default limit is 1/8th of the 512 MB of physical RAM, the cache can grow to 64 MB, which means that the default value of 1/8th of RAM for the file cache is adequate.

Large Databases
If the client has a few large files, the default hash cache limit of 1/16th of RAM is usually insufficient. For example, a 240 GB database requires a cache of up to 10 million hashes. Since each hash is 20 bytes, a hash cache of at least 200 MB is required, and that only stores the hashes for a single backup. The hash cache also doubles in size each time it needs to grow (24 MB, 48 MB, 96 MB, 192 MB, 384 MB, 768 MB, and 1,536 MB), so the next increment available is 384 MB. Therefore, if this client has 4 GB of RAM, the hash cache must be allowed to grow to 1/8th of the RAM. If the default of 1/16th of RAM is used, the hash cache is limited to 192 MB, resulting in an undersized hash cache. In the case of databases, very few files are backed up, so the file cache stays considerably smaller and net memory use is still about 1/8th to 3/16th of RAM.

Operational Best Practice: Never allow the total combined cache sizes to exceed 1/4 of the total available physical RAM.

INTEGRATION WITH TRADITIONAL BACKUP CIVILIZATION A growing number of organizations are considering the benefits of data deduplication to reduce the cost of disk-based backup and recovery. A recent ESG Research Survey indicates that 11% of respondents are currently using data deduplication and 32% plan on doing so. Until recently, the only way to introduce data deduplication into an existing backup and recovery environment was to swap tape or disk-based recovery hardware for an appliance that supports data deduplication. While these hardware-based solutions can deduplicate backup data stored on disk, they do nothing to reduce the amount of data flowing over the network and can create islands of deduplication. ESG Lab has confirmed that EMC NetWorker can be easily deployed, leveraging Avamar deduplication technology to create a global pool of deduplicated data that

drastically reduces disk and network requirements while preserving the familiar look and feel of NetWorker.

The latest release of Avamar has been integrated with the NetWorker backup product. As a result, NetWorker users can now take advantage of Avamar data deduplication using familiar NetWorker interfaces, workflows, and backup policies. The integration also provides centralized control of traditional and next-generation backup.

NetWorker clients can be configured to take advantage of Avamar data deduplication to dramatically decrease the amount of time, network bandwidth, and disk capacity required to perform backup jobs. This is accomplished by installing a version 7.4 or later NetWorker agent with built-in Avamar support on an application server. The agent performs source-side global data deduplication before forwarding backup data to a NetWorker server and an Avamar Data Store. Figure 11 illustrates this data flow.

Figure 11 Avamar and Networker Integration

Configuring NetWorker to deduplicate data can be performed in three simple steps:
• Select Avamar as a NetWorker managed application
• Create a deduplication node
• Discover the Avamar deduplication node
Then select the deduplication backup checkbox as shown in Figure 12.


Figure 12 Avamar and Networker Integration (Networker Console)

ESG Lab used NetWorker with Avamar-enabled deduplication to back up a Windows virtual server. The results of the test:
• The backup took 5 minutes 31 seconds to back up 15 GB of data using an existing NetWorker backup policy.
• Restore of a 10 MB log file took two seconds.
• The NetWorker deduplication backup summary report showed that the latest full backup of the Windows virtual server protected 15,261 MB of data, yet only 127 MB of data was transferred.
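A back-of-the-envelope calculation on those last two figures shows the effective data reduction for that backup (our arithmetic on the reported numbers, not part of the ESG report):

```python
# Figures from the ESG Lab deduplication summary report above
protected_mb = 15_261     # data protected by the latest full backup
transferred_mb = 127      # data actually sent over the network

# Effective reduction ratio for this (non-initial) full backup
reduction = protected_mb / transferred_mb   # roughly 120:1
```

Ratios this high are typical only after the first full backup has seeded the server; the initial backup must still transfer all unique data.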

PROTECTION OF REMOTE OFFICES Information critical to the success and efficiency of an organization is not just found in corporate data centers; it also resides at remote and branch offices. Remote and branch offices (ROBOs) contain a large amount of enterprise data, yet most still rely upon traditional tape-based backup methods manned by non-IT staff. Tape backup, snapshots, mirroring, staging, and archiving all significantly increase the amount of storage under management. Non-IT staff handling backups and tapes increases the cost of data protection and the risk of data loss. In a distributed environment, it is impractical to centralize backup operations with traditional backup architectures and existing network capacity. Moving even daily incremental backup data sets requires so much network bandwidth and time that even this simple process can become prohibitively expensive and inefficient.

Avamar addresses this issue by improving backup processes in several ways. Replicating data over a WAN instead of shipping tapes has traditionally been cost-prohibitive because of the significant amount of data that must be transferred across the network. Avamar overcomes this with capacity reduction technology and global data deduplication across sites and servers, using a lightweight agent running on the client systems it protects.

Avamar requires only a one-time full backup in which redundant data is eliminated; after that, only new, unique sub-file data segments are backed up daily across the WAN to the data center or DR site. The software identifies and filters repeated variable-length data segments stored in files within a single system and across multiple systems over time, capturing only net-new segments. In addition, algorithms are applied to eliminate empty space and redundant file patterns, further shrinking data volumes.

Avamar source-based global data deduplication significantly reduces the amount of data sent over the WAN, enabling fast, daily full backups for remote office data. Backups are centrally managed from a corporate data center using the intuitive Avamar Management console and data can be encrypted for security.

Figure 13 illustrates how remote and branch offices using Avamar can reduce the amount of data transferred during backup (labeled "Bandwidth Efficient" in the figure) by sending only unique blocks of data to a centralized data center. At smaller remote offices, only Avamar software agents are deployed on the systems, while larger remote offices and data centers typically deploy a local Avamar server to improve recovery performance.

Figure 13 Protection of Remote Offices

Avamar also supports disaster recovery and offsite archival by replicating backup data over a wide area network to a remote data center. Data deduplication and replication capabilities ensure that only changes since the last backup, comprised of unique and compressed sub-file variable length data segments, are replicated. This is extremely valuable, reducing the cost of network bandwidth and improving the performance of backups and replication over the WAN.

There are two ways to back up clients in remote offices:
• Back up remote office clients to a small Avamar server located in the remote office (remote Avamar backup server) and replicate data to a large centralized Avamar server (centralized replication destination).
• Back up those clients directly to a large centralized Avamar server (centralized Avamar backup server) and replicate data to another large centralized Avamar server (centralized replication destination).
Which method is best depends upon the following factors:
• RTO (Recovery Time Objective): The first method provides a faster RTO because restores can be performed directly from the local server across the local area network to the client.
• Server administration: The amount of administration and support required is roughly proportional to the number of Avamar servers deployed in an environment. Accordingly, the second method requires less administration. The tradeoff then becomes RTO versus the additional cost of deploying, managing, and supporting multiple Avamar server instances.

The operational best practice: unless restore time objectives cannot otherwise be met, architect the system so that clients back up directly to a large, active, centralized Avamar server, which then replicates data to another large, passive, centralized Avamar server.

Recommended best practice for seeding the initial backup of remote clients:
• Copy the data to a portable storage device (such as a USB hard drive)
• Install the Avamar software on a client local to the Avamar server
• Attach the portable storage device to the local client
• Perform a one-time backup of all the data from the portable storage device
• Start backing up the remote client
This procedure ensures that when the first backup of the remote client runs over the WAN, it only needs to transfer data that has changed or been added since the seeding process occurred.

ARCHITECTING AVAMAR TO ENSURE MAXIMUM SYSTEM AVAILABILITY
In this section, we consider the factors involved in architecting Avamar to ensure maximum system availability: RAIN, RAID, replication, checkpoints, and backup of remote offices. Of these, we have already discussed RAIN, replication, and protection of remote offices; we discuss RAID and checkpoints here.

RAID
An individual drive has a certain life expectancy before it fails, as measured by its MTBF (Mean Time Between Failures). Since there are potentially hundreds or even thousands of drives in a storage array, the probability of a drive failure increases significantly. As an example, if the MTBF of a drive is 750,000 hours and there are 100 drives in the array, the MTBF of the array becomes 750,000/100, or 7,500 hours. RAID (Redundant Array of Independent Disks) was introduced to mitigate this problem.
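The scaling in that example follows directly from dividing the per-drive MTBF by the number of drives. A minimal sketch of the arithmetic (illustrative only; real failure modeling is more involved than this first-order approximation):

```python
def array_mtbf_hours(drive_mtbf_hours, n_drives):
    """First-order estimate: expected time to the FIRST drive failure in an
    array of n independent drives scales as 1/n of a single drive's MTBF."""
    return drive_mtbf_hours / n_drives

# The example from the text: 750,000-hour drives, 100 of them in the array
mtbf = array_mtbf_hours(750_000, 100)   # 7,500 hours, i.e. under a year
```

This is why RAID redundancy, rather than drive reliability alone, is what keeps a multi-node Avamar server available.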

RAID combines two or more disk drives into a RAID set or RAID group, which appears to the host as a single disk drive. Properly implemented RAID sets provide:
• Higher data availability
• Improved I/O performance
• Streamlined management of storage devices
All standard Avamar server node configurations use RAID to protect the system from disk failures. Avamar server nodes are typically configured with SCSI hard drives in a RAID 5 configuration or SATA hard drives in a RAID 1 configuration.

Failed drives degrade I/O performance, and with it the performance and reliability of the Avamar server. RAID rebuilds can further reduce I/O performance significantly while they run, with the same adverse impact.

Operational best practices for implementing RAID:
• Protect the data on the Avamar server by using RAID to guard against individual hard drive failures.
• Set the RAID rebuild to a relatively low priority.
• Always set up log scanning to monitor and report hardware issues; the email home feature can be configured for this.
• For hardware purchased separately, the customer must monitor regularly and address hardware issues promptly.


CHECKPOINTS
A checkpoint is a snapshot of the Avamar server taken for the express purpose of facilitating server rollbacks. If backups are running, the server suspends them, makes the system read-only, takes the checkpoint, makes the system read-write again, and then resumes the backups. Checkpoints assist with disaster recovery and provide redundancy across time: they are consistent snapshots of the entire Avamar system that can be verified for integrity. If an integrity check fails due to an inconsistency that cannot be fixed, the system can be quickly rolled back to a prior checkpoint, allowing recovery from operational issues and certain kinds of corruption. Like all other forms of redundancy, however, checkpoints require disk space; the more checkpoints retained, the larger the checkpoint overhead. The operational best practice is to leave the checkpoint retention policy at the default values, typically set to retain the last two checkpoints and the last validated checkpoint.

PROACTIVE STEPS TO MANAGE HEALTH AND CAPACITY
Capacity is one of the most important aspects of administering an Avamar server. Two main components define Avamar server capacity:
• Storage Subsystem (GSAN) Capacity
• Operating System Capacity
GSAN capacity is the total amount of commonality-factored data and RAIN parity data (net after garbage collection) on each data partition of the server node, as measured and reported by the GSAN process.

Operating System Capacity is the total amount of data in each data partition, as measured by the operating system, which not only includes the size of all the stripes in the current working directory, but also the checkpoint overhead and the amount of data stored on the data partition by other activities.

Now we describe how the Avamar server behaves as it crosses various consumed-storage thresholds.
100% — The "Server Read-Only Limit." When server utilization reaches 100% of total storage capacity, the server automatically becomes read-only. This protects the integrity of the data already stored on the server.
95% — The "Health Check Limit." This is the amount of storage capacity that can be consumed while still having a "healthy" server.


Although an Avamar server could be allowed to consume 100% of available storage capacity, it is not good practice to let that occur. Consuming all available storage can prevent certain server maintenance activities from running, activities that might otherwise free additional storage capacity for backups. For this reason, a second limit is established, called the "health check limit," derived by subtracting some percentage of server storage capacity from the server read-only limit. The default health check limit is 95%. This setting can be customized for a specific site, but setting it higher than 95% is not recommended. When server utilization reaches the health check limit, existing backups are allowed to complete, but all new backup activity is suspended; a pop-up alert appears at each login to Avamar Administrator, and a system event must be acknowledged before any future backup activity can resume.
80% — Capacity Warning Issued. When server utilization reaches 80%, a pop-up notification reports that the server has consumed 80% of its available storage capacity, and the Avamar Enterprise Manager capacity state icons turn yellow.
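The three thresholds can be summarized in a small state check. This is an illustrative sketch of the behavior described above (function and state names are our own, not Avamar APIs):

```python
def capacity_state(utilization_pct, health_check_limit=95.0):
    """Map server storage utilization (%) to the Avamar capacity states above."""
    if utilization_pct >= 100.0:
        return "read-only"            # server read-only limit: all writes refused
    if utilization_pct >= health_check_limit:
        return "backups-suspended"    # health check limit: running backups finish,
                                      # new backups suspended until acknowledged
    if utilization_pct >= 80.0:
        return "warning"              # capacity warning: pop-up, yellow icons
    return "healthy"

state = capacity_state(83.0)          # a server at 83% has crossed the warning line
```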

Icons that communicate capacity levels

The most significant daily operational impact of capacity is on backup performance and hfscheck. Operational best practice recommendations:
• Understand how to monitor and manage the storage capacity of the Avamar server on a daily basis.
• Limit storage capacity usage to 80% of the available "server utilization" capacity.

Proactive steps to manage capacity: when GSAN capacity exceeds 80% of the final read-only threshold, perform the following operational best practices:
• Stop adding new clients to the system.
• Adjust retention policies to decrease the retention period of backups where possible, reducing capacity use.

• Check whether backups are preventing a garbage collect operation from starting. If so, an error message such as "MSG_ERR_BACKUPSINPROGRESS" or "garbage collection skipped because backup is in progress" is logged in the /usr/local/avamar/var/cron/gc.log file.
• If possible, decrease GSAN capacity by deleting or expiring backups and running garbage collection. Note that deleting backups does not free space until garbage collection has run several times; garbage collection finds and deletes the unique data associated with these backups.
• Temporarily replicate the data to another server, then replicate it back after reinitializing the server. Because replication creates a logical copy of the data, this compacts all the data onto fewer stripes.
• For multi-node Avamar servers utilizing RAIN, add nodes and rebalance the capacity.

DAILY SERVER MAINTENANCE
Daily Avamar server maintenance comprises three essential activities:
• Checkpoint (discussed above)
• Garbage collection
• HFS check

Garbage Collection Garbage collection is the process of recovering unused space from backups that have expired. The garbage collection process requires a quiet system in order to run — garbage collection will not initiate if any backups are running.

HFS Check
A Hash File System (HFS) check is an internal operation that validates the integrity of a specific checkpoint. Once a checkpoint has passed an HFS check, it can be considered reliable enough to be used for a system rollback. For an HFS check to start successfully, there must be two or fewer currently running backups per storage node.

Avamar servers define two daily “maintenance windows,” during which some or all of the essential server maintenance activities are performed. The default start times for these maintenance windows are: • 6:00 A.M. (morning) • 6:00 P.M. (evening)

The following table lists the essential server maintenance activities (checkpoint, garbage collection, and HFS check) and summarizes each activity's impact on normal system operation.

                              CHECKPOINT                GARBAGE COLLECTION   HFS CHECK
Runs in morning?              Yes                       Yes                  Yes
Runs in evening?              No                        No                   No
Typical run time              5 minutes - 2 hours max   Several hours        1 hour
Starts if backups running?    Yes                       No                   Only if two or fewer backups are running on each node
Effect on running backups     Temporarily suspended     Not applicable       Unaffected
Effect on new backups         Can start                 Cannot start         Can start
System state during activity  Read-only                 Read-only            Read-write

TAPE INTEGRATION
The Avamar Tape Output option provides the ability to archive backup data stored within an Avamar server to tape via a disk cache. This option utilizes a disk staging area (for example, a media server with a disk pool, or a file server) to temporarily stage Avamar backup data in un-deduplicated format; any standard third-party tape-based backup software solution then retrieves the staged data and writes it to tape. The Avamar Tape Output option can be configured to send either all or a subset of the backup data stored within the Avamar server to tape, and can also be tuned based on the size of the disk staging area. Restores are performed directly from tape to the appropriate client(s). The Avamar Tape Output option provides the following advantages:
• A restore can be performed at the granularity of a single file.
• A restore can be performed directly from the tape software to a client without having to restore via the Avamar server.
• ACLs and alternate data streams are maintained on tape if the operating system of the disk staging area system matches the operating system of the original client that was backed up.


Avamar has introduced Avamar Data Transport, an Avamar server that runs as a virtual machine in a VMware ESX Server (3.x) environment. Avamar Data Transport is a hardware and software system that transports deduplicated data from an Avamar server to tape. Transport nodes are sized at 1.0 TB. A transport node serves as the target of replication from a primary Avamar server; after replication completes, the replicated data is moved to tape. Transport nodes can also be recovered from tape, allowing backed-up data to be accessed by administrators or users. Avamar Data Transport 1.0 supports a maximum of four transport nodes. The following terms are useful for understanding how it works:

Avamar Data Transport Framework
The framework is a service-oriented architecture that provides applications with common services such as communication, security, and event logging.

Control Node
The control node is a 64-bit Linux host where the Avamar Data Transport application and the Avamar Data Transport Framework run. The Avamar Data Transport user interface is accessed from a web server installed on this host. In addition, the database of all transported files is often located on the control node. There is only one control node in an Avamar Data Transport system.

Tape Backup Server
Avamar Data Transport supports the NetWorker and Symantec NetBackup tape backup products. Both of these products have a server component that drives the tape archiving process.

Transport Node
Transport nodes are Avamar Virtual Edition servers that have been modified specifically for use in the Avamar Data Transport system. They are the targets of replication from Avamar servers, and are used as backup and restore targets by the tape backup server.

The system consists of the following physical components: • Physical Avamar server • ESX server for hosting control and transport nodes • Tape backup server configured with NetWorker or Symantec NetBackup • Tape library, tape drives, and tapes


FIGURE 14 Avamar Data Transport Components

Before installing Avamar Data Transport, the components mentioned above must be running in your network environment. After the pre-installation checks have been made, install and configure the Avamar Data Transport components (as shown in Figure 14) on the appropriate hosts in the system:
• Avamar Data Transport System Service (on the Avamar servers)
• Control node (on the ESX server)
• Transport nodes (on the ESX server)
• Transport node services (on the transport nodes)
• Avamar Data Transport groups or policies (on the tape backup server)

Installation and configuration of these components occurs in the following order, as shown in Figure 14:
1. Control node
2. Avamar Data Transport System Service on the Avamar server (or servers)
3. Transport nodes
4. Tape backup server and tape backup clients
5. Transport node services
After installing these components, Avamar Data Transport is configured to transport deduplicated data to tape. In this way, the Avamar Data Transport node serves as the tape-out mechanism, shipping backed-up data to tape.

WHAT'S NEW
Avamar Desktop/Laptop introduced in Avamar 5.0
EMC recently introduced Avamar 5.0, designed for backing up desktops and laptops. The Avamar Desktop/Laptop features enhance the functionality of the Avamar client for Windows and Mac desktops and laptops. Avamar Desktop/Laptop is based on the Avamar client software and adds enhanced features for an enterprise's desktop and laptop computers, including:
• A web browser UI that provides manual backup, restore by search, restore by browse, and backup history.
• UI available in ten languages.
• Domain-oriented data security. Users are authenticated through the enterprise's Active Directory or OpenLDAP-compliant directory service; users can also be authenticated using Avamar authentication.
• User-initiated backups. Users can manually initiate a backup from the client.
• Restore of folders and files directly to the original location.
• Creation of a restore set by search or directory-tree browse. Users can search or browse a backup directory tree to create the set of folders and files to restore.
• Backup history. Users can view the folders and files backed up in their last 10 successful backups.
• Deployment of Avamar clients using common systems management tools. Avamar Desktop/Laptop can be push-installed on Windows and Mac desktop and laptop computers using tools such as Microsoft Systems Management Server 2003 (SMS).

WHY AVAMAR?
• Solves the challenges associated with traditional backup by using patented global, source-based data deduplication technology to identify redundant data segments at the source before transfer across the network.
• Enables fast, reliable backup and recovery for VMware environments, remote offices, and LAN/NAS servers across existing networks and infrastructure.
• Encrypts data in flight and at rest, adding security.
• RAID protection from disk failures.
• Supports SAN or internal disk storage.
• Scalable grid architecture.

• Fault tolerance across nodes using RAIN technology, eliminating single points of failure and providing high availability.
• Eliminates manual recovery drill checks by running checks daily to ensure data integrity.
• New design and features for the desktop and laptop solution.
• Centralized web-based management provides ease of management.
• Eliminates LAN backup bandwidth bottlenecks; reduces daily network bandwidth by up to 500x using deduplication.
• Provides an alternative to archaic IT processes (shipping tapes for disaster recovery) with automated, encrypted remote copy over existing WANs, mitigating the risk involved in shipping tapes.
• Integration with existing backup infrastructure and software.
• Simple one-step recovery.
• Reduction of total back-end storage by up to 50x for cost-effective, long-term, disk-based recovery.
