How to Start Your Disaster Recovery in This “Cloudy” Landscape EMC Proven Professional Knowledge Sharing 2011
Roy Mikes
Storage and Virtualization Architect
Mondriaan Zorggroep
[email protected]

Table of Contents
About This Document
Who Should Read This Document?
Introduction
1. What is a Disaster
2. What is a Disaster Recovery Plan (DR plan)
   2.1. Other benefits of a Disaster Recovery Plan
3. Business Impact Analysis (BIA)
   3.1. Maximum Tolerable Downtime (MTD)
   3.2. Recovery Time Objective (RTO)
   3.3. Recovery Point Objective (RPO)
4. Data Classification
5. Risk Assessment
   5.1. Component Failure Impact Analysis (CFIA)
   5.2. Identifying Critical Components
      5.2.1. Personnel
      5.2.2. Systems
   5.3. Dependencies
   5.4. Redundancy
6. Emergency Response Team (ERT)
7. Developing a Recovery Strategy
   7.1. Types of backup
   7.2. Virtualized Servers and Disaster Recovery
   7.3. Other thoughts
8. Testing Recovery Plans
9. Role of virtualization
   9.1. Role of VMware
   9.2. Role of EMC
   9.3. Role of VMware Site Recovery Manager (SRM)
10. VMware Site Recovery Manager
11. Standardization
12. Conclusion
References

About This Article
Despite our best efforts and precautions, disasters of all kinds eventually strike an organization, usually unanticipated and unannounced. Natural disasters such as hurricanes, floods, or fires can threaten the very existence of an organization. Well-prepared organizations establish plans, procedures, and protocols to survive the effects that a disaster may have on continuing operations and to help facilitate a speedy return to working order. Continuity planning and recovery planning are two separate forms of preparation for restoring and recovering critical business operations in the event of such disasters. My focus in this article is recovery planning.
This article should help you understand the need for Business Continuity Management and Disaster Recovery Planning in relation to a working failover plan. Because disaster recovery is not purely a technical matter, this article mostly covers the non-technical side of Disaster Recovery Planning. After reading this document, I think you will be able to make a good start. As such, this material is probably most useful to those with little or no familiarity with the topic, and such readers would be well served by reading it.

Who Should Read This Document?
This article is written for IT professionals who are responsible for defining the strategic direction of protecting data in their data center(s). These include:
- Storage Administrators
- Operational, middle-level managers
- Business Managers
- IT managers (CIO, Chief Information Officer)
Organizations and individuals who have the same interests should read this article as well.

Where do you start with Disaster Recovery Planning? It often remains a difficult question. My goal is to give a general guideline that provides insight into Disaster Recovery Planning and is not too difficult to read.

Introduction
Let's start with a simple quote:

"Information is the organization's most important asset"

Data is created by applications and is processed to become information. Information is undoubtedly the most important asset for an organization. Does this make sense? Absolutely! The digital footprint of each person on this planet is growing. In a sense it does not matter whether a person or a corporation stores the data; it has to be protected. For some people, photos are just as important as a company's ERP system. It is not for nothing that storage vendors put a lot of energy into managing this information.

From a Disaster Recovery perspective, the world is divided into two types of businesses: those that have DR plans and those that don't.
If a disaster strikes an organization in each category, which do you think will survive? When disaster strikes, organizations without DR plans have an extremely difficult road ahead. If the business has any highly time-sensitive critical business processes, that business is almost certain to fail. If a disaster hits an organization without a DR plan, that organization has very little chance of recovery. And by then it is certainly too late to begin planning.

Organizations that do have DR plans may still have a difficult time when a disaster strikes. You may have to put in considerable effort to recover time-sensitive critical business functions. But if you have a DR plan, you have a fighting chance at survival.

Does your organization have a disaster recovery plan today? If not, how many critical, time-sensitive business processes does your organization have? Many organizations think they have a DR plan. They think they have some procedures and that is all it takes. True, you need procedures, but you also need to be sure that you can actually fail over. How do you manage that? Personally, I think a live test can do more damage than the knowledge that you can fail over is worth. I can take a guess, but I do not actually know for sure how many changes an organization goes through. Many infrastructures change by the hour. Try keeping your DR plan current when things change that fast. Where does that leave you? Good question. You probably test your failover once per year, maybe twice, or even each quarter. How much do you think has changed since the last time you performed your failover? This is a considerable challenge.

Luckily, there are many techniques and solutions that can help you with your failover, such as "clouds", where DR plans are probably already well organized, or VMware Site Recovery Manager (SRM).
VMware SRM is a business continuity and disaster recovery solution that helps you plan, test, and execute a scheduled migration or emergency failover of data center services from one site to another. But the most beautiful part of SRM is that you can test a plan without doing it live. Wow! Can I actually fail over at any time without doing any damage to the infrastructure environment? True! Virtualization these days can make Disaster Recovery implementations easy. Think not only of public clouds but also of private ones. Private clouds have a huge positive impact and create synergy. How many of you are looking for partnerships in which organizations serve as each other's failover? That makes 1+1=3.

But take it easy, people. Don't press <Enter> too soon. There is a lot to consider before taking this road. Depending on the nature of your business, good disaster recovery is achieved by designing a process which enables your operations to continue to work, perhaps from a different location, with different equipment, or from home, making full use of technology to achieve a near seamless transition that is all but invisible to your customers and suppliers.

Insurance can mitigate the cost of recovery, but without a disaster recovery plan that gets you back up and running you could still go under. Indeed, more than 70% of businesses that don't have a DR plan fail within 2 years of suffering a disaster.

So what's next? Certainly a lot! But don't make life too difficult. There will always be one or more single points of failure. You should ask yourself if the costs are worth the five nines (99.999%) availability. The primary task and next step is to determine how you will achieve your Disaster Recovery goals for each of the systems and system components to ensure that the critical, time-sensitive business processes continue working.
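
To put the five nines figure in perspective, the small sketch below converts an availability percentage into the downtime it allows per year. The helper name allowed_downtime_minutes and the choice of a 365-day year are assumptions made for illustration only; they are not taken from any vendor tool.

```python
# Illustrative sketch: yearly downtime budget for an availability target.
# Assumes a 365-day year; the function name is hypothetical.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the downtime per year, in minutes, that a target still permits."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Compare two, three, four, and five nines.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% -> {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, five nines leaves a little over five minutes of downtime per year, while two nines leaves more than three and a half days. Each extra nine cuts the downtime budget by a factor of ten, which is why it is worth asking whether the business process really needs it.
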
First, this is the point at which it becomes important to consider exactly what types of disasters you need to prepare for and to classify them by the extent and type of impact they have.

1. What is a Disaster?
You may argue with me about the definition of a disaster, because there is more than one definition. To some, anything that doesn't go according to their schedule or plans is a disaster. On a personal level, a fire in our house could be considered a disaster. In most cases, one broken server isn't a disaster, but many broken servers are. However, it is important to understand the difference between these kinds of disasters and a 'true' disaster. This will allow you to keep things in perspective when making your own disaster plans.

Should your company experience a disaster, the first 48 hours following the disaster will be the most critical in your recovery efforts. How you respond during that period will determine if your business will survive. Furthermore, the most important hour is the one immediately following the event.

A disaster is defined as an event causing great loss, hardship, or suffering to many organizations. When we think of this kind of event we usually think of catastrophic events such as hurricanes, earthquakes, floods, fires, and even man-made disasters. In situations like this, help may be unavailable because rescuers may be in the same predicament as you, and it could take a considerable length of time for help to arrive. Disaster preparedness is the sensible thing to do. It doesn't need to be expensive and it can save your business! In these situations we are not talking about losing server cooling or power for a few hours; we are talking about losing essential services, data, or information, under extreme circumstances, for a prolonged period of time.

Disaster recovery is becoming an increasingly important aspect of enterprise computing.
As devices, systems, and networks become ever more complex, there are simply more things that can go wrong.