Data Center Business Continuance and Recovery

Maciej Bocian, [email protected], Architecture and Virtualization Sales Manager, Central Europe, CCIE #7785

Business Continuance Drivers

• Cost of application downtime, lost data and productivity

• Regulatory mandates (Homeland Defense, Basel II, HIPAA, GLB, SEC): firms must recover business operations the same business day a disruption occurs; "out-of-region" data center 200+ km away; mandates data centers on separate grids

• Natural and man-made disasters: hurricanes, the Northeast Blackout, the NYC Blizzard of 2003

Business Continuance Is More Critical than Ever

• 75% of IT decision-makers have altered Disaster Recovery/Business Continuance programs as a result of September 11

• Following a disaster, 43% of directly affected businesses do not reopen and 29% fail within 24 months

• Only 15% of Global 2000 enterprises have a full-fledged business continuity plan

• Disaster causes: fire, storm, floods, earthquakes, chemical accidents, nuclear accidents, wars

Sources: Disaster Recovery Journal, Gartner Group

Agenda

• Introduction to Data Center – The Evolution
• Data Center Disaster Recovery: Objectives, Failure Scenarios, Design Options
• Components of Disaster Recovery: Site Selection (Front-End GSLB), Server High Availability (Clustering), Data Replication and Synchronization (SAN Extension)
• Sample Design

The Evolution of Data Centers

Data Center Evolution

[Timeline graphic: compute evolution from mainframes and terminals (1960) through client/server and TCP/IP networking (1980–2000) to networked data centers (2000–2010). Networked data center phases: 1. Consolidation, 2. Integration, 3. Distributed networking, 4. High availability, moving from data center consolidation and optimization toward distributed data centers and continuous availability.]

What is involved in a Data Center

• Network infrastructure solution: Cisco GSRs, Cisco Catalyst 6500, Cisco Catalyst 4000
• Application solution: Linux/HP, Solaris/SunFire, WebLogic, J2EE, custom apps, etc.
• Layer 4–7 services solution: CSM, SSLM, CSS, CE, GSS
• Database solution: Linux/HP, Solaris/SunFire, Oracle 10g RAC, etc.
• Network security solution: PIX, FWSM, IDSM, VPNSM, CSA
• Storage solution: MDS 9000
• Management and instrumentation solution: terminal servers, NAM, CiscoWorks LMS/VMS, HSE

What Is a Distributed Data Center

[Diagram: primary data center running APP A and APP B, secondary data center running APP A and APP C, with FC-attached storage and data replication between the sites.]

Why Distributed Data Centers

• Provide disaster recovery and business continuance
• Avoid a single, concentrated data repository
• High availability of applications and data access
• Load balancing together with performance scalability
• Better response and optimal content routing: proximity to clients

Front-End IP Access Layer

[Diagram: "Content Routing" – site selection at the front-end IP access layer directs clients to either the primary or the secondary data center.]

Application and Database Layer

[Diagram: "Content Switching" for load balancing and "Server Clustering" for high availability at the application and database layer, spanning the primary and secondary data centers.]

Backend SAN Extension

[Diagram: "Storage" and "Optical" back-end SAN extension providing data mirroring and replication between the primary and secondary data centers over FC.]

Data Center Disaster Recovery

Agenda

• Introduction to Data Center – The Evolution
• Data Center Disaster Recovery: Objectives, Failure Scenarios, Design Options
• Components of Disaster Recovery: Site Selection (Front-End GSLB), Server High Availability (Clustering), Data Replication and Synchronization (SAN Extension)
• Sample Design

Disaster Recovery

• Recovery of data and resumption of service: ensuring the business can recover and continue after a failure or disaster

• The ability of a business to adapt, change, and continue when confronted with various outside impacts

• Mitigating the impact of a disaster

What It Means for the Business

• Business Resilience: continued operation of the business during a failure
• Business Continuance: restoration of the business after a failure
• Disaster Recovery: protecting data through offsite data replication and backup

Zero downtime is the ultimate goal.

Disaster Recovery Planning

• Business Impact Analysis (BIA): determines the impact of various disasters on specific business functions and company assets

• Risk Analysis: identifies important functions and assets that are critical to the company's operations

• Disaster Recovery Plan (DRP): restores operability of the target systems, applications, or computing facility at the secondary data center after the disaster

Disaster Recovery Objectives

• Recovery Point Objective (RPO): the point in time (prior to the outage) to which systems and data must be restored; the tolerable loss of data in the event of a disaster or failure; the impact of, and the cost associated with, the loss
• Recovery Time Objective (RTO): the period of time after an outage within which the systems and data must be restored to the predetermined RPO; the maximum tolerable outage time
• Recovery Access Objective (RAO): the time required to reconnect users to the recovered application, regardless of where it is recovered
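To make the two main objectives concrete, here is a minimal sketch (not from the original deck) that compares an actual outage against RPO/RTO targets; the timestamps and targets are illustrative assumptions.

```python
from datetime import datetime, timedelta

def evaluate_recovery(last_replica: datetime, outage_start: datetime,
                      service_restored: datetime,
                      rpo: timedelta, rto: timedelta) -> dict:
    """Compare an actual outage against RPO/RTO targets.

    data_loss = window between the last good replica and the outage
                (transactions that cannot be recovered).
    downtime  = time between the outage and restoration of service.
    """
    data_loss = outage_start - last_replica
    downtime = service_restored - outage_start
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }

# Hypothetical example: asynchronous replication every 15 minutes, 4-hour RTO
result = evaluate_recovery(
    last_replica=datetime(2009, 6, 1, 9, 45),
    outage_start=datetime(2009, 6, 1, 9, 58),
    service_restored=datetime(2009, 6, 1, 13, 30),
    rpo=timedelta(minutes=15),
    rto=timedelta(hours=4),
)
print(result)  # 13 min of data loss (RPO met), 3 h 32 min of downtime (RTO met)
```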

Recovery Point/Time vs. Cost

[Diagram: timeline from the recovery point (t0) through the disaster (t1) to critical data recovered and systems operational (t2). The RPO/RTO spectrum runs from days, hours, minutes, and seconds on the recovery-point side to seconds, minutes, hours, days, and weeks on the recovery-time side, with cost increasing toward the middle. Technologies along the spectrum: tape backup, periodic replication, asynchronous replication, synchronous replication, and extended clusters on the RPO side; manual migration and tape restore on the RTO side.]

• Smaller RPO/RTO: higher cost (replication, hot standby)
• Larger RPO/RTO: lower cost (tape backup/restore, cold standby)

Agenda

• Introduction to Data Center – The Evolution
• Data Center Disaster Recovery: Objectives, Failure Scenarios, Design Options
• Components of Disaster Recovery: Site Selection (Front-End GSLB), Server High Availability (Clustering), Data Replication and Synchronization (SAN Extension)
• Sample Design

Failure Scenarios

A disaster could mean many types of failure:
• Network failure
• Device failure
• Storage failure
• Site failure

Network Failures

[Diagram: data center connected to the Internet through Service Provider A and Service Provider B.]

• ISP failure: dual ISP connections, multiple ISPs
• Connection failure within the network: EtherChannel, multiple routed paths

Device Failures

• Routers, switches, firewalls: HSRP, VRRP
• Hosts: HA clusters

Storage Failures

• Disk arrays: RAID
• Disk controllers

Site Failures

• Partial site failure: application maintenance, application migration, scheduled DR exercises
• Complete site failure: disaster

Agenda

• Introduction to Data Center – The Evolution
• Data Center Disaster Recovery: Objectives, Failure Scenarios, Design Options
• Components of Disaster Recovery: Site Selection (Front-End GSLB), Server High Availability (Clustering), Data Replication and Synchronization (SAN Extension)
• Sample Design

Cold Standby

• One or more data centers with appropriately configured space, equipped with pre-qualified environmental, electrical, and communication conditioning
• Hardware and software installation, network access, and data restoration all need manual intervention
• Least expensive to implement and maintain
• Substantial delay from standby to full operation

Disaster Recovery – Active/Standby

[Diagram: primary data center running APP A and APP B, with a cold-standby secondary data center holding the same applications and FC-attached storage.]

Warm Standby

• A data center that is partially equipped with hardware and communications interfaces capable of providing backup operating support
• The latest data from the production data center must be delivered
• Network access needs to be activated
• Provides better RTO and RPO than a cold standby

Disaster Recovery – Active/Standby

[Diagram: primary data center and warm-standby secondary data center running APP A and APP B, connected over an IP/optical network with FC-attached storage.]

Hot Standby

• A data center that is environmentally ready and has sufficient hardware and software to provide data processing service with little or no downtime
• A hot backup offers disaster recovery with little or no human intervention
• Application data is replicated from the primary site
• A hot backup site provides very good RTO and RPO

Disaster Recovery – Active/Standby

[Diagram: primary data center (APP A, APP B) and hot-standby secondary data center (APP A, APP C) connected over an IP/optical network with FC-attached storage.]

Disaster Recovery – Active/Active

What does Active/Active mean?

Multiple Tiers of Application

[Diagram: clients reach the data center from the Internet through Service Provider A and Service Provider B; the data center is layered into a Presentation Tier, an Application Tier, and a Storage Tier.]

Active/Active Data Centers

[Diagram: two data centers reachable from the Internet (via Service Provider A and Service Provider B) and from the internal network.]

• Active/Active web hosting
• Active/Active application processing
• Active/Standby (or Active/Active) database processing

Disaster Recovery Components

Agenda

• Introduction to Data Center – The Evolution
• Data Center Disaster Recovery: Objectives, Failure Scenarios, Design Options
• Components of Disaster Recovery: Site Selection (Front-End GSLB), Server High Availability (Clustering), Data Replication and Synchronization (SAN Extension)
• Sample Design

Site Selection Mechanisms

• Site selection mechanisms depend on the technology or mix of technologies adopted for request routing:
  1. HTTP redirect
  2. DNS-based
  3. L3 routing with Route Health Injection (RHI)
• The health of servers and/or applications needs to be taken into account
• Optionally, other metrics (like load) can be measured and used for a better selection

HTTP Redirection – The Idea

• Leverages the HTTP redirect function: HTTP return code 302
• Proper site selection is made after the initial DNS request has been resolved, via redirection
• Mainly used as a method of providing site persistence while providing local server-farm failure recovery
• Can be used with the "Location Cookie" feature of the CSS to provide redirection after a wrong site selection
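To make the mechanism concrete, here is a minimal, hypothetical sketch (not a CSS/GSS configuration) of a redirector that answers every request for a generic domain with a 302 pointing at a site-specific hostname; the hostnames and the health check are illustrative assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site-specific hostnames; each resolves directly to one data center.
SITES = ["www1.example.com", "www2.example.com"]

def pick_healthy_site() -> str:
    # Placeholder health logic: a real GSLB device would probe the server
    # farms and skip any site whose VIP is down.
    return SITES[0]

class Redirector(BaseHTTPRequestHandler):
    def do_GET(self):
        target = pick_healthy_site()
        # 302 sends the browser to the chosen site; subsequent requests go
        # straight there, which is what provides site persistence.
        self.send_response(302)
        self.send_header("Location", f"http://{target}{self.path}")
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), Redirector).serve_forever()
```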

HTTP Redirection – Traffic Flow

[Diagram: the client requests http://www.cisco.com/ and is redirected to either http://www1.cisco.com/ or http://www2.cisco.com/, each hosted at a different data center.]

Advantages of the HTTP Redirection Approach

• Can be implemented without any other GSLB devices or mechanisms
• Inherent persistence to the selected location
• Can be used in conjunction with other methods to provide more sophisticated site selection

Limitations of the HTTP Redirection Approach

• Protocol-specific: relies on HTTP
• Requires redirection to additional fully qualified names: additional DNS records
• Users may bookmark a specific location, losing automatic failover
• HTTPS redirect requires the full SSL handshake to be completed first

DNS-Based Site Selection – The Idea

• The client's D-proxy (local name server) performs iterative queries
• The device acting as "site selector" is the authoritative name server for the domain(s) distributed across multiple locations
• The "site selector" sends keepalives to servers or server load balancers in the local and remote locations
• The "site selector" selects a site for the name resolution according to the pre-defined answers and the site load-balancing method
• The user traffic is sent to the selected location
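As a rough illustration (not GSS behavior), the sketch below shows the decision the authoritative "site selector" makes: answer with the VIP of the first healthy site. The VIPs, probe port, and health check are assumptions for illustration.

```python
import socket

# Hypothetical per-site VIPs for one hostname; in a real deployment the
# authoritative site selector (a GSLB appliance) owns this mapping.
SITE_VIPS = {"dc1": "192.0.2.10", "dc2": "198.51.100.10"}

def vip_is_healthy(vip: str, port: int = 80, timeout: float = 1.0) -> bool:
    """Crude keepalive: a TCP connect to the VIP stands in for the probes a
    GSLB device sends to the server load balancer at each site."""
    try:
        with socket.create_connection((vip, port), timeout=timeout):
            return True
    except OSError:
        return False

def resolve_www() -> str:
    """Return the A record the authoritative server would answer with:
    the first healthy site in preference order (here dc1, then dc2)."""
    for site in ("dc1", "dc2"):
        vip = SITE_VIPS[site]
        if vip_is_healthy(vip):
            return vip
    raise RuntimeError("no healthy site available")

if __name__ == "__main__":
    print("answer A record:", resolve_www())
```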

DNS-Based Site Selection – Traffic Flow

[Diagram: the client asks its DNS proxy for www.cisco.com; the proxy iterates through the root name server, the .com authoritative name server, and the authoritative name server for cisco.com (the site selector), which returns the address of the chosen data center; the client then opens the HTTP connection (DNS over UDP 53, HTTP over TCP 80) to Data Center 1 or Data Center 2.]

Advantages of the DNS Approach

• Protocol-independent: works with any application that uses name resolution
• Minimal configuration changes to the current IP and DNS infrastructure (DNS authoritative server)
• Implementation can differ for specific host names
• A-records can be changed on the fly
• Can take load or data center size into account
• Can provide proximity

Limitations of the DNS-Based Approach

• Visibility limited to the D-proxy (not the client)
• Cannot guarantee 100% session persistence
• DNS caching in the D-proxy
• DNS caching in the client application
• The order of multiple A-record answers can be altered by D-proxies

Route Health Injection – The Idea

• Server and application health monitoring is provided by local server load balancers (SLBs)
• The SLB can advertise or withdraw the VIP address to upstream routing devices depending on the availability of the local server farm
• The same VIP addresses can be advertised from multiple data centers: IP anycast
• Relies on L3 routing protocols for route propagation and content request routing
• Disaster recovery is provided by network convergence
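The sketch below is a conceptual stand-in for the RHI decision loop, not router or SLB code: it checks the local server farm and decides whether the /32 host route for the VIP should be injected or withdrawn, printing the intended action instead of driving a real routing protocol. The VIP, server addresses, and interval are illustrative assumptions.

```python
import socket
import time

VIP = "203.0.113.1"                                   # illustrative VIP, advertised as a /32
SERVER_FARM = [("10.1.1.10", 80), ("10.1.1.11", 80)]  # illustrative local real servers

def farm_is_up() -> bool:
    """Health check an SLB performs before injecting the route: at least
    one real server must accept a TCP connection."""
    for host, port in SERVER_FARM:
        try:
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            continue
    return False

def reconcile(advertised: bool) -> bool:
    """Advertise the /32 while the farm is healthy, withdraw it otherwise.
    A real implementation would install or remove a route that gets
    redistributed into the IGP; here we only log the intended action."""
    healthy = farm_is_up()
    if healthy and not advertised:
        print(f"inject host route {VIP}/32 toward the upstream routers")
    elif not healthy and advertised:
        print(f"withdraw host route {VIP}/32 (farm down; let the backup site win)")
    return healthy

if __name__ == "__main__":
    state = False
    while True:
        state = reconcile(state)
        time.sleep(5)
```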

Route Health Injection – Implementation

[Diagram: clients A and B reach VIP x.y.w.z through routers 10–13. Both locations advertise the same VIP: the preferred location with a low-cost metric and the backup location with a very high cost, so traffic follows the lower-cost route until that site withdraws the VIP.]

Advantages of the RHI Approach

• Supports legacy applications and does not rely on a DNS infrastructure
• Very good re-convergence time, especially in intranets where L3 protocols can be fine-tuned appropriately
• Protocol-independent: works with any application
• Robust protocols and proven features

Limitations of the RHI Approach

• Relies on host routes (/32), which cannot be propagated across the Internet (more on this later)
• Requires tight integration between the application-aware devices and the L3 routers
• Inability to intelligently load-balance among the data centers

Agenda

• Introduction to Data Center – The Evolution
• Data Center Disaster Recovery: Objectives, Failure Scenarios, Design Options
• Components of Disaster Recovery: Site Selection (Front-End GSLB), Server High Availability (Clustering), Data Replication and Synchronization (SAN Extension)
• Sample Design

Cluster Overview

• A cluster is two or more servers configured to appear as one
• Two types of clustering: load balancing (LB) and high availability (HA)
• Clustering provides benefits for availability, reliability, scalability, and manageability
• LB clustering: multiple copies of the same application against the same data set, usually read-only (e.g., web and application servers)
• HA clustering: multiple copies of a long-running application that requires access to a common data repository, usually read and write (e.g., database servers)

HA Cluster Connections

• Public network (typically Ethernet) for client/application requests
• Servers with the same hardware, OS, and application software
• Private network (typically Ethernet) for interconnection between nodes; it can be a direct connection or, optionally, go through the public network
• Storage (typically Fibre Channel): shared storage array, NAS, or SAN

Typical HA Cluster Components

• Application software that is clustered to provide high availability. Examples: Microsoft Exchange, SQL, Oracle database, file and print services
• Operating system that runs on the server hardware. Examples: Microsoft Windows 2000 or 2003, Linux (and the other flavors of UNIX), IBM VMS or z/OS (for mainframes)
• Cluster software that provides the HA clustering service for the application. Examples: Microsoft MSCS, EMC AutoStart (Legato), Veritas Cluster Server, HP TruCluster and OpenVMS
• Optionally, a Cluster Enabler: software that synchronizes the cluster software with the storage disk array software

Basic HA Cluster Design

• Active/Standby:
  – The active node takes client requests and writes to the data
  – The standby takes over when it detects a failure on the active node
  – Two-node or multi-node
• Active/Active:
  – Database requests are load-balanced to both nodes
  – A lock mechanism ensures data integrity
  – Most scalable design
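As a rough illustration of the active/standby pattern (a sketch under stated assumptions, not any vendor's cluster software), the loop below has the standby node poll a heartbeat over the private network and take over the virtual service once the active node stops answering. The heartbeat address, port, and thresholds are hypothetical.

```python
import socket
import time

ACTIVE_HEARTBEAT = ("10.0.0.1", 9000)   # hypothetical heartbeat endpoint on the private network
MISSED_BEATS_BEFORE_FAILOVER = 3

def active_is_alive(timeout: float = 1.0) -> bool:
    """Heartbeat: the active node is considered alive if it accepts a TCP
    connection on the heartbeat port."""
    try:
        with socket.create_connection(ACTIVE_HEARTBEAT, timeout=timeout):
            return True
    except OSError:
        return False

def take_over_resources() -> None:
    """Placeholder for the real failover actions: claim the virtual IP,
    mount the shared disk, and start the application."""
    print("standby: claiming virtual IP, mounting shared storage, starting app")

def standby_loop() -> None:
    missed = 0
    while True:
        if active_is_alive():
            missed = 0
        else:
            missed += 1
            if missed >= MISSED_BEATS_BEFORE_FAILOVER:
                take_over_resources()
                return
        time.sleep(2)

if __name__ == "__main__":
    standby_loop()
```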

File System Approaches for HA Clusters

• Shared Everything
  – Equal access to all storage
  – Each node mounts all storage resources
  – Provides a single layout reference system for all nodes
  – Changes are updated in the layout reference
• Shared Nothing
  – Traditional file system with peer-to-peer communication
  – Each node mounts only its "semi-private" storage
  – Data stored on the peer system's storage is accessed via the peer-to-peer communication
  – A failed node's storage needs to be mounted by the peer

Geo-clusters

Geo-cluster: a cluster that spans multiple data centers.

[Diagram: node1 in the local data center and node2 in the remote data center, connected across a WAN; disk replication between the sites is synchronous or asynchronous, and each synchronous write incurs on the order of 2 x RTT.]

Considerations for HA Clusters

• Split brain: cluster partitioning when nodes cannot communicate with each other but are equally capable of forming a cluster and mounting disks
• Extended L2 is required in most implementations for:
  – The public network, since the client only knows the virtual IP address
  – The private network, used for heartbeats
• Storage:
  – Directly attached storage (DAS) cannot be used
  – The shared disk needs to be visible to both nodes
  – Needs to interface with the cluster software for disk failover, zoning, and LUN masking when there is a node failure

Split-Brain

• Split brain happens when all of the network communication links between two or more cluster nodes fail
• Both nodes could potentially go active and concurrently access the disk, thus corrupting data

Resolution for Split Brain: Quorum

• A quorum device serves as a tie-breaker to arbitrate which system has access to resources
• The quorum ensures that even if there is no communication between the nodes, only one node can continue to access the disk
• Only the node that owns the quorum (or a majority of quorum votes) can bring resources online
• Any resource can be used as the arbitrator to break the tie

[Diagram: node1 and node2 both see the quorum device and the application data; only the quorum owner brings the application data online.]

Extended Layer 2 Network

• In most implementations, a common L2 network is needed for the heartbeat between the nodes as well as for public client access
• Extending a VLAN on a geographical basis is not considered best practice because of the impact of broadcasts, multicast, flooding, and Spanning Tree integration issues

[Diagram: node1 and node2 in the local and remote data centers, joined by a public Layer 2 network and a private Layer 2 network across the WAN, with synchronous or asynchronous disk replication between the sites.]

Resolution: L3 Routed Solution

• In certain cases an L3 routed solution is possible (e.g., node1 on 11.20.5.x and node2 on 172.28.210.x)
• Microsoft MSCS
  – Requires that the two nodes be on the same subnet
  – The communication between the two nodes is UDP unicast
  – Local Area Mobility (LAM) allows the placement of the nodes on two different subnets
• Veritas VCS
  – Allows nodes with IP addresses in different subnets
  – The virtual address needs to change when moving from node1 to node2
  – DNS can be used to provide name-to-multiple-IP mapping

[Diagram: node1 and node2 on different subnets, connected by an extended SAN with synchronous or asynchronous disk replication.]

Storage Disk Zoning

• Which storage disk array should node2 be zoned to before and after a failure on node1?
• To complete the failover, the zoning configuration must be changed
• Software is needed to synchronize the cluster software with the disk array's software, i.e., a Cluster Enabler

[Diagram: active node1 and standby node2 across an extended SAN, zoned to arrays sym1320 and sym1291; each array holds read/write (RW) and read-only (RD) volumes.]

Resolution: Cluster Enabler

• The Cluster Enabler (CE) provides the interface between the clustering software and the disk array's software
• When the clustering software detects a failure and wants to fail over the node, the Cluster Enabler instructs the disk array to perform a failover
• The Cluster Enabler also allows node1 to be zoned to sym1320 and node2 to be zoned to sym1291
• The Cluster Enabler running on each node typically communicates with the Cluster Enabler software running on the remote node using local multicast messages

[Diagram: active node1 and standby node2 across an extended SAN, zoned to arrays sym1320 and sym1291 holding RW and write-disabled (WD) volumes.]

Agenda

• Introduction to Data Center – The Evolution
• Data Center Disaster Recovery: Objectives, Failure Scenarios, Design Options
• Components of Disaster Recovery: Site Selection (Front-End GSLB), Server High Availability (Clustering), Data Replication and Synchronization (SAN Extension)
• Sample Design

Terminology

• Storage subsystems: just a bunch of disks (JBOD), redundant array of independent disks (RAID)
• Storage I/O devices: Host Bus Adapter (HBA), Small Computer System Interface (SCSI)
• Storage protocols: SCSI, iSCSI, FC (FCIP)

Terminology (Cont'd)

• Direct Attached Storage (DAS): storage is "local" behind the server; no storage sharing possible; costly to scale and complex to manage
• Network Attached Storage (NAS): storage is accessed at the file level over an IP network; storage can be shared between servers
• Storage Area Network (SAN): storage is accessed at the block level; separation of storage from the server; high-performance interconnect providing high I/O throughput

Presentation_ID © 2009 Cisco Systems, Inc. All rights reserved. Cisco Confidential 67 Storage for Applications ƒ Presentation Tier Unrelated small data files commonly stored on internal disks Manual distribution ƒ Application Processing Tier Transitional, unrelated data Small files residing on file systems Mayyp use RAID to spread data over multi ple disks ƒ Storage Tier Large, permanent data files or raw data Large batch updates, most likely Real time Log and data on separate volumes

Backup and Replication

• Offsite tape vaulting: backup tapes stored at an offsite location
• Electronic vaulting: transmission of backup data to an offsite location
• Remote disk replication: continuous copying of data to an offsite location, transparent to the host
• Other methods of replication: host-based mirroring, network-based replication

Replication: Modes of Operation

• Synchronous: all data is written to the cache of the local and remote arrays before the I/O is complete and acknowledged to the host
• Asynchronous: the write is acknowledged after the write to the local array cache; changes (writes) are replicated to the remote array asynchronously
• Semi-synchronous: the write is acknowledged with a single subsequent WRITE command pending from the remote array
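A toy model of the first two modes, to show when the host sees the acknowledgment in each case; the "arrays" are in-memory lists and the 10 ms remote round-trip time is an assumption for illustration.

```python
import queue
import threading
import time

REMOTE_RTT = 0.010          # assumed 10 ms round trip to the remote array
local_array, remote_array = [], []
replication_q: "queue.Queue[bytes]" = queue.Queue()

def replicate_async() -> None:
    """Background drain of the replication queue (asynchronous mode)."""
    while True:
        block = replication_q.get()
        time.sleep(REMOTE_RTT)          # WAN latency
        remote_array.append(block)
        replication_q.task_done()

threading.Thread(target=replicate_async, daemon=True).start()

def write_sync(block: bytes) -> None:
    """Synchronous: the host is acknowledged only after BOTH arrays hold the
    data, so every write pays the remote round trip."""
    local_array.append(block)
    time.sleep(REMOTE_RTT)
    remote_array.append(block)

def write_async(block: bytes) -> None:
    """Asynchronous: acknowledged after the local write; replication to the
    remote array happens later, so a disaster can lose queued writes."""
    local_array.append(block)
    replication_q.put(block)

start = time.perf_counter()
for i in range(100):
    write_sync(f"sync-{i}".encode())
print(f"100 sync writes:  {time.perf_counter() - start:.2f}s (host waits on the RTT)")

start = time.perf_counter()
for i in range(100):
    write_async(f"async-{i}".encode())
print(f"100 async writes: {time.perf_counter() - start:.2f}s (RTT hidden from the host)")
replication_q.join()   # RPO exposure: anything still queued here would be lost
```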

Synchronous vs. Asynchronous Trade-Off

• Application performance: synchronous impacts application performance; asynchronous has no performance impact
• Distance: synchronous is distance-limited (are both sites within the same threat radius?); asynchronous allows unlimited distance (second site outside the threat radius)
• Exposure to data loss: synchronous has no data loss; asynchronous has possible data loss

Enterprises Must Evaluate the Trade-Offs

• Maximum tolerable distance, ascertained by assessing each application
• Cost of data loss

Data Replication with DB Example

• Control files identify the other files making up the database and record the content and state of the DB (DB name, creation date, backups performed, redo-log time period, datafile state)
• Datafiles hold the tablespaces, indexes, and data dictionary, and are only updated periodically
• Redo logs record DB changes resulting from transactions; they are used to play back changes that may not have been written to the datafiles when the failure occurred, and are typically archived as they fill to local and DR-site destinations

Data Replication with DB Example (Cont'd)

• A failure or disaster occurs at time t1: media failure (e.g., disk), human error (datafile deletion), or database corruption
• A hot backup of the datafiles and control files was taken at time t0; archived and online redo logs cover the interval between t0 and t1
• The database is restored to its state at the time of failure (t1) by:
  1. Restoring the control files and datafiles from the last hot backup (time t0)
  2. Sequentially replaying the changes from the subsequent redo logs (archived and online), i.e., the changes made between t0 and t1

Data Replication with DB Example (Cont'd)

[Diagram: primary and secondary sites connected by SAN extension transport. The cyclic redo logs, a copy of every committed transaction, are synchronously replicated for zero loss; archive logs are replicated/copied; a point-in-time database copy (taken at time t0, when the DB is quiescent) is replicated/copied alongside earlier DB backups.]

• A mixture of sync and async replication technologies is commonly used:
  – Usually only the redo logs are synchronously replicated to the remote site
  – Archive logs are created from the redo log and copied when the redo log switches
  – Point-in-time (PiT) copies of the datafiles and control files are copied periodically (e.g., nightly)

Data Center Interconnection Options

[Diagram: two mirrored data centers, each with stateful firewalls, content caching, server load balancing, multilayer LAN switches, intrusion detection, front-end and back-end application servers, high-density multilayer SAN directors, and enterprise-class storage arrays; the sites are interconnected over SONET/SDH, DWDM/CWDM, and IP/Metro Ethernet.]

Data Center Transport Options

Increasing distance: data center → campus → metro → regional → national

• Optical – Dark fiber: sync; limited by optics (power budget)
• Optical – CWDM: sync (2 Gbps); limited by optics (power budget)
• Optical – DWDM: sync (2 Gbps per lambda); limited by BB_Credits
• Optical – SONET/SDH: sync (1 Gbps+ subrate) or async
• IP – MDS 9000 FCIP: sync (Metro Ethernet) or async (1 Gbps+)
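A back-of-the-envelope sketch of why long synchronous FC spans are "limited by BB_Credits": enough frames must be in flight to cover the round-trip time. The propagation delay (~5 µs per km of fibre), frame size, and throughput figures are rule-of-thumb assumptions, not vendor specifications.

```python
import math

PROPAGATION_US_PER_KM = 5.0       # assumption: ~5 microseconds per km of fibre
FRAME_BYTES = 2148                # assumption: full-size FC frame on the wire

def bb_credits_needed(distance_km: float, data_rate_mb_s: float) -> int:
    """Rough buffer-to-buffer credit count needed to keep a long FC link full:
    round-trip time divided by the serialization time of one frame."""
    serialization_us = FRAME_BYTES / data_rate_mb_s   # bytes / (MB/s) -> microseconds
    rtt_us = 2 * distance_km * PROPAGATION_US_PER_KM
    return max(1, math.ceil(rtt_us / serialization_us))

for km in (10, 50, 100, 200):
    print(f"{km:>4} km @ 2G FC -> ~{bb_credits_needed(km, 200)} BB_Credits")
# Roughly one credit per km at 2 Gbps, which is why long DWDM spans run out of
# BB_Credits while SONET/SDH or FCIP links are typically used asynchronously.
```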

Data Center Replication with SAN Extension

• Extend the normal reach of a Fibre Channel fabric across a SAN extension network
• Use cases: replication (remote host to target array), remote host access to storage, shared-data clusters

SAN Design for Data Replication

• Servers have two Fibre Channel connections to the storage arrays for high availability; multipath software is required in a dual-fabric host design
• SAN extension (replication) fabrics are typically separate from the host-access fabrics
• Replication requirements are generally specified by the array vendor

[Diagram: Site A and Site B each with server-access fabrics and separate FC replication fabrics, joined across the DC interconnect network.]

Data Center Disaster Recovery – Sample Design

Disaster Impact Radius

[Diagram: concentric disaster impact radii around the primary data center: local (1–2 km), metro (< 50 km), regional (< 400 km), and global; the secondary/DR site sits outside the chosen radius.]

• Disasters are characterized by their impact: local, metro, regional, global (fire, flood, earthquake, attack)
• Is the backup site within the threat radius?

Active/Standby Architecture – Today

[Diagram: High Availability Site 1 and High Availability Site 2 (both in CA) run Hosts 1 and 2 in HA cluster(s), with synchronous replication (CWDM or FCIP) between MDS 9509s; the Disaster Recovery Site (NC) with Hosts 3 receives asynchronous FCIP replication and electronic journaling over dual OC-12 links through MDS 9509 gateways; Storage 1, Storage 2 (bunker), and Storage 3 sit at the three sites.]

Frame-Based Replication

[Diagram: Data Center 1 (production cluster) and Data Center 2 (D/R) joined by MDS switches over dual OC-12 links. EMC/DMX arrays replicate the PROD, redo, and archive volumes with SRDF/A from BCV/R1 to R2 devices, with TimeFinder point-in-time (PiT) copies at each site ("triple threat" protection).]

Active/Active Architecture – Tomorrow

[Diagram: the GSS performs site (data center) selection for the requested FQDN according to pre-configured conditions, using ACE probes to track application health; ACE decrypts and routes requests while ACNS/Content Engines cache pages; the presentation layer is mirrored between DC1 and DC2 with asynchronous replication; the clustered back ends run active/standby per data set (Data X active in DC1 and standby in DC2, Data Y active in DC2 and standby in DC1); requests are directed to the primary application, with backup application instances at the other site.]

SANTap and Continuous Data Protection

• SANTap
  – Appliance-based storage replication
  – Reliable copy of WRITE operations
  – SCSI-FCIP communication
• Continuous Data Protection (CDP)
  – Automatic and continuous backups
  – Time Addressable Storage (TAS)
  – Any point-in-time recovery
  – Application-based or network-based

[Diagram: production servers write to the primary array through an MDS SAN; SANTap copies the writes to a CDP appliance, which maintains the secondary copy.]

Fabric-Based Replication with CDP

[Diagram: Data Center 1 (production cluster) and Data Center 2 (D/R) joined by MDS switches over dual OC-12 links, with a replication/CDP appliance at each site fed by SANTap; EMC/DMX arrays hold the PROD, BCV, redo, and archive volumes replicated with SRDF/A, while TAS/SATA storage on the appliances keeps any-point-in-time (APiT) copies.]

End-to-End Data Center Resilience

[Diagram: corporate DNS delegates to GSS-1 and GSS-2, which select among DC-1, DC-2, and DC-3 (front-ended by ACE-1, ACE-2, and ACE-3); each data center hosts a Web/App server farm and a DB tier; the primary and secondary locations are connected over an IP/optical network (CWDM/DWDM) with FC-attached storage.]

Summary – Design Details

• Data centers 1 and 2 are in the primary location, close enough to each other to provide DC HA for active/active access
• Data center 3 (DR) is located beyond the tolerable disaster radius, away from primary DCs 1 and 2
• Web/App server farms are load-balanced geographically
• DB servers run within a geo-HA cluster using an L3 design
• Synchronous data replication runs between the data centers within the primary location
• Asynchronous data replication runs between the primary and secondary storage systems
