Data Center Business Continuance and Disaster Recovery

Maciej Bocian [email protected] Architecture Sales Manager Data Center and Virtualization, Central Europe CCIE#7785

• Cost of application downtime, lost data and productivity

• Regulatory mandates (Homeland Defense, Basel II, HIPAA, GLB, SEC)
  Firms must recover business operations the same business day a disruption occurs
  “Out-of-region” data center, 200+ km away
  Mandates backup data centers on separate grids

[Pictured: Hurricanes; The Northeast Blackout; NYC Blizzard of 2003]

Presentation_ID © 2009 Cisco Systems, Inc. All rights reserved. Cisco Confidential

Business Continuance Is More Critical than Ever

75% of IT decision-makers have altered Disaster Recovery/Business Continuance programs as a result of September 11

Following a disaster 43% of directly affected businesses do not reopen and 29% fail within 24 months as a result

Only 15% of Global 2000 enterprises have a full-fledged business continuity plan.

Disasters: fire, storm, floods, earthquakes, chemical accidents, nuclear accidents, wars

Sources: Disaster Recovery Journal, Gartner Group

Introduction to Data Center - The Evolution
Data Center Disaster Recovery
  Objectives
  Failure Scenarios
  Design Options
Components of Disaster Recovery
  Site Selection - Front End GSLB
  Server High Availability - Clustering
  Data Replication and Synchronization - SAN Extension
Sample Design

[Figure: Data center evolution timeline, 1960 to 2010. Compute evolution: mainframes, client/server, Internet computing. Network evolution: terminal, TCP/IP, thin client (HTTP). Networked data center phases: 1. Consolidation, 2. Integration, 3. Distributed, 4. High Availability. Other labels: data center networking, content networking, data center consolidation, data center network optimization, distributed data center, data center continuous availability, business agility.]

Network infrastructure solution: Cisco GSRs, Cisco Catalyst 6500, Cisco Catalyst Cat4000
Application solution: Linux/HP, Solaris/SunFire, WebLogic, J2EE custom app, etc.
Layer 4–7 services solution: CSM, SSLM, CSS, CE, GSS
Database solution: Linux/HP, Solaris/SunFire, Oracle 10G RAC, etc.
Network security solution: PIX®, FWSM, IDSM, VPNSM, CSA
Storage solution: MDS9000
Management and instrumentation solution: Terminal servers, NAM, CiscoWorks LMS/VMS, HSE

[Figure: Primary and secondary data centers, each hosting applications on FC-attached storage, with data replication between the sites]

Provide disaster recovery and business continuance
Avoid a single, concentrated data repository
High availability of applications and data access
Load balancing together with performance scalability
Better response and optimal content routing: proximity to clients

“Content Routing”: site selection
“Content Switching”: load balancing
“Server Clustering”: high availability
“Storage” & “Optical”: data mirroring and replication

[Figure, repeated for each component: primary and secondary data centers hosting applications on FC-attached storage]

Recovery of data and resumption of service - Ensuring business can recover and continue after failure or disaster

Ability of a business to adapt, change and continue when confronted with various outside impacts

Mitigating the impact of a disaster

Business Resilience: Continued Operation of Business During a Failure
Business Continuance: Restoration of Business After a Failure
Disaster Recovery: Protecting Data Through Offsite Data Replication and Backup

Zero Down Time is the ultimate goal

• Business Impact Analysis (BIA) Determines the impacts of various disasters to specific business functions and company assets

• Risk Analysis Identifies important functions and assets that are critical to company’s operations

• Disaster Recovery Plan (DRP) Restores operability of the target systems, applications, or computing facility at the secondary Data Center after the disaster

Recovery Point Objective (RPO)
  The point in time (prior to the outage) to which system and data must be restored
  Tolerable loss of data in event of disaster or failure
  The impact of data loss and the cost associated with the loss
Recovery Time Objective (RTO)
  The period of time after an outage in which the systems and data must be restored to the predetermined RPO
  The maximum tolerable outage time
Recovery Access Objective (RAO)
  Time required to reconnect users to the recovered application, regardless of where it is recovered

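[Figure: Timeline. Critical data is recovered to its state at time t0, disaster strikes at time t1, and systems are recovered and operational at time t2. The span from t0 to t1 is the recovery point; the span from t1 to t2 is the recovery time.]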
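The timeline can be turned into a small check of an incident against its objectives. The following is a minimal sketch, assuming the data loss window is t1 minus t0 and the outage is t2 minus t1; the function names and the sample timestamps are hypothetical and not part of the original slides.

```python
from datetime import datetime, timedelta

def recovery_metrics(last_good_copy, disaster, service_restored):
    """Return (data loss window, outage duration) for one incident."""
    data_loss = disaster - last_good_copy   # t1 - t0, compared against the RPO
    outage = service_restored - disaster    # t2 - t1, compared against the RTO
    return data_loss, outage

def meets_objectives(data_loss, outage, rpo, rto):
    """True when both the data loss window and the outage are within target."""
    return data_loss <= rpo and outage <= rto

# Hypothetical incident: last copy at 02:00, failure at 09:30, service back at 13:30.
t0 = datetime(2009, 1, 15, 2, 0)
t1 = datetime(2009, 1, 15, 9, 30)
t2 = datetime(2009, 1, 15, 13, 30)
loss, outage = recovery_metrics(t0, t1, t2)
print("data loss window:", loss)
print("outage:", outage)
print("within 8h RPO / 4h RTO:",
      meets_objectives(loss, outage, timedelta(hours=8), timedelta(hours=4)))
```

A tighter RPO in this sketch can only be met by moving t0 closer to t1, which is what the replication options discussed later are for.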

[Figure: Recovery point and recovery time scales, each running from seconds out to days or weeks. Recovery point technologies: tape backup, periodic replication, asynchronous replication, synchronous replication. Recovery time technologies: extended cluster, manual migration, tape restore. Cost increases as RPO and RTO shrink.]

Smaller RPO/RTO: higher $$$, replication, hot standby
Larger RPO/RTO: lower $$$, tape backup/restore, cold standby

Disaster could mean many types of failure
  Network Failure
  Device Failure
  Storage Failure
  Site Failure

[Figure, repeated for each failure type: data center connected to Internet Service Provider A and Provider B]

ISP failure
  ✓ Dual ISP connections
  ✓ Multiple ISPs
Connection failure within the network
  ✓ EtherChannel
  ✓ Multiple route paths

Routers, Switches, FWs
  ✓ HSRP
  ✓ VRRP
Hosts
  ✓ HA cluster

Disk arrays
  ✓ RAID
Disk Controllers

Partial Site Failure
  ✓ Application maintenance
  ✓ Application migration
  ✓ Application scheduled DR exercise
Complete Site Failure
  ✓ Disaster

One or more data centers with appropriately configured space, equipped with pre-qualified environmental, electrical, and communication conditioning
Hardware and software installation, network access, and data restoration all need manual intervention
Least expensive to implement and maintain
Substantial delay from standby to full operation

[Figure: Primary data center and secondary data center (Cold Standby), each with APP A, APP B, and FC storage]

A data center that is partially equipped with hardware and communications interfaces capable of providing backup operating support
Latest backups from the production data center must be delivered
Network access needs to be activated
Provides better RTO and RPO than Cold Standby backup

[Figure: Primary data center and secondary data center (Warm Standby), each with APP A, APP B, and FC storage, connected by an IP/Optical network]

A data center that is environmentally ready and has sufficient hardware and software to provide data processing service with little or no down time
Hot Backup offers Disaster Recovery with little or no human intervention
Application data is replicated from the primary site
A hot backup site provides very good RTO and RPO

[Figure: Primary data center (APP A, APP B) and secondary data center (APP A, APP C) with FC storage, connected by an IP/Optical network]

What Does Active/Active Mean??

[Figure: Two data centers, each with a presentation tier, an application tier, and a storage tier, connected to Internet Service Provider A, Internet Service Provider B, and the internal network]

Active/Active Web Hosting

Active/Active Application Processing

Active/Standby Database Processing Or Active/Active

Site selection mechanisms depend on the technology or mix of technologies adopted for request routing:
1. HTTP Redirect
2. DNS Based
3. L3 Routing with Route Health Injection (RHI)
Health of servers and/or applications needs to be taken into account
Optionally, other metrics (like load) can be measured and utilized for a better selection

Leveraging the HTTP redirect function: HTTP return code 302
Proper site selection made after the initial DNS request has been resolved, via redirection
Mainly used as a method of providing site persistence while providing local server farm failure recovery
Can be used with the “Location Cookie” feature of the CSS to provide redirection after wrong site selection

http://www.cisco.com/

http://www1.cisco.com/

http://www2.cisco.com/

Can be implemented without any other GSLB devices or mechanisms
Inherent persistence to the selected location
Can be used in conjunction with other methods to provide more sophisticated site selection

It is protocol specific: relies on HTTP
Requires redirection to additional fully qualified names: additional DNS records
Users may bookmark a specific location, losing automatic failover
HTTPS redirect requires the full SSL handshake to be completed first
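The redirect decision itself is small enough to sketch. The example below is a minimal illustration, assuming a static preference order over the two site-specific names shown above and a health map supplied by some external probe; the function names and the 503 fallback are assumptions, not behavior documented for the CSS.

```python
def choose_site(sites, healthy):
    """Return the first healthy site hostname, in configured preference order."""
    for host in sites:
        if healthy.get(host, False):
            return host
    return None

def redirect_response(path, sites, healthy):
    """Build (status, headers) for a request that arrived on the generic name."""
    host = choose_site(sites, healthy)
    if host is None:
        return 503, {}  # assumption: no healthy site, so answer "service unavailable"
    return 302, {"Location": f"http://{host}{path}"}

SITES = ["www1.cisco.com", "www2.cisco.com"]

# Both sites healthy: the client is sent to the first site.
print(redirect_response("/", SITES, {"www1.cisco.com": True, "www2.cisco.com": True}))
# First site down: the client is sent to the second site.
print(redirect_response("/", SITES, {"www1.cisco.com": False, "www2.cisco.com": True}))
```

Because the client then talks to the site-specific name directly, it stays on that site, which is the persistence benefit and also the bookmarking drawback listed above.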

The client D-proxy (local name server) performs iterative queries
The device which acts as “site selector” is the authoritative name server for the domain(s) distributed in multiple locations
The “site selector” sends keepalives to servers or server load balancers in the local and remote locations
The “site selector” selects a site for the name resolution, according to the pre-defined answers and site load balance method
The user traffic is sent to the selected location

[Figure: DNS resolution in 10 numbered steps among the client, the DNS proxy, the root name server, and the authoritative name servers for .com, cisco.com, and www.cisco.com (UDP:53), followed by the client request to http://www.cisco.com/ (TCP:80) at Data Center 1 or Data Center 2]

Protocol independent: works with any application that uses name resolution
Minimal configuration changes in the current IP and DNS infrastructure (DNS authoritative server)
Implementation can be different for specific host names
A-records can be changed on the fly
Can take load or data center size into account
Can provide proximity

Visibility limited to the D-proxy (not the client)
Cannot guarantee 100% session persistence
DNS caching in the D-proxy
DNS caching in the client application
Order of multiple A-record answers can be altered by D-proxies
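The site selector behavior described above (keepalives feeding a load balance method that picks the answer) can be sketched as follows. This is a toy model, assuming a weighted round-robin method and documentation-range VIP addresses; the class, its method names, and the weights are hypothetical and do not represent the GSS configuration model.

```python
import itertools

class SiteSelector:
    """Toy authoritative 'site selector': weighted round robin over healthy sites."""

    def __init__(self, sites):
        # sites: {name: {"vip": str, "weight": int}}
        self.sites = sites
        self.health = {name: True for name in sites}
        self._rebuild()

    def _rebuild(self):
        # Repeat each healthy site according to its weight, then cycle over the list.
        order = [name for name, site in self.sites.items()
                 if self.health[name] for _ in range(site["weight"])]
        self._cycle = itertools.cycle(order) if order else None

    def keepalive_result(self, name, alive):
        """Record a keepalive outcome for one site and rebuild the rotation if it changed."""
        if self.health[name] != alive:
            self.health[name] = alive
            self._rebuild()

    def resolve(self):
        """Return the VIP to place in the A-record answer, or None if no site is healthy."""
        if self._cycle is None:
            return None
        return self.sites[next(self._cycle)]["vip"]

selector = SiteSelector({
    "dc1": {"vip": "192.0.2.10", "weight": 2},     # hypothetical larger data center
    "dc2": {"vip": "198.51.100.10", "weight": 1},
})
print([selector.resolve() for _ in range(3)])
selector.keepalive_result("dc1", False)            # dc1 keepalive fails
print(selector.resolve())
```

Caching in the D-proxy or the client means an answer handed out before the keepalive failure can keep being used until its TTL expires, which is the persistence and failover caveat in the list above.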

Server and application health monitoring provided by local Server Load Balancers
SLB can advertise or withdraw VIP address to upstream routing devices depending on the availability of the local server farm
Same VIP addresses can be advertised from multiple data centers: IP Anycast
Relying on L3 routing protocols for route propagation and content request routing
Disaster Recovery provided by network convergence

[Figure: Client A and Client B reach VIP x.y.w.z through Routers 10, 11, 12, and 13. Location A is the backup location for VIP x.y.w.z and Location B is the preferred location for VIP x.y.w.z. Path labels: Low Cost, Very High Cost.]

Supports legacy applications and does not rely on a DNS infrastructure
Very good re-convergence time, especially in Intranets where L3 protocols can be fine tuned appropriately
Protocol-independent: works with any application
Robust protocols and proven features

Relies on host routes (32 bits), which cannot be propagated all over the Internet (more on this later)
Requires tight integration between the application-aware devices and the L3 routers
Inability to intelligently load balance among the data centers
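Route Health Injection can be pictured as a routing table for one VIP host route that each location's load balancer adds to or removes from. The sketch below is a simplified model, assuming a single numeric cost per location and lowest-cost selection; the class, function names, and cost values are hypothetical, and real convergence depends on the routing protocol in use.

```python
class VipRoutingTable:
    """Toy view of the routes for one VIP host route, as seen by an upstream router."""

    def __init__(self):
        self.routes = {}  # location -> advertised cost

    def advertise(self, location, cost):
        self.routes[location] = cost

    def withdraw(self, location):
        self.routes.pop(location, None)

    def best_location(self):
        """Lowest-cost location currently advertising the VIP, or None."""
        if not self.routes:
            return None
        return min(self.routes, key=self.routes.get)

def health_update(table, location, cost, farm_healthy):
    """What the local SLB does after each health check of its server farm."""
    if farm_healthy:
        table.advertise(location, cost)
    else:
        table.withdraw(location)

table = VipRoutingTable()
health_update(table, "preferred", 10, True)    # hypothetical low cost
health_update(table, "backup", 1000, True)     # hypothetical very high cost
print(table.best_location())
health_update(table, "preferred", 10, False)   # server farm at the preferred site fails
print(table.best_location())
```

The model also shows the load balancing limitation noted above: all traffic follows the single best route, so the backup location stays idle while the preferred one is healthy.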

A cluster is two or more servers configured to appear as one
Two types of clustering: Load balancing (LB) and High Availability (HA)
Clustering provides benefits for availability, reliability, scalability, and manageability
LB clustering: multiple copies of the same application against the same data set, usually read only
HA clustering: multiple copies of a long-running application that requires access to a common data repository, usually read and write

[Figure: Web servers, application servers, and database servers]

Public Network (typically Ethernet) for client/application requests
Servers with same hardware, OS, and application software
Private Network (typically Ethernet) for interconnection between nodes. Could be direct connect, or optionally going through the public network
Storage Disk (typically Fiber): shared storage array, NAS or SAN

Application software that is clustered to provide High Availability. Example: Microsoft Exchange, SQL, Oracle database, File and Print Services
Operating System that runs on the server hardware. Example: Microsoft Windows 2000 or 2003, Linux (and the other flavors of UNIX), IBM VMS or z/OS (for mainframe)
Cluster Software that provides the HA clustering service for the application. Example: Microsoft MSCS, EMC AutoStart (Legato), Veritas Cluster Server, HP TruCluster and OpenVMS
Optionally, Cluster Enabler, software that synchronizes the cluster software with the storage disk array software

Active/Standby:
– Active node takes client requests and writes to the data
– Standby takes over when detecting failure on active
– Two-node or multi-node
Active/Active:
– Database requests load balanced to both nodes
– Lock mechanism ensures data integrity
– Most scalable design

Shared Everything
– Equal access to all storage
– Each node mounts all storage resources
– Provides a single layout reference system for all nodes
– Changes updated in the layout reference
Shared Nothing
– Traditional file system with peer-peer communication
– Each node mounts only its “semi-private” storage
– Data stored on the peer system’s storage is accessed via the peer-peer communication
– Failed node’s storage needs to be mounted by the peer

Geo-cluster: a cluster that spans multiple data centers

[Figure: node1 in the local datacenter and node2 in the remote datacenter, connected over the WAN. Disk replication: synchronous or asynchronous, 2 x RTT]

Split Brain: cluster partitioning when nodes cannot communicate with each other but are equally capable of forming a cluster and mounting disks.
Extended L2 required in most implementations for:
– Public Network, since client only knows about the Virtual IP address
– Private Network, used for heartbeats
Storage:
– Directly Attached Disk (DAS) cannot be used
– Shared Disk needs to be visible to both nodes
– Needs to interface with cluster software for disk failover, zoning, LUN masking when there is a node failure

Split-brain happens when all of the network communication links between two or more cluster nodes fail. Both nodes could potentially go active, and concurrently access the disk, thus corrupting data

[Figure: node1 and node2 both writing to the shared disk, resulting in data corruption]

A quorum device serves as a tie breaker to arbitrate which system has access to resources. The quorum ensures that even if there is no communication between the nodes, only one node can continue to access the disk. Only the node that owns the quorum (or the majority of quorum votes) can bring resources online. Any resource can be used as the arbitrator to break the tie.

[Figure: node1 and node2 sharing a quorum disk and an application data disk]
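The tie-breaking rule can be expressed as a vote count. This is a minimal sketch, assuming one vote per node plus one vote for the quorum device and a strict-majority rule; real cluster software may weight votes differently, and the function name is hypothetical.

```python
def may_bring_resources_online(partition_nodes, all_nodes, owns_quorum_device):
    """Decide whether one side of a partitioned cluster may keep running.

    Assumption: each node holds one vote and the quorum device holds one vote;
    a partition needs a strict majority of all votes.
    """
    total_votes = len(all_nodes) + 1
    votes = len(partition_nodes) + (1 if owns_quorum_device else 0)
    return votes > total_votes // 2

cluster = ["node1", "node2"]

# Heartbeat links fail: each node is alone, but only one can own the quorum device.
print(may_bring_resources_online(["node1"], cluster, owns_quorum_device=True))
print(may_bring_resources_online(["node2"], cluster, owns_quorum_device=False))
```

With two nodes and one quorum device there are three votes in total, so exactly one side of a split can reach two votes and the other side must stay offline.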

In most implementations, a common L2 network is needed for the heartbeat between the nodes, as well as public client access
Extending VLAN on a geographical basis is not considered best practice because of the impact of broadcasts, multicast, flooding and Spanning-Tree integration issues

[Figure: node1 in the local datacenter and node2 in the remote datacenter, joined across the WAN by a public Layer 2 network and a private Layer 2 network. Disk replication: synchronous or asynchronous]

In certain cases an L3 routed solution is possible
Microsoft MSCS
– Requires that 2 nodes be on the same subnet.
– The communication between the 2 nodes is UDP unicast
– Local Area Mobility (LAM) allows the placement of the nodes on 2 different subnets
Veritas VCS
– Allows having nodes with IP addresses in different subnets
– The Virtual Address needs to change when moving from node1 to node2
– DNS can be used to provide name to multiple IP mapping

[Figure: node1 and node2 on subnets 11.20.5.x and 172.28.210.x, connected by an extended SAN. Disk replication: synchronous or asynchronous]

What storage disk array should node 2 be zoned to before and after a failure on node 1?
To complete the failover you need to change the zoning configuration
Software needed to synchronize the Cluster Software with the Disk Array’s software, i.e. Cluster Enabler

[Figure: node1 (active) and node2 (standby) on an extended SAN with disk arrays sym1320 (RW) and sym1291 (RD)]

The Cluster Enabler (CE) provides the interface between the Clustering Software and the Disk Array’s software
When the Clustering Software detects a failure and wants to fail the node, the Cluster Enabler instructs the Disk Array to perform a failover
Cluster Enabler also allows node1 to be zoned to sym1320 and node2 to be zoned to sym1291
The Cluster Enabler running on each node typically communicates with the Cluster Enabler Software running on the remote node with Local Multicast messages

[Figure: node1 (active) and node2 (standby) on an extended SAN with disk arrays sym1320 (RW) and sym1291 (WD)]

Storage subsystem
  Just a bunch of disks (JBOD)
  Redundant array of independent disks (RAID)
Storage I/O devices
  Host Bus Adapter (HBA)
  Small Computer System Interface (SCSI)
Storage protocols
  SCSI
  iSCSI
  FC (FCIP)

Direct Attached Storage (DAS)
  Storage is “local” behind the server
  No storage sharing possible
  Costly to scale; complex to manage
Network Attached Storage (NAS)
  Storage is accessed at a file level over an IP network
  Storage can be shared between servers
Storage Area Networks (SAN)
  Storage is accessed at a block level
  Separation of storage from the server
  High performance interconnect providing high I/O throughput

Storage for Applications
Presentation Tier
  Unrelated small data files commonly stored on internal disks
  Manual distribution
Application Processing Tier
  Transitional, unrelated data
  Small files residing on file systems
  May use RAID to spread data over multiple disks
Storage Tier
  Large, permanent data files or raw data
  Large batch updates, most likely real time
  Log and data on separate volumes

Offsite tape vaulting
  Backup tapes stored at offsite location
Electronic vaulting
  Transmission of backup data to offsite location
Remote disk replication
  Continuous copying of data to offsite location
  Transparent to host
Other methods of replication
  Host-based mirroring
  Network-based replication

Synchronous
  All data written to cache of local and remote arrays before I/O is complete and acknowledged to host
Asynchronous
  Write acknowledged after write to local array cache; changes (writes) are replicated to remote array asynchronously
Semi-synchronous
  Write acknowledged with a single subsequent WRITE command pending from remote array

Synchronous | Asynchronous
Impact to Application Performance | No Application Performance Impact
Distance Limited (Are Both Sites within the Same Threat Radius) | Unlimited Distance (Second Site Outside Threat Radius)
No Data Loss | Exposure to Possible Data Loss

Enterprises Must Evaluate the Trade-Offs

Maximum tolerable distance ascertained by assessing each application
Cost of data loss
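The distance side of the trade-off can be estimated from propagation delay alone. The sketch below assumes roughly 5 microseconds of one-way delay per kilometer of fiber and the two round trips per synchronous write noted in the geo-cluster figure (2 x RTT); it ignores array, switch, and protocol processing time, and the function names are hypothetical.

```python
FIBER_DELAY_US_PER_KM = 5.0  # approximate one-way propagation delay in optical fiber

def sync_write_penalty_ms(distance_km, round_trips=2):
    """Added latency per synchronous write caused by distance alone."""
    rtt_us = 2 * distance_km * FIBER_DELAY_US_PER_KM
    return round_trips * rtt_us / 1000.0

def max_sync_distance_km(latency_budget_ms, round_trips=2):
    """Largest site separation that keeps the added write latency within budget."""
    return latency_budget_ms * 1000.0 / (round_trips * 2 * FIBER_DELAY_US_PER_KM)

for km in (50, 400):
    print(km, "km adds about", sync_write_penalty_ms(km), "ms per write")
print("a 2 ms budget allows about", max_sync_distance_km(2.0), "km")
```

Under these assumptions 50 km adds about 1 ms to every write and 400 km about 8 ms, which is why each application's tolerance has to be assessed before choosing the replication mode.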

Control Files
  Identify other files making up the database and record content and state of the db
  • DB name
  • creation date
  • backup performed
  • redo log time period
  • datafile state
Datafiles
  Only updated periodically
  • Tablespaces
  • Indexes
  • Data Dictionary
Redo Log Files
  Record db changes resulting from transactions
  • Database changes
  Used to play back changes that may not have been written to the datafile when failure occurred
  Typically archived as they fill to local and DR site destinations

Failure or disaster occurs at time t1
• Media Failure (e.g. disk)
• Human Error (datafile deletion)
• Database Corruption

[Figure: Timeline from t0 to t1 showing archived redo logs followed by online redo logs]

Hot Backup of Datafiles and Control Files taken at time t0
Database restored to state at time of failure (time t1) by:
1. Restoring Control Files & Datafiles from last Hot Backup (time t0)
2. Sequentially replaying changes from subsequent Redo Logs (archived and online): changes made between time t0 and t1
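The two recovery steps can be mimicked with a toy model. The sketch below assumes a datafile is a simple key/value mapping and each redo record is a timestamped overwrite; it illustrates restore-then-replay only and is not Oracle's file format or recovery API. All names and sample values are hypothetical.

```python
def recover(backup, redo_log, failure_time):
    """Rebuild the datafile state as of failure_time from a backup plus redo records."""
    datafile = dict(backup["rows"])                    # step 1: restore the hot backup (t0)
    for change in sorted(redo_log, key=lambda c: c["time"]):
        # step 2: replay, in order, only changes made after t0 and up to t1
        if backup["time"] < change["time"] <= failure_time:
            datafile[change["key"]] = change["value"]
    return datafile

backup = {"time": 100, "rows": {"acct1": 50}}          # hot backup taken at t0 = 100
redo_log = [
    {"time": 90,  "key": "acct1", "value": 50},        # already contained in the backup
    {"time": 110, "key": "acct1", "value": 70},
    {"time": 120, "key": "acct2", "value": 10},
    {"time": 130, "key": "acct1", "value": 0},         # after the failure, never replayed
]
print(recover(backup, redo_log, failure_time=125))     # state at t1 = 125
```

If the redo records between t0 and t1 are not available at the recovery site, the replay stops early and the gap becomes data loss, which is why the redo logs are the part most often replicated synchronously.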

[Figure: Primary and secondary sites linked by a SAN extension transport. Redo logs (cyclic): a copy of every committed transaction is synchronously replicated for zero loss. Database: a point-in-time copy at time t0, taken when the DB is quiescent, is replicated/copied. Archive logs and earlier DB backups are replicated/copied.]

Mixture of sync and async replication technologies commonly used
Usually only redo logs sync replicated to remote site
Archive logs created from redo log and copied when redo log switches
Point in time (PiT) copies of datafiles and control files copied periodically (e.g. nightly)

[Figure: Two data centers with matching layers: Internet, stateful firewalls, content caching, high-density multilayer LAN switch, server load balancing, intrusion detection, front-end application servers, back-end application servers, high-density multilayer SAN director, and enterprise-class storage arrays. Inter-site transports: SONET/SDH, DWDM/CWDM, IP/Metro E.]

Increasing distance: Data Center, Campus, Metro, Regional, National

Optical
  Dark Fiber: Sync; limited by optics (power budget)
  CWDM: Sync (2Gbps); limited by optics (power budget)
  DWDM: Sync (2Gbps lambda); limited by BB_Credits
  SONET/SDH: Sync (1Gbps+ subrate); Async
IP
  MDS9000 FCIP: Sync (Metro Eth); Async (1Gbps+)
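The BB_Credits limit on longer optical links can be estimated with a back-of-the-envelope calculation: enough credits are needed to keep frames in flight for one full round trip. The sketch below assumes about 5 microseconds of one-way fiber delay per kilometer, full-size 2148-byte Fibre Channel frames, and nominal throughput of roughly 100 MB/s at 1 Gbps and 200 MB/s at 2 Gbps; the function name is hypothetical and vendor sizing guidance should take precedence.

```python
import math

FIBER_DELAY_US_PER_KM = 5.0   # approximate one-way propagation delay in optical fiber
FULL_FRAME_BYTES = 2148       # assumption: maximum-size FC frame, headers included

def bb_credits_needed(distance_km, data_rate_mbytes_per_s, frame_bytes=FULL_FRAME_BYTES):
    """Credits required to keep the link full over one round trip."""
    rtt_us = 2 * distance_km * FIBER_DELAY_US_PER_KM
    frame_time_us = frame_bytes / data_rate_mbytes_per_s   # bytes / (MB/s) = microseconds
    return math.ceil(rtt_us / frame_time_us)

print("100 km at 2 Gbps:", bb_credits_needed(100, 200))
print("100 km at 1 Gbps:", bb_credits_needed(100, 100))
```

Smaller frames serialize faster, so a workload with short frames needs proportionally more credits than this full-frame estimate to cover the same distance.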

Extend the normal reach of a Fibre Channel fabric
  Replication
  Remote host to target array
  Shared data clusters

[Figure: Two FC fabrics joined by a SAN extension network, used for replication and for shared data cluster or remote host access to storage]

Servers with two Fibre Channel connections to storage arrays for high availability
Use of multipath software is required in dual fabric host design
SAN extension fabrics typically separate from host access fabrics
Replication requirements generally specified by array vendor

[Figure: Site A and Site B, each with server access FC fabrics and separate replication fabrics, joined by a DC interconnect network]

Disasters are characterized by their impact
  Local, metro, regional, global
  Fire, flood, earthquake, attack
Is the backup site within the threat radius?

[Figure: Concentric threat radii around the primary data center: local (1–2 km), metro (< 50 km), regional (< 400 km), global, with the secondary data center and the DR site placed at increasing distance]
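The threat radius question reduces to comparing site separation with the expected reach of a disaster. The sketch below uses the distance bands from the figure (local up to 2 km, metro under 50 km, regional under 400 km, global beyond that); the function names and the sample distances are hypothetical.

```python
def impact_scope(distance_km):
    """Classify a distance using the bands shown in the figure."""
    if distance_km <= 2:
        return "local"
    if distance_km < 50:
        return "metro"
    if distance_km < 400:
        return "regional"
    return "global"

def outside_threat_radius(site_distance_km, threat_radius_km):
    """True when the backup site is farther away than the disaster is expected to reach."""
    return site_distance_km > threat_radius_km

# Hypothetical backup site 30 km from the primary data center.
print(impact_scope(30))
print(outside_threat_radius(30, threat_radius_km=2))     # survives a local event
print(outside_threat_radius(30, threat_radius_km=400))   # does not survive a regional event
```

A site that passes this check for regional events is usually too far for synchronous replication, which is the tension the sample design resolves with a nearby HA site plus a distant DR site.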

[Figure: Sample design with three sites: High Availability Site 1 (CA, Hosts 1, Storage 1), High Availability Site 2 (CA, Hosts 2, Storage 2 bunker), and Disaster Recovery Site (NC, Hosts 3, Storage 3). Labels: HA cluster(s), electronic journaling, MDS 9509s, synchronous replication over CWDM, synchronous FCIP replication, asynchronous FCIP replication over dual OC12, MDS 9509 gateways. Storage: EMC/DMX arrays with SRDF, SRDF/A, TimeFinder BCV, and point-in-time (PiT) copies of PROD, redo, and archive data. Caption: Triple Threat.]

[Figure: Request flow across a service locator group and two data centers. Callouts: user request; ACE decrypts; ACNS caches pages (content engine); ACE routes request. GSS performs site (DC) selection according to pre-configured condition, using ACE probes to track application health. Requests directed to primary application in DC1; requests directed to backup application FQDN in DC2. Each data center has a clustered backend with active and standby copies of Data X and Data Y. Mirrored presentation layer; asynchronous replication between DC1 and DC2.]

• SANTap
• Appliance based storage replication
• Reliable copy of WRITE operations
• SCSI-FCIP communication

• Continuous Data Protection
• Automatic and Continuous Backups
• Time Addressable Storage (TAS)
• Any Point-in-Time Recovery
• Application based or Network based

[Figure: Production servers on an MDS SAN, with SANTap feeding a CDP appliance]

[Figure: Primary and secondary sites, each with an MDS switch using SANTap and a replication/CDP appliance, connected over dual OC12. Storage: EMC/DMX arrays (PROD, BCV, D/R, SRDF/A) and TAS/SATA arrays holding any-point-in-time (APiT) copies of data, redo, and archive logs.]

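[Figure: Corporate DNS with GSS-1 and GSS-2 selecting among DC-1, DC-2, and DC-3, fronted by ACE-1, ACE-2, and ACE-3. Each data center has a Web/App server farm, DB servers, and FC storage. DC-1 and DC-2 form the primary location and DC-3 the secondary location, connected by an IP/Optical network and CWDM/DWDM.]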

Data centers 1 and 2 are in the primary location, close enough to each other to provide DC HA for active/active access
Data Center 3 (DR) is farther than the tolerable disaster radius away from Primary DC 1 and 2
Web/App server farms are load balanced geographically
DB servers are within a geo-HA cluster and running in an L3 design
Synchronous data replication between data centers within the primary location
Asynchronous data replication is done between the primary and secondary storage systems