Data Center Business Continuance and Disaster Recovery
Maciej Bocian [email protected] Architecture Sales Manager Data Center and Virtualization, Central Europe CCIE#7785
Presentation_ID © 2009 Cisco Systems, Inc. All rights reserved. Cisco Confidential 1 Business Continuance Drivers
• Cost of application downtime, lost data and productivity
• Regulatory mandates (Homeland Defense, Basel II, HIPAA, GLB, SEC)
– Firms must recover business operations the same business day a disruption occurs
– “Out-of-region” data center, 200+ km away
– Mandates backup data centers on separate grids
[Images: Hurricanes, The Northeast Blackout, NYC Blizzard of 2003]
Business Continuance Is More Critical than Ever
75% of IT decision-makers have altered Disaster Recovery/Business Continuance programs as a result of September 11
Following a disaster, 43% of directly affected businesses do not reopen and 29% fail within 24 months as a result
Only 15% of Global 2000 enterprises have a full-fledged business continuity plan.
Disasters: fire, storm, floods, earthquakes, chemical accidents, nuclear accidents, wars
Sources: Disaster Recovery Journal, Gartner Group
Agenda
• Introduction to Data Center - The Evolution
• Data Center Disaster Recovery
• Objectives
• Failure Scenarios
• Design Options
• Components of Disaster Recovery
• Site Selection - Front End GSLB
• Server High Availability - Clustering
• Data Replication and Synchronization - SAN Extension
• Sample Design
The Evolution of Data Centers
Data Center Evolution
[Figure: data center evolution timeline, 1960 to 2010. Compute evolution: mainframes, client/server, Internet computing, data center networking. Network evolution: terminal, TCP/IP, thin client (HTTP). Networked data center phase: 1. Consolidation, 2. Integration, 3. Distributed Networking, 4. High Availability.]
What is involved in a Data Center
• Network infrastructure solution: Cisco GSRs, Cisco Catalyst 6500, Cisco Catalyst 4000
• Application solution: Linux/HP, Solaris/SunFire, WebLogic, J2EE custom app, etc.
• Layer 4–7 services solution: CSM, SSLM, CSS, CE, GSS
• Database solution: Linux/HP, Solaris/SunFire, Oracle 10G RAC, etc.
• Network security solution: PIX®, FWSM, IDSM, VPNSM, CSA
• Storage solution: MDS9000
• Management and instrumentation solution: Terminal servers, NAM, CiscoWorks LMS/VMS, HSE
What is Distributed Data Center
[Figure: Primary Data Center and Secondary Data Center, each hosting applications (APP A, APP B, APP C) on an FC SAN, with Data Replication between the sites]
Why Distributed Data Centers
• Provide disaster recovery and business continuance
• Avoid a single, concentrated data depository
• High availability of applications and data access
• Load balancing together with performance scalability
• Better response and optimal content routing: proximity to clients
Front-end IP Access Layer
• “Content Routing”: site selection
[Figure: site selection in front of the Primary Data Center and Secondary Data Center]
Application and Database Layer
• “Content Switching”: Load Balancing
• “Server Clustering”: High Availability
[Figure: load balancing and server clustering within the Primary Data Center and Secondary Data Center]
Backend SAN Extension
• “Storage” & “Optical”: Data Mirroring and Replication
[Figure: data mirroring and replication between the FC SANs of the Primary Data Center and Secondary Data Center]
Data Center Disaster Recovery
Disaster Recovery
Recovery of data and resumption of service - Ensuring business can recover and continue after failure or disaster
Ability of a business to adapt, change and continue when confronted with various outside impacts
Mitigating the impact of a disaster
What It Means for Business
• Business Resilience: Continued Operation of Business During a Failure
• Business Continuance: Restoration of Business After a Failure
• Disaster Recovery: Protecting Data Through Offsite Data Replication and Backup
Zero Down Time is the ultimate goal
Disaster Recovery Planning
• Business Impact Analysis (BIA)
– Determines the impacts of various disasters to specific business functions and company assets
• Risk Analysis
– Identifies important functions and assets that are critical to the company’s operations
• Disaster Recovery Plan (DRP)
– Restores operability of the target systems, applications, or computing facility at the secondary Data Center after the disaster
Disaster Recovery Objectives
• Recovery Point Objective (RPO)
– The point in time (prior to the outage) to which systems and data must be restored
– Tolerable loss of data in the event of disaster or failure
– The impact of data loss and the cost associated with the loss
• Recovery Time Objective (RTO)
– The period of time after an outage in which the systems and data must be restored to the predetermined RPO
– The maximum tolerable outage time
• Recovery Access Objective (RAO)
– Time required to reconnect users to the recovered application, regardless of where it is recovered
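As a rough illustration of how these objectives can be checked against a protection scheme, the sketch below compares a scheme's worst-case data loss and restore time with a required RPO and RTO. The class name, field names, and sample figures are assumptions made for illustration; they do not come from the slides.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ProtectionScheme:
    """Hypothetical description of a backup or replication scheme."""
    name: str
    copy_interval: timedelta   # time between consistent copies leaving the site
    restore_time: timedelta    # time to bring systems and data back online


def meets_objectives(scheme: ProtectionScheme, rpo: timedelta, rto: timedelta) -> bool:
    """Worst-case data loss equals the copy interval, so it must fit the RPO;
    the restore time must fit the RTO."""
    return scheme.copy_interval <= rpo and scheme.restore_time <= rto


# Illustrative numbers only.
nightly_tape = ProtectionScheme("nightly tape", timedelta(hours=24), timedelta(days=2))
async_repl = ProtectionScheme("async replication", timedelta(minutes=5), timedelta(hours=1))

rpo, rto = timedelta(minutes=15), timedelta(hours=4)
for scheme in (nightly_tape, async_repl):
    print(scheme.name, "meets RPO/RTO:", meets_objectives(scheme, rpo, rto))
```

RAO is left out of the sketch; it could be modeled as a third duration in the same way.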
Recovery Point/Time vs. Cost
[Figure: timeline with the recovery point at time t0 (critical data is recovered), the disaster striking at time t1, and the recovery time ending at time t2 (systems recovered and operational)]
• Recovery point scale (before the disaster): days, hours, mins, secs
– Tape backup, Periodic Replication, Asynchronous Replication, Synchronous Replication
• Recovery time scale (after the disaster): secs, mins, hours, days, weeks
– Extended Cluster, Manual Migration, Tape Restore
• Cost increases as the recovery point and the recovery time move closer to the disaster
• Smaller RPO/RTO: higher $$$, replication, hot standby
• Larger RPO/RTO: lower $$$, tape backup/restore, cold standby
Failure Scenarios
Disaster could mean many types of failure:
• Network Failure
• Device Failure
• Storage Failure
• Site Failure
Network Failures
[Figure: data center connected to the Internet through Service Provider A and Service Provider B]
• ISP failure
– Dual ISP connections
– Multiple ISP
• Connection failure within the network
– EtherChannel
– Multiple route paths
Device Failures
• Routers, Switches, FWs
– HSRP
– VRRP
• Hosts
– HA cluster
Storage Failures
• Disk arrays
– RAID
• Disk Controllers
Site Failures
• Partial Site Failure
– Application maintenance
– Application migration
– Application scheduled DR exercise
• Complete Site Failure
– Disaster
Cold Standby
• One or more data centers with appropriately configured space, equipped with pre-qualified environmental, electrical, and communication conditioning
• Hardware and software installation, network access, and data restoration all need manual intervention
• Least expensive to implement and maintain
• Substantial delay from standby to full operation
Disaster Recovery – Active/Standby
[Figure: Primary Data Center and Secondary Data Center (Cold Standby), each with an FC SAN and hosting APP A and APP B]
Warm Standby
• A data center that is partially equipped with hardware and communications interfaces capable of providing backup operating support
• Latest backups from the production data center must be delivered
• Network access needs to be activated
• Provides better RTO and RPO than Cold Standby Backup
Disaster Recovery – Active/Standby
[Figure: Primary Data Center and Secondary Data Center (Warm Standby) connected over an IP/Optical Network, each with an FC SAN and hosting APP A and APP B]
Hot Standby
• A data center that is environmentally ready and has sufficient hardware and software to provide data processing service with little or no downtime
• Hot Backup offers Disaster Recovery with little or no human intervention
• Application data is replicated from the primary site
• A hot backup site provides very good RTO and RPO
Disaster Recovery – Active/Standby
[Figure: Primary Data Center and Secondary Data Center connected over an IP/Optical Network, each with an FC SAN; applications shown are APP A, APP B, and APP C]
Disaster Recovery – Active/Active
What Does Active/Active Mean?
Multiple Tiers of Application
[Figure: a data center connected to the Internet through Service Provider A and Service Provider B]
• Presentation Tier
• Application Tier
• Storage Tier
Active/Active Data Centers
[Figure: two data centers connected to the Internet through Service Provider A and Service Provider B and to the Internal Network]
• Active/Active Web Hosting
• Active/Active Application Processing
• Active/Standby or Active/Active Database Processing
Disaster Recovery Components
Site Selection Mechanisms
• Site selection mechanisms depend on the technology or mix of technologies adopted for request routing:
1. HTTP Redirect
2. DNS Based
3. L3 Routing with Route Health Injection (RHI)
• Health of servers and/or applications needs to be taken into account
• Optionally, other metrics (like load) can be measured and utilized for a better selection
HTTP Redirection – The Idea
• Leveraging the HTTP redirect function: HTTP return code 302
• Proper site selection made after the initial DNS request has been resolved, via redirection
• Mainly used as a method of providing site persistence while providing local server farm failure recovery
• Can be used with the “Location Cookie” feature of the CSS to provide redirection after wrong site selection
HTTP Redirection – Traffic Flow
[Figure: the client requests http://www.cisco.com/ and is redirected to a site-specific name, http://www1.cisco.com/ or http://www2.cisco.com/]
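The redirect flow can be sketched with the Python standard library. This is a minimal illustration, not how the CSS implements it: the site names follow the slide, while the health table, the `pick_site` and `serve` helpers, and the port number are assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Site URLs follow the slide; the health table is a hypothetical stand-in
# for real server-farm probes.
SITES = ["http://www1.cisco.com/", "http://www2.cisco.com/"]
SITE_HEALTHY = {"http://www1.cisco.com/": True, "http://www2.cisco.com/": True}


def pick_site(sites, healthy):
    """Return the first healthy site, or None if every site is down."""
    for url in sites:
        if healthy.get(url, False):
            return url
    return None


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = pick_site(SITES, SITE_HEALTHY)
        if target is None:
            self.send_error(503, "No site available")
            return
        # HTTP 302 sends the client to the site-specific name.
        self.send_response(302)
        self.send_header("Location", target)
        self.end_headers()


def serve(port=8080):
    """Run the redirector locally; the port is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling `serve()` and requesting `http://127.0.0.1:8080/` returns a 302 pointing at the first healthy site. Because the client then talks to the site-specific name directly, persistence to that location follows naturally.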
Advantages of the HTTP Redirection Approach
• Can be implemented without any other GSLB devices or mechanisms
• Inherent persistence to the selected location
• Can be used in conjunction with other methods to provide more sophisticated site selection
Limitations of the HTTP Redirection Approach
• It is protocol specific – relies on HTTP
• Requires redirection to additional fully qualified names – additional DNS records
• Users may bookmark a specific location – losing automatic failover
• HTTPS redirect requires full SSL handshake to be completed first
DNS-Based Site Selection – The Idea
• The client D-proxy (local name server) performs iterative queries
• The device which acts as “site selector” is the authoritative name server for the domain(s) distributed in multiple locations
• The “site selector” sends keepalives to servers or server load balancers in the local and remote locations
• The “site selector” selects a site for the name resolution, according to the pre-defined answers and site load balance method
• The user traffic is sent to the selected location
DNS-Based Site Selection – Traffic Flow
[Figure: numbered steps 1 to 10. The Client asks its DNS Proxy to resolve http://www.cisco.com/; the proxy queries (UDP:53) the Root Name Server, the Authoritative Name Server for .com, the Authoritative Name Server for cisco.com, and the Authoritative Name Server for www.cisco.com; the Client then connects (TCP:80) to the selected site, Data Center 1 or Data Center 2]
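The selection step performed by the “site selector” can be sketched as a toy model. It is only a sketch under stated assumptions: round-robin stands in for the site load balance method, the VIPs are documentation addresses, and the class and method names are hypothetical rather than GSS features.

```python
import itertools

# Hypothetical VIPs per data center (documentation address ranges).
SITE_VIPS = {"dc1": "192.0.2.10", "dc2": "198.51.100.10"}


class SiteSelector:
    """Toy authoritative responder: answers with the VIP of a healthy site."""

    def __init__(self, site_vips):
        self.site_vips = dict(site_vips)
        self.alive = {site: True for site in site_vips}
        self._rr = itertools.cycle(sorted(site_vips))

    def keepalive_result(self, site, ok):
        """Record the outcome of a keepalive probe toward a site."""
        self.alive[site] = ok

    def resolve(self):
        """Round-robin over sites, skipping any that failed keepalives."""
        for _ in range(len(self.site_vips)):
            site = next(self._rr)
            if self.alive[site]:
                return self.site_vips[site]
        return None  # no healthy site; a real device would apply a fallback policy


selector = SiteSelector(SITE_VIPS)
print(selector.resolve(), selector.resolve())   # alternates between the two sites
selector.keepalive_result("dc1", False)          # dc1 stops answering keepalives
print(selector.resolve())                        # only dc2 is handed out now
```

The sketch ignores TTLs and caching in the D-proxy, which in practice delay how quickly clients follow a change of answer.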
Advantages of the DNS Approach
• Protocol independent: works with any application that uses name resolution
• Minimal configuration changes in the current IP and DNS infrastructure (DNS authoritative server)
• Implementation can be different for specific host names
• A-records can be changed on the fly
• Can take load or data center size into account
• Can provide proximity
Limitations of the DNS-Based Approach
• Visibility limited to the D-proxy (not the client)
• Cannot guarantee 100% session persistence
• DNS caching in the D-proxy
• DNS caching in the client application
• Order of multiple A-record answers can be altered by D-proxies
Route Health Injection – The Idea
• Server and application health monitoring provided by local Server Load Balancers (SLB)
• SLB can advertise or withdraw VIP address to upstream routing devices depending on the availability of the local server farm
• Same VIP addresses can be advertised from multiple data centers – IP Anycast
• Relying on L3 routing protocols for route propagation and content request routing
• Disaster Recovery provided by network convergence
Route Health Injection – Implementation
[Figure: Client A and Client B reach VIP x.y.w.z through Routers 10, 11, 12, and 13. Location A advertises the VIP at low cost and is the preferred location; Location B advertises it at very high cost and is the backup location]
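The advertise/withdraw behavior can be modeled in a few lines. This is a simplified sketch: the VIP, the cost values, and the function names are assumptions, and real routing protocols converge over time rather than instantly.

```python
VIP = "192.0.2.80/32"  # hypothetical host route; the slide uses the placeholder x.y.w.z


def advertised_routes(sites):
    """Each site advertises the VIP host route only while its server farm is healthy."""
    return [(s["name"], s["cost"]) for s in sites if s["farm_healthy"]]


def best_site(sites):
    """Routers prefer the lowest-cost advertisement, as with any L3 route."""
    routes = advertised_routes(sites)
    if not routes:
        return None
    return min(routes, key=lambda r: r[1])[0]


sites = [
    {"name": "Location A", "cost": 10, "farm_healthy": True},      # low cost, preferred
    {"name": "Location B", "cost": 10000, "farm_healthy": True},   # very high cost, backup
]
print(VIP, "->", best_site(sites))   # Location A
sites[0]["farm_healthy"] = False     # SLB at Location A withdraws the VIP
print(VIP, "->", best_site(sites))   # Location B
```

Since the lowest-cost route always wins, traffic is not shared between locations, which matches the load balancing limitation of this approach.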
Advantages of the RHI Approach
• Supports legacy applications and does not rely on a DNS infrastructure
• Very good re-convergence time, especially in Intranets where L3 protocols can be fine tuned appropriately
• Protocol-independent: works with any application
• Robust protocols and proven features
Limitations of the RHI Approach
• Relies on host routes (32 bits), which cannot be propagated all over the Internet (more on this later)
• Requires tight integration between the application-aware devices and the L3 routers
• Inability to intelligently load balance among the data centers
Cluster Overview
• A cluster is two or more servers configured to appear as one
• Two types of clustering: Load balancing (LB) and High Availability (HA)
• Clustering provides benefits for availability, reliability, scalability, and manageability
• LB clustering: multiple copies of the same application against the same data set, usually read only
• HA clustering: multiple copies of a long-running application that requires access to a common data depository, usually read and write
[Figure: Web Servers, Application Servers, Database Servers]
HA Cluster Connections
• Public Network (typically Ethernet) for client/application requests
• Servers with same hardware, OS, and application software
• Private Network (typically Ethernet) for interconnection between nodes. Could be a direct connection, or optionally going through the public network
• Storage Disk (typically Fiber): shared storage array, NAS or SAN
Typical HA Cluster Components
• Application software that is clustered to provide High Availability. Example: Microsoft Exchange, SQL, Oracle database, File and Print Services
• Operating System that runs on the server hardware. Example: Microsoft Windows 2000 or 2003, Linux (and the other flavors of UNIX), IBM VMS or z/OS (for mainframe)
• Cluster Software that provides the HA clustering service for the application. Example: Microsoft MSCS, EMC AutoStart (Legato), Veritas Cluster Server, HP TruCluster and OpenVMS
• Optionally, Cluster Enabler, software that synchronizes the cluster software with the storage disk array software
Basic HA Cluster Design
• Active/Standby:
– Active node takes client requests and writes to the data
– Standby takes over when detecting failure on active
– Two-node or multi-node
• Active/Active:
– Database requests load balanced to both nodes
– Lock mechanism ensures data integrity
– Most scalable design
File System Approaches for HA Clusters
• Shared Everything
– Equal access to all storage
– Each node mounts all storage resources
– Provides a single layout reference system for all nodes
– Changes updated in the layout reference
• Shared Nothing
– Traditional file system with peer-peer communication
– Each node mounts only its “semi-private” storage
– Data stored on the peer system’s storage is accessed via the peer-peer communication
– Failed node’s storage needs to be mounted by the peer
Geo-clusters
• Geo-cluster: a cluster that spans multiple data centers
[Figure: node1 in the Local Datacenter and node2 in the Remote Datacenter, connected over the WAN. Disk replication between the sites is synchronous or asynchronous, annotated 2 x RTT]
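The 2 x RTT annotation can be turned into a quick estimate of what synchronous replication adds to each write. The sketch assumes a propagation delay of about 5 microseconds per kilometer of fiber and ignores switch, array, and protocol overhead; the function names are made up for this example.

```python
# Assumption: light travels through fiber at roughly 200,000 km/s,
# i.e. about 5 microseconds per kilometer one way.
ONE_WAY_US_PER_KM = 5.0


def rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay in milliseconds, ignoring equipment latency."""
    return 2 * distance_km * ONE_WAY_US_PER_KM / 1000.0


def sync_write_penalty_ms(distance_km: float, round_trips: int = 2) -> float:
    """Added latency per synchronous write; the slide's 2 x RTT is the default."""
    return round_trips * rtt_ms(distance_km)


for km in (10, 50, 100, 400):
    print(f"{km:>4} km: RTT {rtt_ms(km):.2f} ms, "
          f"sync write penalty {sync_write_penalty_ms(km):.2f} ms")
```

Under these assumptions a 100 km separation adds about 2 ms to every synchronous write, which is why distance bounds the use of synchronous replication.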
Considerations for HA Clusters
• Split Brain: cluster partitioning when nodes cannot communicate with each other but are equally capable of forming a cluster and mounting disks
• Extended L2 required in most implementations for:
– Public Network, since the client only knows about the Virtual IP address
– Private Network, used for heartbeats
• Storage:
– Directly Attached Disk (DAS) cannot be used
– Shared Disk needs to be visible to both nodes
– Needs to interface with cluster software for disk failover, zoning, and LUN masking when there is a node failure
Split-Brain
• Split-brain happens when all of the network communication links between two or more cluster nodes fail
• Both nodes could potentially go active and concurrently access the disk, thus corrupting data
[Figure: node1 and node2 both accessing the shared disk, resulting in Data Corruption]
Resolution for Split Brain: Quorum
• A quorum device serves as a tie breaker to arbitrate which system has access to resources
• The quorum ensures that even if there is no communication between the nodes, only one node can continue to access the disk
• Only the node that owns the quorum (or a majority of quorum votes) can bring resources online
• Any resource can be used as the arbitrator to break the tie
[Figure: node1 and node2 attached to a quorum disk and to the application data]
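The tie-breaking rule can be expressed as a simple vote count. This is a minimal sketch, assuming one vote per node plus one vote for the quorum device; actual cluster products use their own voting schemes, and the function name is hypothetical.

```python
def may_bring_resources_online(partition_nodes, total_nodes, owns_quorum_device):
    """A partition proceeds only if it holds a strict majority of votes.

    Assumption: every node has one vote and the quorum device adds one more,
    which breaks the tie in a two-node cluster.
    """
    total_votes = total_nodes + 1
    votes = len(partition_nodes) + (1 if owns_quorum_device else 0)
    return votes > total_votes / 2


# Two-node cluster after losing all interconnects: each node is alone.
print(may_bring_resources_online({"node1"}, 2, owns_quorum_device=True))   # True
print(may_bring_resources_online({"node2"}, 2, owns_quorum_device=False))  # False
```

Only the partition that wins the quorum device keeps access to the disk, so the two nodes can never be active at the same time.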
Extended Layer 2 Network
• In most implementations, a common L2 network is needed for the heartbeat between the nodes, as well as for public client access
• Extending a VLAN on a geographical basis is not considered best practice because of the impact of broadcasts, multicast, flooding, and Spanning-Tree integration issues
[Figure: node1 in the Local Datacenter and node2 in the Remote Datacenter, joined across the WAN by a Public Layer 2 network and a Private Layer 2 network. Disk Replication: Synchronous or Asynchronous]
Resolution: L3 Routed Solution
• In certain cases an L3 routed solution is possible
• Microsoft MSCS
– Requires that the 2 nodes be on the same subnet
– The communication between the 2 nodes is UDP unicast
– Local Area Mobility (LAM) allows the placement of the nodes on 2 different subnets
• Veritas VCS
– Allows having nodes with IP addresses in different subnets
– The Virtual Address needs to change when moving from node1 to node2
– DNS can be used to provide name to multiple IP mapping
[Figure: node1 on subnet 11.20.5.x and node2 on subnet 172.28.210.x, with an Extended SAN. Disk Replication: Synchronous or Asynchronous]
Storage Disk Zoning
• What storage disk array should node 2 be zoned to before and after a failure on node 1?
• To complete the failover you need to change the zoning configuration
• Software is needed to synchronize the Cluster Software with the Disk Array’s software, i.e. Cluster Enabler
[Figure: node1 (active) and node2 (standby) attached over an Extended SAN to disk arrays sym1320 and sym1291, with volumes marked RW and RD]
Resolution: Cluster Enabler
• The Cluster Enabler (CE) provides the interface between the Clustering Software and the Disk Array’s software
• When the Clustering Software detects a failure and wants to fail the node, the Cluster Enabler instructs the Disk Array to perform a failover
• Cluster Enabler also allows node1 to be zoned to sym1320 and node2 to be zoned to sym1291
• The Cluster Enabler running on each node typically communicates with the Cluster Enabler Software running on the remote node with Local Multicast messages
[Figure: node1 (active) and node2 (standby) attached over an Extended SAN to disk arrays sym1320 and sym1291, with volumes marked RW and WD]
Terminology
• Storage subsystem
– Just a bunch of disks (JBOD)
– Redundant array of independent disks (RAID)
• Storage I/O devices
– Host Bus Adapter (HBA)
– Small Computer System Interface (SCSI)
• Storage protocols
– SCSI
– iSCSI
– FC (FCIP)
Terminology (Cont’d)
• Direct Attached Storage (DAS)
– Storage is “local” behind the server
– No storage sharing possible
– Costly to scale; complex to manage
• Network Attached Storage (NAS)
– Storage is accessed at a file level over an IP network
– Storage can be shared between servers
• Storage Area Networks (SAN)
– Storage is accessed at a block level
– Separation of Storage from the Server
– High performance interconnect providing high I/O throughput
Storage for Applications
• Presentation Tier
– Unrelated small data files commonly stored on internal disks
– Manual distribution
• Application Processing Tier
– Transitional, unrelated data
– Small files residing on file systems
– May use RAID to spread data over multiple disks
• Storage Tier
– Large, permanent data files or raw data
– Large batch updates, most likely real time
– Log and data on separate volumes
Backup and Replication
• Offsite tape vaulting
– Backup tapes stored at offsite location
• Electronic vaulting
– Transmission of backup data to offsite location
• Remote disk replication
– Continuous copying of data to offsite location
– Transparent to host
• Other methods of replication
– Host-based mirroring
– Network-based replication
Replication: Modes of Operation
• Synchronous
– All data written to cache of local and remote arrays before I/O is complete and acknowledged to host
• Asynchronous
– Write acknowledged after write to local array cache; changes (writes) are replicated to remote array asynchronously
• Semi-synchronous
– Write acknowledged with a single subsequent WRITE command pending from remote array
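The difference between the three modes comes down to when the host sees the write acknowledged. The toy model below is a sketch of that rule only; the class and its methods are hypothetical and do not represent any array vendor's implementation.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of the write acknowledgment rules; not a vendor implementation."""

    def __init__(self, mode):
        if mode not in ("sync", "async", "semi-sync"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.local, self.remote = [], []
        self.pending = deque()  # acknowledged locally, not yet at the remote array

    def drain(self, count=None):
        """Deliver pending writes to the remote array (all of them by default)."""
        n = len(self.pending) if count is None else min(count, len(self.pending))
        for _ in range(n):
            self.remote.append(self.pending.popleft())

    def write(self, block):
        """Return at the point where the host would see the I/O complete."""
        self.local.append(block)
        self.pending.append(block)
        if self.mode == "sync":
            self.drain()                        # remote cache must hold the write first
        elif self.mode == "semi-sync":
            self.drain(len(self.pending) - 1)   # allow a single write to remain pending
        # async: acknowledge immediately; replication happens later

    def exposure(self):
        """Number of acknowledged writes that a primary-site loss would destroy."""
        return len(self.pending)


for mode in ("sync", "async", "semi-sync"):
    vol = ReplicatedVolume(mode)
    for i in range(5):
        vol.write(i)
    print(mode, "exposure:", vol.exposure())
```

In this model the asynchronous volume never drains on its own, so its exposure grows with every write; a real array replicates in the background and the exposure depends on link bandwidth and write rate.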
Synchronous vs. Asynchronous Trade-Off
• Synchronous
– Impact to application performance
– Distance limited (are both sites within the same threat radius?)
– No data loss
• Asynchronous
– No application performance impact
– Unlimited distance (second site outside threat radius)
– Exposure to possible data loss
Enterprises Must Evaluate the Trade-Offs
• Maximum tolerable distance ascertained by assessing each application
• Cost of data loss
Data Replication with DB Example
• Control Files
– Identify other files making up the database and record content and state of the db
– Contents: DB name, creation date, backup performed, redo log time period, datafile state
• Datafiles
– Contents: tablespaces, indexes, data dictionary
– Datafile is only updated periodically
• Redo Log Files
– Contents: database changes
– Redo logs record db changes resulting from transactions
– Used to play back changes that may not have been written to datafile when failure occurred
– Typically archived as they fill to local and DR site destinations
[Figure: Control Files identify the Datafiles and Redo Log Files; Redo Log Files record changes to the Datafiles]
Data Replication with DB Example (Cont’d)
• Failure or disaster occurs at time t1
– Media Failure (e.g. disk)
– Human Error (datafile deletion)
– Database Corruption
• Hot Backup of Datafiles and Control Files taken at time t0
• Database restored to state at time of failure (time t1) by:
1. Restoring Control Files & Datafiles from last Hot Backup (time t0)
2. Sequentially replaying changes from subsequent Redo Logs (archived and online) – changes made between time t0 and t1
[Figure: timeline from t0 to t1 showing Archived Redo Logs followed by Online Redo Logs]
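The two recovery steps can be sketched with plain dictionaries. This is a conceptual illustration, assuming each redo record is a (timestamp, key, value) tuple; it is not Oracle's recovery mechanism, and the function and sample data are hypothetical.

```python
def restore_database(hot_backup, redo_logs, t0, t1):
    """Rebuild the state at failure time t1 from a backup taken at t0.

    hot_backup: dict of key -> value, the datafile contents at time t0
    redo_logs:  iterable of (timestamp, key, value) change records,
                covering archived and online logs
    """
    state = dict(hot_backup)  # step 1: restore datafiles from the hot backup
    # step 2: replay changes made after t0 and up to t1, in time order
    for ts, key, value in sorted(redo_logs, key=lambda rec: rec[0]):
        if t0 < ts <= t1:
            state[key] = value
    return state


backup_t0 = {"balance:alice": 100, "balance:bob": 50}
redo = [
    (5, "balance:alice", 100),   # already reflected in the backup taken at t0 = 10
    (12, "balance:alice", 70),
    (15, "balance:bob", 80),
    (25, "balance:bob", 95),     # outside the replay window, ignored
]
print(restore_database(backup_t0, redo, t0=10, t1=20))
# {'balance:alice': 70, 'balance:bob': 80}
```

The replay only works if every redo record between t0 and t1 is available at the recovery site, which is why the redo logs are the data most often replicated synchronously.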
Data Replication with DB Example (Cont’d)
[Figure: Primary Site and Secondary Site connected by a SAN Extension Transport. Redo Logs (Cyclic): copy of every committed transaction, synchronously replicated for zero loss. Database: point in time copy at time t0, taken when DB quiescent, replicated/copied alongside earlier DB backups. Archive Logs: replicated/copied]
• Mixture of sync and async replication technologies commonly used
• Usually only redo logs sync replicated to remote site
• Archive logs created from redo log and copied when redo log switches
• Point in time (PiT) copies of datafiles and control files copied periodically (e.g. nightly)
Data Center Interconnection Options
[Figure: two data centers, each with Internet access, Stateful Firewalls, Content Caching, Server Load Balancing, Intrusion Detection, a High Density Multilayer LAN Switch, Front-End and Back-End Application Servers, a High Density Multilayer SAN Director, and Enterprise-Class Storage Arrays. Interconnection options shown between the sites: SONET/SDH, DWDM/CWDM, IP/Metro E]
Data Center Transport Options
Increasing distance: Data Center, Campus, Metro, Regional, National
• Optical
– Dark Fiber: Sync; limited by optics (power budget)
– CWDM: Sync (2 Gbps); limited by optics (power budget)
– DWDM: Sync (2 Gbps lambda); limited by BB_Credits
– SONET/SDH: Sync (1 Gbps+ subrate); Async
• IP
– MDS9000 FCIP: Sync (Metro Eth); Async (1 Gbps+)
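The BB_Credits limit can be approximated with a back-of-the-envelope calculation: enough credits are needed to keep frames in flight for one round trip. The sketch below rests on several assumptions (5 microseconds per kilometer of fiber, full-size 2148-byte frames, and 106.25 MB/s of usable rate per nominal Gbps of Fibre Channel) and should be read as an estimate, not a sizing rule.

```python
import math

ONE_WAY_US_PER_KM = 5.0             # assumed propagation delay in fiber
FULL_FRAME_BYTES = 2148             # assumed maximum Fibre Channel frame size
BYTES_PER_SEC_PER_GBPS = 106.25e6   # assumed usable rate per nominal 1 Gbps of FC


def bb_credits_needed(distance_km: float, link_gbps: float) -> int:
    """Credits required to keep the link full over one round trip."""
    rtt_s = 2 * distance_km * ONE_WAY_US_PER_KM * 1e-6
    frame_time_s = FULL_FRAME_BYTES / (link_gbps * BYTES_PER_SEC_PER_GBPS)
    return math.ceil(rtt_s / frame_time_s)


for km in (10, 50, 100, 200):
    print(f"{km:>4} km: 1 Gbps needs {bb_credits_needed(km, 1)} credits, "
          f"2 Gbps needs {bb_credits_needed(km, 2)} credits")
```

Under these assumptions a 2 Gbps link needs roughly one credit per kilometer, and smaller frames raise the requirement further.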
Data Center Replication with SAN Extension
• Extend the normal reach of a Fibre Channel fabric
– Replication
– Remote host to target array
– Shared data clusters
[Figure: Shared Data Cluster or Remote Host Access to Storage, and FC Replication, across the SAN Extension Network]
SAN Design for Data Replication
• Servers with two Fibre Channel connections to storage arrays for high availability
– Use of multipath software is required in dual fabric host design
• SAN extension fabrics typically separate from host access fabrics
– Replication fabric requirements generally specified by array vendor
[Figure: Site A and Site B, each with Server Access FC fabrics and Replication Fabrics, joined by the DC Interconnect Network]
Data Center Disaster Recovery sample design
Disaster Impact Radius
[Figure: concentric impact radii labeled Local (1–2 km), Metro (< 50 km), Regional (< 400 km), and Global, showing the Primary Data Center, the Secondary Data Center, and the DR Site]
• Disasters are characterized by their impact
– Local, metro, regional, global
– Fire, flood, earthquake, attack
• Is the backup site within the threat radius?
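The closing question can be answered with a distance check. The sketch below computes the great-circle distance between two sites and compares it with the radii from the slide; treating those radii as hard thresholds is an assumption, and the coordinates are hypothetical.

```python
import math

# Radii taken from the slide; treating them as hard thresholds is an assumption.
THREAT_RADIUS_KM = {"local": 2, "metro": 50, "regional": 400}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def outside_threat_radius(distance_km, scope):
    """True if a backup site at this distance lies beyond a disaster of the given scope."""
    return distance_km > THREAT_RADIUS_KM[scope]


# Hypothetical site coordinates, for illustration only.
primary = (52.23, 21.01)
backup = (50.06, 19.94)
d = haversine_km(*primary, *backup)
for scope in THREAT_RADIUS_KM:
    print(f"{scope:>8}: backup at {d:.0f} km is outside radius: "
          f"{outside_threat_radius(d, scope)}")
```

A site that clears the metro radius but not the regional one can still support synchronous replication in many cases, which is the trade-off the sample design works through.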
Active/Standby Architecture - Today
[Figure: High Availability Site 1 (CA), High Availability Site 2 (CA), and Disaster Recovery Site (NC), with Hosts 1, Hosts 2, and Hosts 3. The hosts at the high availability sites form HA cluster(s); electronic journaling is shown toward the DR site. Each site has MDS 9509 switches and an MDS 9509 gateway. Links shown: CWDM with synchronous replication, dual OC12, synchronous FCIP replication, and asynchronous FCIP replication. Storage 1, Storage 2 (Bunker), Storage 3]
Frame Based Replication
[Figure: Data Center 1 (Production Cluster) and Data Center 2 (D/R), each with an MDS switch, linked by dual OC12. EMC/DMX arrays hold PROD, D/R, Redo, and Arch volumes with SRDF and SRDF/A replication (R1, R2), BCV copies, and Timefinder PiT copies; labeled “Triple Threat”]
Active/Active Architecture - Tomorrow
[Figure: a Service Locator Group in front of two data centers, DC1 and DC2. ACE decrypts the user request, ACNS (Content Engine) caches pages, and ACE routes the request. GSS performs site (DC) selection according to pre-configured conditions, using ACE probes to track application health. By FQDN, requests are directed to the primary application or to the backup application. Each data center has a mirrored presentation layer and a clustered backend in which applications X and Y are each active at one site and standby at the other, with asynchronous replication of Data X and Data Y between the sites]
SANTap and Continuous Data Protection
• SANTap
– Appliance based storage replication
– Reliable copy of WRITE operations
– SCSI-FCIP communication
• Continuous Data Protection
– Automatic and Continuous Backups
– Time Addressable Storage (TAS)
– Any Point-in-Time Recovery
– Application based or Network based
[Figure: Production Servers and a CDP Appliance attached to an MDS SAN with SANTap, with Primary and Secondary storage]
Fabric Based Replication with CDP
[Figure: Data Center 1 (Production Cluster) and Data Center 2 (D/R), each with an MDS switch and a Replication/CDP Appliance attached through SANTap, linked by dual OC12. EMC/DMX arrays hold PROD, D/R, BCV, Redo, and Arch volumes replicated with SRDF/A; TAS/SATA storage holds APiT copies]
End-End Data Center Resilience
[Figure: Corp. DNS with GSS-1 and GSS-2 in front of ACE-1, ACE-2, and ACE-3 in data centers DC-1, DC-2, and DC-3. The data centers contain Web/APP Server Farms, DB servers, and FC storage, spread across a Primary Location and a Secondary Location and interconnected by an IP/Optical Network and CWDM/DWDM]
Summary - Design Details
• Data centers 1 and 2 are in the primary location, close enough to each other to provide DC HA for active/active access
• Data Center 3 (DR) is located beyond the tolerable disaster radius, away from Primary DC 1 and 2
• Web/App server farms are load balanced geographically
• DB servers are within a geo-HA cluster and running in an L3 design
• Synchronous data replication between data centers within the primary location
• Asynchronous data replication is done between the primary and secondary storage systems