Scale and Availability Considerations for Cluster File Systems
David Noy, Symantec Corporation

SNIA Legal Notice
The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations and literature under the following conditions:
• Any slide or slides used must be reproduced in their entirety without modification
• The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.
This presentation is a project of the SNIA Education Committee. Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as, legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.
Scale and Availability Considerations for Cluster File Systems
© 2011 Storage Networking Industry Association. All Rights Reserved.

Abstract
This session will appeal to server administrators looking to improve the availability of their mission-critical applications using a cross-platform, low-cost, heterogeneous tool. We will learn how a Cluster File System can improve performance and availability in your application environment.
Simultaneous Access To File System
Without a Clustered File System
A traditional file system can be mounted on only one server at any given time; otherwise data corruption will occur. (Diagram: File System 1 through File System 4, each mounted by a single dedicated server.)
With a Clustered File System
With a cluster file system, all servers that are part of the cluster can safely access the file system simultaneously. (Diagram: Veritas Cluster File System HA shared by all servers.)
How is a CFS different from NAS?
Network Attached Storage (NAS)
• Uses TCP/IP to share a file system (e.g. over NFS) across the Local Area Network
• Higher latency and overhead
• A file system is mounted via a network-based file system protocol (CIFS on Windows, NFS on Unix)
Cluster File System
• Looks and feels like a local file system, but shared across multiple nodes
• Uses a Storage Area Network to share the data, and dedicated network interfaces to share locking information
• Tightly coupled with clustering to create redundancy
• With a Cluster File System the application can run on the same node as the file system to get the best performance
Typical Performance Decay at Scale
(Chart: cost versus performance as server capacity scales relative to available capacity.)
• Bottlenecks arise from legacy technology
• Performance per node can degrade as more nodes are added
• Linear scalability is generally infeasible (results vary by application)
Expected Scalability for a CFS
(Chart: throughput for CFS for 1–16 nodes; total MB/sec to CFS on the y-axis, 0–2,500, versus number of nodes, 1–16, on the x-axis.)
Legacy Meta-Data Management
File system transactions done primarily by one master node per file system
• That node becomes the bottleneck in transaction-intensive workloads
• The number of transactions performed locally can be improved, but the master node is still the bottleneck
• Acceptable performance for some workloads, not for others
(Diagram: one CFS master node handling Metadata/Data; the remaining nodes handle Data only.)
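The master-node bottleneck can be illustrated with a toy throughput model. This is a hedged sketch: MASTER_TX_PER_SEC and PER_NODE_DEMAND are invented illustrative figures, not measurements of any real CFS.

```python
# Toy model of the single-master bottleneck: however many nodes join,
# cluster-wide metadata throughput is capped by the master's rate.
# Both constants below are made-up illustrative numbers.
MASTER_TX_PER_SEC = 10_000   # metadata transactions/s the master can commit
PER_NODE_DEMAND = 2_000      # metadata transactions/s each node generates

def metadata_throughput(nodes: int) -> int:
    """Cluster-wide metadata throughput with one master node."""
    return min(nodes * PER_NODE_DEMAND, MASTER_TX_PER_SEC)

for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} nodes -> {metadata_throughput(n):6d} tx/s")
```

Past five nodes the demand exceeds the master's rate, so adding nodes no longer adds metadata throughput, which is exactly the decay the preceding slide describes.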
Distributed Meta-Data Management
• All nodes in the cluster can perform transactions – no need to manually balance the load
• Not directly visible to end users
• Performance tests show linear scalability
• Scale-out becomes complex beyond 64 nodes
• Log per node
(Diagram: servers attached to a SAN, each CFS node handling Metadata/Data with its own log.)
Without Range Locking: Exclusive Locking
(Diagram: Node 1, a writer holding an exclusive lock on file foo.bar; Node 2, a writer waiting for the lock to be released.)
Appending writes, e.g. to a log file: the lock is held by node 1, and other nodes have no access until the lock is released.
Range Locking
(Diagram: Node 1 and Node 2, both writers appending to file foo.bar, e.g. during data ingest.)
The range locked by node 1 is inaccessible to other nodes, while the rest of the file remains available for read or write by any node.
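The semantics can be demonstrated with POSIX byte-range locks, which a CFS extends cluster-wide. This is a single-machine sketch (Unix-only) that uses two local processes to stand in for two cluster nodes; it is not Veritas-specific code.

```python
import fcntl
import os
import tempfile

# Create a 1 KiB scratch file whose ranges we will lock.
fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * 1024)

parent = open(path, "r+b")
# "Node 1": take an exclusive POSIX lock on bytes 0-511 only.
fcntl.lockf(parent, fcntl.LOCK_EX, 512, 0)

pid = os.fork()
if pid == 0:
    # "Node 2": an overlapping non-blocking request on bytes 0-511 fails...
    child = open(path, "r+b")
    try:
        fcntl.lockf(child, fcntl.LOCK_EX | fcntl.LOCK_NB, 512, 0)
        overlap_blocked = False
    except OSError:
        overlap_blocked = True
    # ...while bytes 512-1023, outside node 1's range, lock immediately.
    fcntl.lockf(child, fcntl.LOCK_EX | fcntl.LOCK_NB, 512, 512)
    os._exit(0 if overlap_blocked else 1)

_, status = os.waitpid(pid, 0)
os.close(fd)
os.unlink(path)
print("overlap blocked, disjoint range granted:",
      os.WEXITSTATUS(status) == 0)
```

Without range locking, node 1 would instead lock the whole file (length 0 in `lockf` terms), and node 2's append would block until the lock was released, as on the previous slide.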
Other Scalability Considerations
• Distributed meta-data management
• Distributed lock management
• Minimize intra-cluster messaging
• Cache optimizations
• Reduce or eliminate fragmentation
Availability: I/O Fencing
(Diagram: Node A and Node B, both connected to the coordinator disks and the data disks.)
I/O Fencing: Split Brain…
(Diagram: Node A and Node B lose contact with each other, but both can still reach the coordinator disks and the data disks.)
I/O Fencing: Split Brain Resolved
(Diagram: Node A wins the race for the coordinator disks (Y) and retains access to the data disks; Node B loses (N) and is fenced off.)
Cluster Fencing
Why Fencing?
• Need to restrict writes to current and verified nodes to prevent data corruption
(Diagram: multiple APP nodes writing to shared storage.)

SCSI-3 Based Fencing
• SCSI-3 PR disks for I/O fencing
• Maximum data protection

Proxy Based Fencing
• IP-based proxy servers
• Non-SCSI-3 fencing
• Virtualized environments
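A minimal simulation of the coordinator-disk race that resolves split brain. The class and method names are invented for illustration and are not a Veritas API; real SCSI-3 PR fencing issues PERSISTENT RESERVE commands to the disks rather than Python calls.

```python
# Simulated SCSI-3-style fencing: on interconnect failure, each node
# races to eject the peer's registration key from an odd number of
# coordinator disks; whoever wins a majority keeps the data disks,
# and the loser panics.
class CoordinatorDisk:
    def __init__(self):
        self.keys = {"A", "B"}                 # both nodes start registered

    def preempt(self, winner, loser):
        """PREEMPT-style op: remove the loser's key, but only if the
        winner itself is still registered on this disk."""
        if winner in self.keys:
            self.keys.discard(loser)
            return True
        return False

def race_for_coordinators(disks, me, peer):
    wins = sum(d.preempt(me, peer) for d in disks)
    return wins > len(disks) // 2              # need a strict majority

disks = [CoordinatorDisk() for _ in range(3)]  # always an odd count
node_a_survives = race_for_coordinators(disks, "A", "B")  # A gets there first
node_b_survives = race_for_coordinators(disks, "B", "A")  # B is already ejected
print("A survives:", node_a_survives, "| B survives:", node_b_survives)
```

The odd number of coordinator disks guarantees exactly one side can hold a majority, so at most one subcluster keeps writing to the data disks.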
Other Availability Considerations
• Robust fencing mechanism
• Tight integration between application clustering and storage layer
  • File System
  • Volume Manager
  • Multi-Pathing
• Quick recovery of CFS objects
  • Lock recovery
  • Meta-data redistribution
Cluster File System Use Cases
Faster Failovers
• Databases
• Messaging applications
• CRM applications
• Dramatically reduce application failover times

Performance for Parallel Apps
• ETL applications
• Grid applications
• Parallel databases
• Linear scalability and performance for parallel applications
• Ideal for grid compute
• When appropriate, eliminate the cost and complexity of parallel databases

Improve Storage Utilization
• Pool storage between servers, eliminating storage islands
• Eliminate multiple copies of data
Accelerate Application Recovery
For Applications that Need Maximum Uptime: Classic Clustering Requires Time

Applications: databases, messaging applications, CRM applications

Recovery steps:
• Detect failure
• Un-mount file system
• Deport disk group
• Import disk group
• Mount file system
• Start application
• Clients reconnect

(Diagram: a client connected to a classic cluster; the active server fails and the passive server, with its own file system, becomes active.)
Accelerate Application Recovery
For Applications that Need Maximum Uptime: Failover as Fast as Application Restart

Applications: databases, messaging applications, CRM applications

Recovery steps:
• Detect failure
• Un-mount file system
• Deport disk group
• Import disk group
• Mount file system
• Start application
• Clients reconnect

(Diagram: a client connected to two servers running Veritas Cluster Server and Veritas Storage Foundation over a shared Veritas Cluster File System; when the active server fails, the standby takes over with the file system already mounted.)
Case Study: Reduced Downtime Cost
Before: Ground Control System Using Traditional High Availability (CRM and databases; SAP and Oracle with HA)
• Downtime cost = €1,000,000/plane/day
• # Failovers/yr = 6 failures @ 30 mins each
• # Planes = 240
• Total downtime/year = 3 hours
• @ €1,000,000/plane/day × 3 hrs × 240 planes
• Downtime cost = €30,000,000

After: Ground Control System Fast Failover with CFS HA (< 1 minute downtime per failover)
• Downtime cost = €1,000,000/plane/day
• # Failovers/yr = 6 failures @ 1 min each
• # Planes = 240
• Total downtime/year = 6 minutes
• @ €1,000,000/plane/day × 6 min × 240 planes
• Downtime cost = €1,000,000

(Diagram: two production service groups, Prod SG, failing over under CFS.)
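The two downtime-cost figures follow directly from the stated assumptions, as a quick check shows:

```python
# Reproduce the case-study arithmetic: cost is charged per plane per
# day, prorated by minutes of downtime across the whole fleet.
COST_PER_PLANE_PER_DAY = 1_000_000   # EUR
PLANES = 240
MINUTES_PER_DAY = 24 * 60

def downtime_cost(minutes_per_year: float) -> float:
    return COST_PER_PLANE_PER_DAY * minutes_per_year * PLANES / MINUTES_PER_DAY

before = downtime_cost(6 * 30)   # six 30-minute failovers = 3 hours/year
after = downtime_cost(6 * 1)     # six 1-minute failovers = 6 minutes/year
print(f"before: EUR {before:,.0f}, after: EUR {after:,.0f}")
```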
Case Study: Large Casino
(Diagram: a fault-tolerant messaging-services cluster; a client connected to two messaging servers running Veritas Storage Foundation with CFS over a shared SAN.)
• Using messaging services to run the entire casino gaming floor
• Experienced a server outage
• Storage failover was very quick, but the failover server required 20 minutes to rebuild the message queue
• The entire casino floor came to a complete stop
• Total savings: with CFS, a hot-standby server could recover in seconds instead of 20 minutes
Case Study: Reduce Billing Cycle
Without CFS (OSS, Customer Care, ETL, and Billing, each with its own copy of the data)
Drawbacks:
• Time required to process customer billing included 12 hours of copy time
• For billing systems, time is money
• Redundant copies of data at each server mean 2× the storage requirements

With CFS (OSS, Customer Care, ETL, and Billing sharing a single CFS)
Benefits:
• CFS eliminates copy time, so one process can start when another completes
• Single copy of data shared among servers
• 12-hour reduction in the billing cycle
Case Study: Storage Consolidation
Traditional Application Architecture vs. Shared Storage Architecture

(Diagram: production and post-processing servers, Post Proc 1–4, plus spares; four 500 GB storage islands totaling 2 TB versus a 1.5 TB shared pool, 25% less storage for the same growth.)

Traditional File System
• Islands of storage zoned to each server
• Storage over-provisioned due to unknown storage growth needs
• When storage is filled, new storage must be provisioned

Shared Cluster File System
• Storage is accessible to all nodes
• Reduce upfront over-provisioning
• All nodes share common free space
• Minimize idle server and storage resources
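The 25% figure is just the ratio of the shared pool to the combined islands:

```python
# Check the "25% less" claim from the consolidation diagram.
island_total_gb = 4 * 500      # four 500 GB islands = 2 TB zoned storage
shared_pool_gb = 1500          # 1.5 TB shared pool serving the same servers
savings = 1 - shared_pool_gb / island_total_gb
print(f"{savings:.0%} less storage")
```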
Availability and Scale for Clustered NAS
• High availability of NFS and CIFS service
• Distributed NFS load balancing for performance
• Scale servers and storage independently
  • More servers give linear performance
  • Flexible storage growth
• Stretch your NFS/CIFS cluster (up to 100 km active/active)
• Lock coordination
• Choose your platform (Solaris, Solaris x86, AIX, RHEL 5)
• Integrated with:
  • Dynamic Storage Tiering
  • Dynamic Multi-Pathing
  • Thin Provisioning Reclamation
• Increased price/performance compared to similar NAS appliances
Q&A / Feedback
Please send any questions or comments on this presentation to SNIA: [email protected]
Many thanks to the following individuals for their contributions to this tutorial. – SNIA Education Committee
• Karthik Ramamurthy
• David Noy