PARTNER TECHNICAL BRIEF

Storage Designed for Availability
Dot Hill AssuredSAN™ & SANsymphony™-V

Designing a data storage infrastructure for high availability requires the selection of the most reliable hardware components, coupled with intelligent data management to assist recovery of critical business data regardless of the cause of failure.

[Figure: Dot Hill AssuredSAN 12-bay & 24-bay systems]

This paper outlines the design techniques employed to achieve high availability in the Dot Hill AssuredSAN series of storage arrays with an overview of the methods of replication available within the DataCore SANsymphony-V platform.

Solution Highlights
• Virtualized storage enables seamless failover (HA) and WAN replication (DR)
• Test function allows for simulated disaster recovery
• Eliminates single points of failure

Dot Hill AssuredSAN Highlights
• DataCore Ready certified
• Demonstrated 99.999% availability
• Designed for high availability
• NEBS & MIL-STD-810G compliant
• Certified with VMware ESX, vSphere, Microsoft and Citrix Server

DataCore Ready Certified
The combination of SANsymphony-V and Dot Hill AssuredSAN storage arrays, which are certified under the rigorous DataCore Ready Program, offers a rock solid storage foundation built upon storage hardware designed for availability and a data management layer providing both asynchronous and synchronous replication.

Keeping Users And Applications Running Undisturbed
Highly reliable storage hardware is a vital foundation for business continuity, but many IT organizations fail to account for the multitude of other factors that greatly contribute to business upheaval. DataCore storage virtualization software prevents these more frequent sources of storage-related disruptions from ever affecting applications, and we enable you to do this in a cost-effective manner that leverages a variety of different devices for redundancy.

The DataCore Ready Program Value Proposition
DataCore Ready identifies solutions trusted to strengthen SANsymphony-V-based infrastructures. While DataCore solutions interoperate with common open and industry standard products, the DataCore Ready designation ensures that these solutions have successfully executed a functional test plan and additional verification testing to meet a superior level of joint solution compatibility.

Customers who leverage DataCore Ready offerings benefit from quality assurance, reduced risk and lower integration costs. The DataCore Ready logo helps customers quickly identify products and solutions that are optimized for SANsymphony-V.

Designing for High Availability
High availability for the Dot Hill storage arrays is achieved through a combination of three design elements:

• High reliability (measured by the Mean Time Between Failures or MTBF) of the complete system
• Redundant subsystems to eliminate as many single points of failure as possible
• Rapid repair of any failure (measured by Mean Time to Repair or MTTR) by using Field Replaceable Units (FRUs) for all critical subsystems

[Figure: Dot Hill’s mechanical design enables the power supply and fan, I/O module and controller, and disk drives all to be serviced quickly as hot-swappable Field Replaceable Units (FRUs).]

Being able to replace redundant FRUs while the system is fully operational further enhances availability. The following equation for availability demonstrates the vital role of serviceability in the system’s design. Maximum availability can be achieved only by minimizing the time it takes to effect a repair, which is reduced significantly by using FRUs.

Availability = MTBF / (MTBF + MTTR)

To achieve maximum availability, Dot Hill designs for reliability and serviceability, as well as for manufacturability.

Design for Reliability & Serviceability (DFRS)
Designing hardware for high reliability and serviceability involves both the system and its subsystems. To achieve high availability at the system level, Dot Hill integrates reliability into the design process in several ways. The first and most obvious is the use of disk drive redundancy with RAID configurations and dual power supplies, each including its own fan to prevent over-heating (and thereby, accelerated component failures). Even higher availability is achieved by using redundant controllers. By eliminating single points of failure in these critical subsystems, the system itself continues to operate normally during a failure of any single FRU. While such a failure does factor into the subsystem’s MTBF (its rated reliability), it does not diminish the availability of the system itself.

Dot Hill’s AssuredSAN architecture features full redundancy for every subsystem requiring a significant number of active components. The mechanical chassis itself cannot be redundant, of course, and there is a single mid-plane that performs the simple function of connecting the redundant controllers to the redundant disk drives. The mid-plane has minimal active components, however, and Dot Hill selects these for the highest possible reliability. The result is an extraordinarily high MTBF for the chassis and its mid-plane, and therefore, virtually no impact on system availability.

Modular FRU Design with Rapid Fault Notification
To enhance system serviceability for the shortest possible MTTR, Dot Hill utilizes several complementary design techniques. The first is the use of a modular chassis with FRUs. The ability to swap out a confirmed failed subsystem quickly and easily minimizes the time it takes to repair an installed system and restore it to full operation. By utilizing such a modular design, which provides convenient access to all subsystems, Dot Hill’s AssuredSAN products can be maintained seamlessly with minimal or no disruption in service during most repairs.

The second serviceability technique is immediate notification of any failure. The longer it takes to detect a failure, the longer it will take to repair. Time is of the essence for another reason, however: the failure of a redundant subsystem creates, in effect, a temporary single point of failure that increases the risk of a system-level outage. For this reason, the firmware in all Dot Hill systems is designed to detect, isolate and confirm any failure, initiate a fail over to a redundant subsystem, and provide immediate notification. The actual “messaging” of the notification can also be configured to match operational procedures to ensure that on-duty staff is properly and quickly notified.

Designing for Maximum MTBF
At the FRU or subsystem level, Dot Hill utilizes four separate design techniques to maximize the MTBF of each, while at the same time also maximizing the inclusion of leading-edge SAN features. The first is reducing the part count. Because any individual part can fail, the fewer there are, the higher the inherent reliability of the subsystem. Dot Hill engineers endeavor, therefore, to minimize the parts required on all printed circuit boards and other subsystem FRUs.
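The availability equation and the part-count argument above can be illustrated numerically. The following is a minimal sketch, not Dot Hill's own model: it assumes constant (exponential) failure rates so that the failure rates of parts in series simply add, and every MTBF and MTTR figure in it is a hypothetical value chosen for illustration.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series_mtbf(part_mtbfs_hours):
    """MTBF of a subsystem in which every part must work.

    Assumes constant failure rates: each part contributes a rate of
    1 / MTBF, the rates add, and the subsystem MTBF is the reciprocal.
    """
    return 1.0 / sum(1.0 / m for m in part_mtbfs_hours)


def downtime_minutes_per_year(avail):
    """Expected downtime per year implied by an availability figure."""
    return (1.0 - avail) * 365 * 24 * 60


if __name__ == "__main__":
    mtbf = 100_000.0  # hypothetical system MTBF in hours
    for mttr in (24.0, 4.0, 1.0):  # hypothetical repair times in hours
        a = availability(mtbf, mttr)
        print(f"MTTR {mttr:4.0f} h: availability {a:.6f}, "
              f"about {downtime_minutes_per_year(a):.1f} min/year down")

    # Halving the part count doubles the subsystem MTBF when all
    # parts have the same (hypothetical) individual MTBF.
    part = 1_000_000.0
    print("20 parts:", series_mtbf([part] * 20), "hours")
    print("10 parts:", series_mtbf([part] * 10), "hours")
```

Under these assumptions, shortening the repair time raises availability even when MTBF is unchanged, and removing parts raises the subsystem MTBF, which is the reasoning behind both the FRU design and the part-count reduction.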

High Quality Components for Lower TCO
The second technique is to use only high quality parts. Higher quality parts cost more, of course, but their superior performance and longer service lives normally contribute to a lower total cost of ownership in the long run. Despite the higher per-part cost, minimizing the part count, while concurrently enhancing functionality, helps to improve the overall price/performance of a highly-reliable design. For these reasons, Dot Hill utilizes only the highest quality parts available from reputable suppliers.

Increased Operating Margins
The third technique involves the de-rating of selected parts. Operating any part or component at or near its rated capacities inevitably shortens its useful service life. For critical parts, Dot Hill selects only those that will be able to operate at approximately 50% of their maximum allowable specifications for voltage, power and/or current. This can substantially increase the service life, and therefore, the MTBF of the subsystem.

Ensuring Software Maturity
The fourth technique is unique to Dot Hill: designing for software reliability. In modern designs, software reliability is just as important as hardware reliability, and in some ways even more important. The reason is that software bugs (including those in firmware) that cause downtime normally take significantly longer to resolve than the more obvious hardware failures. Bugs are often dependent upon system state (the set of circumstances leading up to the failure), making them difficult to reproduce and isolate quickly, and any patch or update must be tested before it can be released. Both add considerably to the MTTR for software failures, thereby adversely impacting system serviceability and availability.

To maximize software reliability, Dot Hill monitors the improvement in the Mean Time to Discovery (MTTD) of bugs to assess the maturity of all software and firmware during development. It is important to note that MTTD is not an industry metric, but is instead a software maturity metric created by Dot Hill as part of the company’s commitment to quality and reliability. All designs must have a sufficiently high and stable MTTD before being finalized.

Stringent Design Verification
The design is released to manufacturing only after passing three comprehensive tests. The Engineering Verification Test (EVT) and the Design Verification Test (DVT) ensure that the system and/or subsystem(s) fully satisfy all design specifications, including those for high reliability of both the hardware and software. These tests also confirm that marginal variations in parts from component suppliers will not compromise system reliability over the product’s useful life of a minimum of 10 years. The Reliability Demonstration Test (RDT) is a separate and rigorous evaluation of the final production hardware that verifies its calculated reliability, availability and serviceability. Whereas some vendors use only a few samples in a fairly short demonstration test, Dot Hill’s RDT uses 18 to 20 fully-configured systems in a 13-week test that must produce zero hardware failures to pass.

DataCore Sync Mirroring
Having looked in detail at the techniques used to ensure storage hardware availability, we turn now to consider how their combination with DataCore’s storage virtualization and management software delivers a deployment architecture where single points of failure are eliminated. Synchronous mirroring handles the real-time replication of I/Os, providing the ultimate in continuous availability.

Sync mirror features include:
• N+1 redundant grids for continuous availability
• Eliminate SAN or storage as a single point of failure when combined with host MPIO or ALUA drivers
• Enhance survivability using physically separate nodes in different locations
• Mirrored virtual disks behave like one, multi-ported shared drive, while automatically updating the two copies simultaneously
• Establish a common hot site for multiple data centers in the same metropolitan area
• May be combined with clustered file shares to achieve high-availability NAS

Having two nodes store the data simultaneously in conjunction with the host’s multipath I/O (MPIO) or Asymmetric Logical Unit Access (ALUA) drivers eliminates single points of failure or disruption. SANsymphony-V allows you to configure redundant storage pools by synchronously mirroring between DataCore nodes. For any mirrored virtual disk, one DataCore node owns the primary copy and another holds the secondary copy. Those are maintained in lock step. In the diagram below, Node “A” owns the primary mirror labeled “P1” and Node “B” holds the secondary copy labeled “S1” for virtual disk VD1. The preferred path from the host to the virtual disk is generally assigned to the node that holds the primary copy of the mirrored set.
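The primary/secondary arrangement described above can be sketched as a toy model. This is an illustrative simplification, not SANsymphony-V code or its API: the class names `Node`, `MirroredVirtualDisk` and `PathError` are hypothetical, a write is acknowledged only after both copies are stored, and a read falls back to the secondary copy when the preferred path fails. How the real product handles a degraded mirror is not modeled here.

```python
class PathError(Exception):
    """Raised when a node cannot be reached (hypothetical error type)."""


class Node:
    """Toy storage node that keeps blocks in memory."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.blocks = {}  # (virtual disk name, block address) -> data

    def write(self, vdisk, lba, data):
        if not self.online:
            raise PathError(f"node {self.name} is offline")
        self.blocks[(vdisk, lba)] = data

    def read(self, vdisk, lba):
        if not self.online:
            raise PathError(f"node {self.name} is offline")
        return self.blocks[(vdisk, lba)]


class MirroredVirtualDisk:
    """Virtual disk with a primary and a secondary copy kept in lock step."""

    def __init__(self, name, primary, secondary):
        self.name = name
        self.primary = primary      # node on the preferred path
        self.secondary = secondary  # node on the alternate path

    def write(self, lba, data):
        # Synchronous mirroring: both copies must be stored before
        # the write is acknowledged to the host.
        self.primary.write(self.name, lba, data)
        self.secondary.write(self.name, lba, data)
        return True

    def read(self, lba):
        # Reads use the preferred path and fall back to the
        # alternate path if the primary node cannot be reached.
        try:
            return self.primary.read(self.name, lba)
        except PathError:
            return self.secondary.read(self.name, lba)


if __name__ == "__main__":
    node_a, node_b = Node("A"), Node("B")
    vd1 = MirroredVirtualDisk("VD1", primary=node_a, secondary=node_b)
    vd1.write(0, b"payload")
    node_a.online = False  # simulate loss of the preferred path
    print(vd1.read(0))     # served from the secondary copy on node B
```

Because both copies are written before the acknowledgment, the secondary copy is always current, which is what allows the host to switch paths without the application noticing.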

Under normal operation, all read and write requests issued to that virtual disk will be serviced by the primary copy. The secondary copy need only keep up with new updates arriving from the mirroring function. Generally, nodes are configured to control primary copies for some virtual disks and secondary for others, thereby evenly balancing their read workloads. Alternatively, N+1 configurations consisting of 3 or more nodes may rely on a common node to keep the mirrored backup copy.

Should any errors be encountered on the preferred path, the host’s MPIO or ALUA drivers automatically fail over to the alternate path without disrupting applications. The same is true if a node needs to be taken out-of-service for maintenance or upgrades. If the node encounters any problems trying to reach the disks where the primary copy is stored, it will redirect the request to the node holding the mirrored copy.

From a physical standpoint, best practices call for the DataCore nodes to be maintained in separate chassis at different locations with their respective portion of the disk pool so that each can benefit from separate power, cooling and uninterruptible power supplies (UPS). The physical separation reduces the possibility that a single mishap or facility problem will affect both members of the mirrored set. Round-trip network latencies govern the maximum distance between mirrored nodes. Current technologies support inter-node distances up to 100 kilometers. The actual limits depend on application delay sensitivity and the network latency experienced between locations.

Asynchronous Remote Replication
DataCore’s remote replication function addresses requirements for secondary copies to be housed beyond the reach of synchronous mirroring, as in distant disaster recovery sites, branch offices and satellite facilities. It relies on a basic IP connection between locations, and works in both directions. That is, each site can act as the disaster recovery spot for the other.

The software operates asynchronously, meaning that it does not hold up the application waiting on confirmation from the remote end that the update has been stored in both places. Instead, it makes a best effort to keep the remote copy up to date with changes at the local site, but makes no guarantees. It’s far better than trying to constantly make backup tapes and ship them to a safe house or paying extra for point-products to handle only this task. The advanced protocol handles prolonged transmission delays or link outages, allowing you to set the priority of which virtual disks should be allocated the most bandwidth.

You can quickly get a remote site initialized by cloning the primary site’s disks onto transportable media and shipping them to the disaster recovery center. Then apply the changes that transpired while in transit. To help you build strong, verifiable disaster recovery practices that you can have confidence in, SANsymphony-V enables you to test restoration at the remote site while production updates continue to arrive. Any changes made during the simulated recovery are then discarded and the standby copies refreshed with the newest updates.

About Dot Hill Systems
For further information about Dot Hill please visit www.dothill.com.

0313

For additional information, please visit www.datacore.com or email [email protected]

© 2013 DataCore Software Corporation. All Rights Reserved. DataCore, the DataCore logo and SANsymphony are trademarks or registered trademarks of DataCore Software Corporation. All other products, services and company names mentioned herein may be trademarks of their respective owners.