Early Experiences with Storage Area Networks and CXFS

John Lynch
Aerojet
6304 Spine Road
Boulder, CO 80516

Abstract

This paper looks at the design, integration and application issues involved in deploying an early access, very large, and highly available storage area network. Covered are topics from filesystem failover, issues regarding the number of nodes in a cluster, and the use of leading edge solutions to solve complex issues in a real-time data processing network.

1 Introduction

Aerojet designed and installed a highly available, large scale Storage Area Network over the spring of 2000. Due to its size and diversity, this system is known to be one of a kind; it is not currently offered by SGI, but would serve as a prototype system.

The project's goal was to evaluate Fibre Channel and SAN technology for its benefits and applicability in a second-generation, real-time data processing network. SAN technology seemed to be the technology of the future to replace the traditional SCSI solution.

The approach was to conduct an evaluation of SAN technology as a viable replacement for SCSI, consolidate redundant data in the system, and overcome limitations of traditional file sharing mechanisms. Paramount in the design criteria was the need to carefully balance the risk associated with deploying a large-scale SAN prototype system in a mission critical production environment.

The SAN system also needed to work with existing tape storage devices, because the program has compatibility requirements with other programs that use legacy Exabyte 8mm SCSI tape drives.

SAN technology can be categorized into two distinct approaches. Both approaches use the storage area network to provide access to multiple storage devices at the same time by one or multiple hosts. The difference is how the storage devices are accessed.

The most common approach allows the hosts to access the storage devices across the storage area network, but filesystems are not shared. This allows either a single host to stripe data across a greater number of storage controllers, or several systems to share storage controllers. This essentially breaks a large storage system up into smaller distinct pieces, but allows for cost-sharing of the most expensive component, the storage controller.

The second approach allows all storage devices to be accessed by all hosts on the storage area network, and also allows filesystem sharing across multiple hosts. SGI designed a shared filesystem, Clustered XFS (CXFS), to allow all of the systems in a cluster to have access to the same filesystem. In addition, each system accesses it at full fibre channel performance, as if it were direct-attached.

To date, the industry trend has been toward the share-hardware-only approach. SGI is clearly the industry leader in the sharing of data with cluster filesystems, in addition to the sharing of hardware.

The system design uses elements from both approaches, but primarily takes advantage of the CXFS approach offered by SGI.
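To make the distinction between the two approaches concrete, the sketch below models them in a few lines of Python. It is purely illustrative: the class names and structure are assumptions for exposition, not part of CXFS or of the deployed system. In the first model a storage device is usable only by the host that owns it; in the second, every cluster member shares one namespace while block I/O still goes directly between host and storage.

class Lun:
    """A logical unit presented by a RAID storage processor."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}

class PartitionedSan:
    """Approach 1: hosts share the storage controllers, not the filesystems."""
    def __init__(self, luns):
        self.luns = {lun.name: lun for lun in luns}
        self.owner = {}                       # LUN name -> owning host

    def assign(self, lun_name, host):
        self.owner[lun_name] = host

    def write(self, host, lun_name, block, data):
        if self.owner.get(lun_name) != host:  # only the owning host may touch it
            raise PermissionError(f"{host} does not own {lun_name}")
        self.luns[lun_name].blocks[block] = data

class SharedFilesystemSan:
    """Approach 2: one shared namespace visible to every cluster member."""
    def __init__(self, luns):
        self.luns = {lun.name: lun for lun in luns}
        self.namespace = {}                   # path -> (LUN name, block)

    def create(self, host, path, lun_name, block):
        # Namespace (metadata) updates are coordinated cluster-wide, so the
        # mapping is immediately visible to every host.
        self.namespace[path] = (lun_name, block)

    def write(self, host, path, data):
        # File data moves directly between the host and the RAID over the
        # fibre channel fabric, as if the storage were direct-attached.
        lun_name, block = self.namespace[path]
        self.luns[lun_name].blocks[block] = data

The point of the second model is that the file-to-storage mapping is shared cluster state, while the data itself still moves host-to-RAID at full fabric speed, which is what CXFS provides.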
2 Hardware Design

2.1 Requirements

During the design phase of the system, some general requirements were levied on the SAN technology.

In general, the access speed from server to disk had to meet the performance of direct-attached SCSI. This was not viewed as difficult, since 1 Gbit Fibre Channel offers more than twice the transfer rate of Ultra SCSI (40 MB/s).

No failure in any one server or workstation can affect the operation of another.

A single failure in the SAN fabric cannot create a system outage.

The clustered filesystem private network had to work with the existing network infrastructure.

The SAN system had to be economically justifiable to warrant the additional cost of employing the system.

2.2 Design Methodology

The system design consists of four different Storage Area Networks. Due to the unique mission of each SAN, and the current port limitations on switches, design decisions were made on the makeup of the SAN architecture to limit the risk of failure in the implementation phase. All of the SAN fabrics are based on the same building block principles, which are outlined here.

In the original system, the Fast-Wide SCSI solution was replaced with direct-attach fibre channel RAID.

Figure 1: SCSI Replacement with Fibre Channel (the Ultra SCSI connection at 40 MB/s between the SGI server and the RAID is replaced by a fibre channel connection at 100 MB/s)

A single Brocade Silkworm 2800 fibre channel switch was then built into the system, connecting the server and the storage.

Figure 2: Addition of Brocade Silkworm Switch (SGI server connected to the RAID through the Brocade switch)

After the functionality was verified, SCSI Exabyte 8mm tape drives were added to the system through a Crossroads fibre channel to SCSI router. A fibre channel to SCSI router is a device that converts fibre channel protocol to SCSI protocol and vice versa. It is used to integrate legacy SCSI devices into a new generation fibre channel architecture.

Figure 3: Tape Drive Added (the SGI server reaches the RAID through the Brocade switch and the tape archive through the FC to SCSI router)

Host bus adapters (HBAs) were then added to the system to eliminate single points of failure in accessing the storage devices.

Figure 4: Full Redundant Architecture (two SGI servers, two fibre channel switches, RAID, and FC to SCSI routers with tape archives; the full mesh topology ensures redundancy)

The technique of gradual insertion of RAID technology enabled the smooth integration of fibre channel technology into an existing architecture. All four fabrics are based on these building blocks.

2.3 Final SAN Fabric Design

Fabric One is used by the servers to hold live satellite downlink data; common data that is used by all servers, such as satellite ephemeris, common maps and imagery; and a single copy of the custom ground station software.

This fabric has two 32 processor, five 16 processor, and one 4 processor Origin 2000 servers accessing data on eight different RAID devices, each containing two storage processors. Total storage capacity is 2.3TB.

Figure 5: Fabric One Layout (eight RAID devices R0-R7 and eight servers Srv1-Srv8, each connected to both Fabric A and Fabric B)

Fabric Two is used by the Sybase servers. The operational databases that are used to store satellite trending data, as well as the ground station configuration databases, are stored here.

This fabric has two eight processor Origin 2000 servers accessing data on four different RAID devices, each containing two storage processors. Total storage capacity is 512GB.

Figure 6: Fabric Two Layout (four RAID devices R0-R3 and two servers Srv1-Srv2, each connected to both Fabric A and Fabric B)

Fabric Three consists of all the workstations and the storage that contains common data. The common data used by all workstations contains satellite ephemeris, common maps, our custom ground station software, and IRIX auditing data.

This fabric has thirty-seven 4 processor Onyx2 workstations accessing data on three RAID devices, each containing two storage processors. Total storage capacity is 384GB.

Figure 7: Fabric Three Layout (three RAID devices R0-R2 and workstations 1-22 and 23-37, each connected to both Fabric A and Fabric B)

Fabric Four is used by the intelligence operators. It contains all the large-scale imagery data used to support a real-time missile-warning mission.

This fabric has two 4 processor Onyx2 workstations accessing data on a single RAID device containing two storage processors. Total storage capacity is 1TB.

Figure 8: Fabric Four Layout (a single RAID device R0 and two workstations Srv1-Srv2, each connected to both Fabric A and Fabric B)
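The fabric descriptions above can be collected into a compact summary. The short Python sketch below is illustrative only (the dictionary layout is an assumption, not a configuration format used by the system); the values are the host counts, RAID device counts, and capacities quoted for each fabric.

# Summary of the four fabrics described above, using the figures quoted in the text.
FABRICS = {
    "Fabric One":   {"hosts": 8,  "raid_devices": 8, "capacity_gb": 2300},
    "Fabric Two":   {"hosts": 2,  "raid_devices": 4, "capacity_gb": 512},
    "Fabric Three": {"hosts": 37, "raid_devices": 3, "capacity_gb": 384},
    "Fabric Four":  {"hosts": 2,  "raid_devices": 1, "capacity_gb": 1000},
}

hosts = sum(f["hosts"] for f in FABRICS.values())
raids = sum(f["raid_devices"] for f in FABRICS.values())
capacity_tb = sum(f["capacity_gb"] for f in FABRICS.values()) / 1000
# Each RAID device contains two storage processors, so the storage-processor
# count is simply twice the RAID-device count.
print(f"{hosts} hosts, {raids} RAID devices "
      f"({2 * raids} storage processors), {capacity_tb:.1f} TB total")

Run as-is, it reports 49 hosts and 16 RAID devices (32 storage processors) totaling roughly 4.2 TB across the four fabrics.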
This SAN architecture was designed to enhance the existing system by reducing the number of software images that needed to be installed and by consolidating the system audit records into a single location. It also reduces the dependency on NFS as a mechanism to share files among the servers.

3 Reliability

To ensure maximum reliability, each Origin 2000 server is equipped with dual Adaptec Fibre Channel Host Bus Adapters (HBAs). This ensures that each server has a redundant path to the storage arrays. Each server is connected to a primary fibre channel fabric (Fabric A) and to a completely separate secondary fibre channel fabric (Fabric B). Each fabric consists of Brocade 2800 fibre channel switches with redundant power supplies.

The Media Interface Adapters (MIAs) that had been installed on the copper RAID storage processors to provide an optical connection to the server were removed. Instead, the fibre channel switches were outfitted with both copper and optical interfaces. The optical interfaces were used to connect all of the Origin 2000s and Onyx2s to the switches, and the copper interfaces connect the switches to the RAID storage processors. Removing the MIAs increased the reliability of the connection by eliminating a high failure rate item.

The two fabrics are completely isolated from each other. The RAID devices are SGI Fibre Channel RAID; each contains two Thor storage processors, one connecting to the primary fabric and the other connecting to the secondary fabric. A typical fabric consists of four Brocade 2800 switches. These switches are connected to each other using three Inter-switch Links (ISLs).

All LUNs within the storage system are configured as RAID level 5. The normal access rate to a single RAID is 80 MB/sec for each storage processor, with two storage processors capable of delivering 160 MB/sec. In the event of a failure in one of the storage processors, the system is capable of maintaining access to the storage, but all of the LUNs supported by the surviving storage processor operate in a degraded state.

The initial implementation of SGI's Cluster File System, also known as CXFS, used IRIX 6.5.6f. CXFS was in its infancy and cluster stability was an issue: the cluster database would become corrupt, the corruption would be replicated across the other members of the cluster, and the entire cluster would become unusable. Since upgrading to IRIX 6.5.10f with patch 4156, significant improvements have been made to cluster stability.

4 Integration with SGI