Around the distributed storage

Andrei Maslennikov CASPUR Consortium

DESY - June 2003

Contents

• Today’s subject

• Distributed filestore: requirements

• Names in the field

• Where do we stand with some of them:
  - CASPUR StorageLab June 2003 Report, series 2 and 3

• Some observations

• New hardware and what it may be used for:
  - CASPUR StorageLab June 2003 Report, series 1

A.Maslennikov - June 2003

Today’s subject

- Technologies that allow transparent file access across a UNIX-based computing fabric

- We will consider only the principal solutions that are practicable now or may, in our opinion, reach production quality in the next few years

What would we expect from a distributed filestore?

1. Transparent File Access (TFA, POSIX1/8).

2. Acceptable Consistency Semantics. UNIX semantics is preferred. At least session semantics is required (writes to a file by a user are not visible to other users; once the file is closed, the changes are visible only to new sessions). File locking is desired, but its absence is not considered a limiting factor.

3. Large file support (> 2**31 bytes).

4. Security (ACLs).

5. Multiplatform support.

6. Performance close to the hardware capabilities. Scalability in performance and capacity. Ability to add/remove resources (CPUs, disks) without service interruption. Load balancing and high reactivity.

7. Extra features. Replication of critical read-only data with automatic redirection to the good copy. Single access point. Disconnected operation. Easy user and group level resource control.

8. Manageability.

9. Product maturity. We should consider only products with a solid history and/or support base. Potential new candidates should be judged mainly by the developer's ability to provide reliable post-release maintenance of the product.

Names in the field

Network-attached (NAS): NFS, AFS, IBM GPFS, Lustre, Intermezzo, WebFS - use IP data transport (GigE, QSW, Myrinet)

SAN-based systems: Sistina GFS, IBM StorTank, SGI CXFS, Veritas - shared block devices (Fibre Channel)

New solutions: DAFS - uses memory-to-memory transport (Virtual Interface, Infiniband)

SAN-based systems:

- GFS for Linux has been shipping for several years and may be thoroughly tested. File system size is limited to 2 TB; with the 2.6 kernel series it will go up to 16+ TB.

- IBM StorTank: vapourware for the past year, but about to be released this fall. We believe it should work, as it is a follow-up to their previous product, Sanergy, which is known to be a working solution. Ports for Linux, Solaris, HP-UX and W2K. File system size: thousands of terabytes.

- SGI CXFS. Based on a very solid product for IRIX, employs the field-proven XFS and XVM technologies. Server on IRIX only. Shipping for IRIX, W2K and Solaris. AIX port is probably available in beta. Linux port is coming soon. Declared maximum file system size: 18 PB.

- Veritas. Concurrent write access on Linux is being implemented. Will probably ship this year. Well-known, robust brand.
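The slides do not say where the 2 TB and 16 TB figures quoted for GFS come from. One reading consistent with both numbers, offered here as an assumption and not as something the report states, is a 32-bit index that counts 512-byte sectors on older kernels and 4 KiB pages with the 2.6 series. A short Python check of that arithmetic (the function name is illustrative only):

```python
TIB = 2**40  # the slide's "TB" is read here as a binary terabyte (assumption)


def addressable_bytes(index_bits, unit_bytes):
    """Capacity reachable with an index of `index_bits` bits over fixed-size units."""
    return (2**index_bits) * unit_bytes


# Assumed reading: 32-bit sector numbers, 512-byte sectors
limit_sectors = addressable_bytes(32, 512)

# Assumed reading: 32-bit page index, 4 KiB pages
limit_pages = addressable_bytes(32, 4096)

print(limit_sectors // TIB, "TiB with 512-byte units")
print(limit_pages // TIB, "TiB with 4 KiB units")
```

Under these assumptions the two results match the 2 TB and 16 TB figures on the slide.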

NAS systems:

- Lustre, WebFS, Intermezzo: in our opinion these do not (yet) meet requirement 9, Product Maturity.

- AFS: Is in an excellent state. Hundreds of terabytes installed, several million users worldwide. Field-proven, 5 major releases in 10 years. OpenAFS is flourishing and expanding, especially on Linux. Satisfies most of the 9 requirements, but is slow for large files and still lacks very large file support.

- NFS. Is in a very good state, and now performs very well on Linux. May be made scalable (Spinnaker Networks, hybrid SAN / NAS systems). Supports very large file sizes. NFS v4 will include ACLs (krb5), and the server will become stateful.

- GPFS. Linux port of a famous IBM product for RS/6000 architecture. Several releases so far, visible improvements. Worth looking at.
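Requirement 3 (large file support, > 2**31 bytes) is what separates AFS from NFS in the notes above. A minimal sketch of how one might probe a mounted file system for it, assuming a POSIX-like client and a file system that handles sparse files sensibly; the function name `accepts_offset` is hypothetical and not part of any of the products listed:

```python
import os
import tempfile

# First byte offset that a signed 32-bit file offset cannot address.
LARGE_OFFSET = 2**31


def accepts_offset(directory, offset=LARGE_OFFSET):
    """Return True if a scratch file in `directory` can hold a byte at `offset`."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        try:
            os.lseek(fd, offset, os.SEEK_SET)   # jump past the 2**31 boundary
            os.write(fd, b"x")                  # one byte; the gap stays sparse if supported
            return os.fstat(fd).st_size == offset + 1
        except (OSError, OverflowError):
            # The client or server refused the offset or the write.
            return False
    finally:
        os.close(fd)
        os.unlink(path)                         # always remove the scratch file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print("offset 2**31 accepted:", accepts_offset(scratch))
```

Pointing `directory` at a mount point of the file system under test gives a quick yes/no answer. It says nothing about performance with large files, which is a separate concern in the notes above.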

Now let us look at some real numbers…

Proceed to StorageLab June 2003 Report (series 2,3)

Some observations

1. SAN-based systems are visibly more performant than the NAS ones. We believe that the future belongs to the SAN-based architecture.

SAN-based systems solve the performance, file system size and capacity issues and scale very well.

The cost of the hardware infrastructure needed to set up a native SAN-based distributed filestore is still prohibitive:

- FC HBA: > 1 KEuro
- FC port: not less than 0.7 KEuro
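To give a feel for how these per-unit prices add up, here is a rough lower-bound estimate in Python. It assumes one HBA and one switch port per client node, and it ignores storage arrays, cabling and software licences; those assumptions and the function name are ours, not figures from the report.

```python
# Lower bounds taken from the prices quoted above, in euros.
FC_HBA_EURO = 1000    # "> 1 KEuro" per host bus adapter
FC_PORT_EURO = 700    # "not less than 0.7 KEuro" per switch port


def min_fc_connect_cost_euro(nodes, hbas_per_node=1):
    """Minimum cost of attaching `nodes` clients to an FC fabric.

    Assumes each HBA occupies exactly one switch port (illustrative assumption).
    """
    links = nodes * hbas_per_node
    return links * (FC_HBA_EURO + FC_PORT_EURO)


for n in (10, 100, 500):
    print(n, "nodes: at least", min_fc_connect_cost_euro(n) // 1000, "KEuro")
```

Because both inputs are lower bounds, the result is a floor on the connectivity cost only, not an estimate of a full SAN deployment.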

[ Intermediate hw solutions like iSCSI (see our April-June 2002 and Pasta III WG E reports) are less performant than those based on the native FC, but are also quite expensive ]

Some observations

2. Given the high cost of the SAN infrastructure it is likely that NAS systems will continue to dominate the field during the next 3-5 years.

Of those, AFS remains a prime candidate for a file system that will be used to host smaller files (users’ home directories, executable binaries, software, etc.).

AFS, or any successor product with similar features, will certainly survive during the SAN era, as the need for an inexpensive FS that provides for transparent file access on a global scale (especially for Linux) is very pronounced.

For larger files, NFS remains the best bet whenever the cost is a factor. Hybrid SAN / NFS services may be used to address the scalability and single point of access issues.
