
Differences Between Lustre & GPFS

AFE Data Storage Inc., 123 Main Street, Suite 945, Somewhereville, NY 14880

Phone: 800-555-1234 Email: datainfo@afedatastorage.com Web: www.afedatastorage.com

June 2016 © AFE Data Storage Inc.

Introduction

The inadequacies of distributed file systems have brought about the need for a better solution to deal with the problems of high performance computing (HPC). Optimized for maximum bandwidth, parallel file systems have evolved to meet the needs of HPC clusters operating on high-speed networks.

This type of file system is engineered with parallelism in mind and can efficiently handle vast numbers of concurrent users. It is designed for HPC applications and systems, allowing the benefits of analytics to be realized. Of the different parallel file systems available today, two of the most popular are Lustre and GPFS.

Lustre

Derived from the words “Linux” and “cluster,” Lustre is a shared file system designed for HPC clusters. Peter Braam of Carnegie Mellon University initially started Lustre as a research project in 1999.

It is open source software, and has been used for some of the world’s most powerful supercomputers. Several Linux distributions facilitate deployment of Lustre, making it an easy choice for organizations to incorporate into their systems.

Lustre can support tens of thousands of client nodes, and can handle close to a terabyte per second of I/O throughput. It is highly scalable, supports locking through its distributed lock manager, seamlessly handles data object and file striping, and its high availability features ensure practically transparent and continuous operation.

The three major subsystems of Lustre’s architecture include:

• Metadata Server: The MDS in a Lustre implementation is responsible for storing and managing all of the metadata describing namespace components, such as file names and directories, within the file system.

• Object Storage Server: The OSS handles client I/O requests over the network and provides the file data services, serving the data objects held on its storage targets.

• Client: All nodes on Lustre using the services of its file system are known as clients, and all are presented with a single shared namespace. (A conceptual sketch of how these three pieces cooperate on a read follows below.)
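To make this division of labor concrete, the following is a minimal toy sketch in the spirit of a Lustre read path: the client asks the metadata server for a file’s striping layout, then pulls the stripes from the object storage servers. The class and method names are illustrative assumptions, not part of any real Lustre API.

# Conceptual sketch only: a toy model of how a Lustre-style client, MDS,
# and OSSes divide responsibilities on a read. Names are illustrative,
# not part of any real Lustre API.

class MetadataServer:
    """Holds namespace entries and the striping layout for each file."""
    def __init__(self):
        self.layouts = {}  # path -> (stripe_size, [ost ids])

    def lookup(self, path):
        return self.layouts[path]

class ObjectStorageServer:
    """Serves the data objects (stripes) stored on its targets."""
    def __init__(self):
        self.objects = {}  # (path, stripe index) -> bytes

    def read_stripe(self, path, stripe_index):
        return self.objects.get((path, stripe_index), b"")

class Client:
    """Asks the MDS for the layout, then reads stripes from the OSSes."""
    def __init__(self, mds, osses):
        self.mds = mds
        self.osses = osses  # ost id -> ObjectStorageServer

    def read_file(self, path):
        stripe_size, ost_ids = self.mds.lookup(path)
        data, stripe_index = b"", 0
        while True:
            oss = self.osses[ost_ids[stripe_index % len(ost_ids)]]
            chunk = oss.read_stripe(path, stripe_index)
            if not chunk:
                break
            data += chunk
            stripe_index += 1
        return data

In a real deployment all of this traffic crosses the network, and the Lustre client reassembles the stripes transparently, so applications simply see an ordinary POSIX file.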

Earlier versions of Lustre had stability issues, but it has since improved in that area. Compared with other parallel file systems, it has a very large user base. Many vendors dealing with high performance computing promote Lustre with their products, making it one of the most widely used file systems for Linux clusters today.

GPFS

General Parallel File System, or GPFS as it is more commonly known, is another high-performance parallel file system. Developed by IBM, it is based on Tiger Shark, a previous file system engineered by IBM in the early 1990s.

GPFS is used by many large organizations and is well-established in corporate enterprise environments. It is proprietary and usually bundled with IBM hardware and other solutions.

GPFS, like other parallel file systems, closely follows the POSIX (Portable Operating System Interface) standard, but allows parallel access to data. It achieves a high level of performance through parallel, block-level access to storage, and allows both data and metadata to be accessed from any node.

GPFS attains an extremely high throughput by using a large block size in data striping operations, and is vastly superior among its competitors in reliability and stability.

The architecture of GPFS comprises the following components:

• Storage Node: This subsystem is responsible for the efficient coordination of data and metadata in striping operations across all storage targets.

• File System Node: This node monitors administrative tasks and ensures the smooth operation of file system activities.

• Manager Node: There are several types of manager nodes that handle the different aspects of management within the file system, such as locking or allocation.

• Metanode: This component is assigned the job of managing file system metadata from a centralized location.

• Token Server: Coordinates and manages all tokens applied to all nodes in the cluster. Designated manager nodes may also perform this duty, distributing the load among the different manager nodes. (A simplified sketch of this token-based coordination follows below.)
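As a rough illustration of the token idea, here is a minimal sketch of a token server that grants shared read tokens and exclusive write tokens. It is a toy model under assumed semantics, not GPFS’s actual token protocol, which also covers byte ranges, revocation callbacks, and distribution of the token workload across manager nodes.

# Conceptual sketch only: a toy token server in the spirit of GPFS-style
# distributed locking. Granularity and semantics are assumptions made
# for illustration.

class TokenServer:
    def __init__(self):
        self.holders = {}  # file id -> (mode, set of holder nodes)

    def request(self, node, file_id, mode):
        """Grant a token if compatible: reads are shared, writes exclusive."""
        held = self.holders.get(file_id)
        if held is None:
            self.holders[file_id] = (mode, {node})
            return True
        held_mode, nodes = held
        if mode == "read" and held_mode == "read":
            nodes.add(node)
            return True
        if nodes == {node}:                  # sole holder may change mode
            self.holders[file_id] = (mode, {node})
            return True
        return False                         # caller must wait for a revoke

    def release(self, node, file_id):
        held = self.holders.get(file_id)
        if held:
            held[1].discard(node)
            if not held[1]:
                del self.holders[file_id]

# Example: shared readers coexist, while a writer has to wait.
ts = TokenServer()
assert ts.request("node1", "/data/a", "read")
assert ts.request("node2", "/data/a", "read")
assert not ts.request("node3", "/data/a", "write")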

GPFS seems to be the parallel file system of choice in many organizations for many reasons. With IBM integrating it into many of its other products geared toward the corporate world, GPFS has become one of the dominant players in high performance computing clusters.

Although licensing considerations can be somewhat prohibitive, they have not stopped this reliable file system from capturing a large segment of the market.

Table A: Lustre & GPFS Features

Feature                            Lustre        GPFS
Open Source                        Yes           No
Parallel, Tuned for HPC Clusters   Yes           Yes
POSIX File System Permissions      Yes           Yes
Scalability > 10,000 Clients       Yes           No
Solid Reliability & Stability      Some Issues   Yes
High Level of Throughput           Yes           Yes
Straightforward Deployment         No            Yes

Major Differences Between Lustre and GPFS

Lustre and GPFS are the two popular leaders when it comes to parallel file systems. However, each has strengths and weaknesses. Therefore, the question to be explored is not one of finding the better file system, but rather which file system is better suited for a particular organization.

Scalability

Lustre and GPFS are both highly scalable. Lustre can be effectively scaled up to tens of thousands of clients, while GPFS can support only thousands of clients.

On the surface, this can seem to be a problem for GPFS, but the market share IBM has already captured shows that most organizations do not plan on scaling to those numbers. Nevertheless, scalability is an area in which Lustre has had, and still retains, a good head start.

High Availability

Lustre’s transparent failover and recovery operations are quite robust and well-developed. Software upgrades are facilitated by failing over to a standby server, performing the upgrade, and rebooting without disrupting working jobs.

GPFS, on the other hand, simplifies recovery by keeping a log on every node. In other words, each node has its own log for each file system and can perform recovery on behalf of a failed node. For this reason, it is not necessary to wait for the failed node to come back up, which expedites recovery.
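The following toy sketch illustrates that idea under assumed, simplified semantics: every node keeps its own log of in-flight updates, and a surviving node replays a failed node’s log on its behalf. The log format and operations are invented for illustration and are not GPFS’s actual recovery log.

# Conceptual sketch only: a surviving node replays a failed node's log so
# recovery does not wait for the failed node to return. The log format
# here is invented purely for illustration.

def replay_node_log(failed_node, logs, shared_state):
    """Re-apply the failed node's logged updates on its behalf."""
    for op, path, value in logs.get(failed_node, []):
        if op == "create":
            shared_state.setdefault(path, value)
        elif op == "update":
            shared_state[path] = value
        elif op == "delete":
            shared_state.pop(path, None)
    logs[failed_node] = []  # the log is discarded once replayed
    return shared_state

# Example: node2 fails mid-update; node1 replays node2's log immediately.
logs = {"node2": [("update", "/proj/results", "v2"), ("delete", "/tmp/x", None)]}
state = {"/proj/results": "v1", "/tmp/x": "stale"}
print(replay_node_log("node2", logs, state))  # {'/proj/results': 'v2'}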

Metadata

From the beginning, the architecture of Lustre seemed to have a very noticeable disadvantage when compared to GPFS: a single metadata server that contains the namespace information on filenames and directories. This had long been seen as a single point of failure.

In comparison, GPFS stores its metadata in a distributed fashion, allowing it to be more robust than its open source counterpart. Since Lustre 2.4, multiple metadata targets in a single file system are possible. Lustre still has a way to go, but in recent years it has been making strides to close the gap with GPFS.
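As a rough illustration of why spreading metadata helps, here is a minimal sketch that assigns directory entries to metadata targets with a simple hash. The hash-based policy is purely an assumption for illustration; neither Lustre’s distributed namespace feature nor GPFS places metadata exactly this way.

# Conceptual sketch only: spreading namespace entries across several
# metadata targets so no single server owns the whole namespace.
import hashlib

def metadata_target_for(path, num_targets):
    """Pick which metadata target owns the entry for a given path."""
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_targets

# With four targets, lookups for different directories land on different
# servers instead of all hitting one metadata server.
for p in ("/home/alice", "/home/bob", "/scratch/job42"):
    print(p, "->", metadata_target_for(p, 4))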

Data and File Striping

Both Lustre and GPFS are capable of data striping across multiple disks to efficiently handle very large files. GPFS initiates striping in the file system rather than going through an independent logical volume manager. Because it manages striping operations directly, GPFS displays a high degree of fault tolerance.
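To show what striping looks like in practice, here is a small sketch of the arithmetic that maps a byte offset to a storage target, assuming a fixed round-robin layout. The parameter names are illustrative; Lustre layouts and GPFS block allocation each add their own policies on top of this basic idea.

# Conceptual sketch only: round-robin striping arithmetic. Parameter
# names are illustrative, not any file system's real API.

def locate(offset, stripe_size, num_targets):
    """Return (target index, offset within that target) for a byte offset."""
    stripe_index = offset // stripe_size
    target_index = stripe_index % num_targets
    # A target sees every num_targets-th stripe, packed back to back.
    local_offset = (stripe_index // num_targets) * stripe_size + offset % stripe_size
    return target_index, local_offset

# Example: 1 MiB stripes across 4 targets; byte 5,000,000 sits in stripe 4,
# which wraps around to target 0 at local offset 1,854,272.
print(locate(5_000_000, 1 << 20, 4))  # (0, 1854272)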

Cost and Technical Support Factors

If we were to take a look at the factors affecting cost, again we would find advantages with Lustre in some areas, and advantages with GPFS in others. Services and technical support for GPFS are well-established and unified, with IBM itself providing the support.

Technical support for Lustre, on the other hand, is offered by an assortment of corporations such as Hewlett-Packard. Partner firms and independent companies frequently support both Lustre and GPFS. AFE Data Storage is one such firm, offering quality services and support for both file systems.

Another area affecting cost is licensing. Since Lustre is open source, it comes out as the winner here. Many have found IBM’s licensing fees prohibitive and tend to avoid GPFS for this very reason. Others point out the great service and support that come along with proprietary software, especially when a company like IBM is the vendor.

Lastly, the cost of the expertise required is a significant drawback for Lustre: a GPFS deployment requires much less technical expertise than a Lustre deployment.

Personnel able to deal with this kind of deployment may or may not be present within the organization. If not, the time, effort, and cost of implementing Lustre as the parallel file system of choice will certainly increase.

Conclusion

Lustre and GPFS both come with a multitude of advantages. Each has strengths in certain areas of operation, and choosing one over the other requires a thorough investigation of what is most important to the organization or corporate enterprise to be serviced.

Massive scalability falls to Lustre, while reliability goes to GPFS. However, either of these parallel file systems makes a very good choice, whatever the system-wide needs may be.

Vendors of these file systems have also recently been trying to show the versatility of Lustre and GPFS by suggesting that either one can replace the Hadoop Distributed File System (HDFS). IBM leads the way in this, pointing out that GPFS can efficiently replace HDFS for implementations of Hadoop.

This would open up newer, bigger markets in the big data arena for GPFS, Lustre, or even other parallel file systems. In any case, the dominance and success of Lustre and GPFS are built on a solid foundation and are here to stay for the foreseeable future.
