An Introduction to IBM Spectrum Scale

An Introduction to IBM Spectrum Scale

IBM Platform Computing February 2015 Thought Leadership White Paper An introduction to IBM Spectrum Scale A fast, simple, scalable and complete storage solution for today’s data-intensive enterprise 2 An introduction to IBM Spectrum Scale Contents Introduction Big data. Cloud storage. It doesn’t matter what you call it— 2 Introduction today’s exponential growth of data, transactions and digitally 3 Spectrum Scale overview aware devices is demanding larger and larger amounts of unstructured data storage. 3 The file system 4 Application interfaces IBM has taken on this challenge with a new software defined storage solution, IBM® Spectrum Scale™. IBM Spectrum Scale 4 Performance and scalability was formerly IBM General Parallel File System (IBM GPFS™), also formerly known as code name IBM Elastic Storage™. 5 Administration A high-performance enterprise platform for optimizing data 6 Data availability and file management, Spectrum Scale is used extensively across industries worldwide. Spectrum Scale simplifies data manage- 7 Data replication ment with integrated tools designed to help organizations 8 Spectrum Scale RAID manage petabytes of data and billions of files—as well as control the cost of managing these ever-growing data volumes. 8 Information lifecycle management (ILM) toolset 10 Big data support Long considered a pioneer in big data storage, IBM leads the industry in advanced storage technologies that enable companies 11 Cluster configurations to store large quantities of file data.1 The latest version of Spectrum Scale continues this tradition and marks a significant 11 Shared disk milestone in the evolution of big data management. Part of the 12 Network-based block I/O IBM Spectrum Storage™ family, Spectrum Scale 4.1 introduces revolutionary new features that clearly demonstrate the 14 Mixed clusters IBM commitment to providing groundbreaking storage solu- 14 Sharing data between clusters tions, including: 16 What’s new in Spectrum Scale Version 4.1 ●● File encryption and secure erase 18 Summary ●● Transparent flash cache ●● Network performance monitoring ●● Active File Management (AFM) parallel data transfers ●● Network File System (NFS) version 4 support and data migration ●● Backup and restore improvements 2 ●● File Placement Optimizer (FPO) enhancements IBM Platform Computing 3 This paper outlines the features available today in Spectrum A Spectrum Scale cluster can be a single node in a tiered storage Scale that organizations can use to manage file data. This func- solution, two nodes providing a high-availability platform tionality includes core Spectrum Scale concepts such as striped supporting a database application, or thousands of nodes used data storage, cluster configuration options such as direct storage for applications such as weather-pattern modeling. The largest access and network-based block I/O, storage automation tech- existing configurations—like Mira—can exceed hundreds of nologies such as information lifecycle management (ILM) tools, thousands of cores. and more. Spectrum Scale was designed from the beginning to support Spectrum Scale overview high-performance parallel workloads and has since proven effec- Since its introduction in 1998, Spectrum Scale (formerly GPFS) tive for a variety of applications. Today, it is installed in clusters has been field-proven on some of the world’s most powerful supporting high-performance computing applications from supercomputers,3 providing highly reliable and efficient use of climate modeling to tornado simulation, with databases such as infrastructure bandwidth. The Mira supercomputer at Argonne IBM DB2®, in big data MapReduce analytics, gene sequencing, National Laboratory—the fifth-fastest supercomputer in the digital media and scalable file serving. These applications are world4—features a 768,000-core IBM Blue Gene® cluster used across many industries, including financial, retail, digital with Spectrum Scale that supports scientific research, including media, biotechnology, science and government. complex modeling in the fields of material science, climatology, seismology and computational chemistry. Spectrum Scale is Spectrum Scale continues to push technology limits in demand- full-featured software defined storage with management tools ing large environments. You may not have multiple petabytes for advanced storage virtualization, integrated high availability, of data today, but chances are you will eventually. And when you automated tiered storage and the performance to effectively do, you can be assured that Spectrum Scale has already been manage very large quantities of file data. tested in these situations. This proven leadership is what makes Spectrum Scale a solid solution for any size application. Spectrum Scale allows a group of computers concurrent access to a common set of file data over a storage area network (SAN) The file system infrastructure, a TCP/IP network or a combination of connec- A Spectrum Scale file system is built from a collection of storage tion types. The computers can run a mix of IBM AIX®, Linux devices that contain the file system data and metadata. A file or Microsoft Windows operating systems. Spectrum Scale pro- system can be built from a single disk or contain thousands of vides storage management, information lifecycle management disks storing petabytes of data. Each file system can be accessible tools, centralized administration and shared access to file systems from all nodes within the cluster. There is no practical limit from remote Spectrum Storage clusters providing a global on the size of a file system. The architectural limit for a single namespace. file system is more than a yottabyte. Some Spectrum Scale customers use single file systems up to 18 PB in size, while others utilize file systems containing billions of files. 4 An introduction to IBM Spectrum Scale Application interfaces Performance and scalability Applications access files through standard POSIX file system Spectrum Scale provides unparalleled I/O performance for interfaces. Since all nodes see all file data, any node in the cluster unstructured data by: can concurrently read or update a common set of files—enabling applications to scale out easily. Spectrum Scale maintains the ●● Striping data across multiple disks attached to multiple nodes coherency and consistency of the file system using sophisticated ●● Employing high-performance metadata (inode) scans byte-range locking, token (distributed lock) management and ●● Supporting a wide range of file system block sizes to match journaling. This approach means applications using standard I/O requirements POSIX locking semantics do not need to be modified to run ●● Utilizing advanced algorithms to improve read-ahead and successfully on Spectrum Scale. write-behind I/O operations ●● Using block-level locking, based on a sophisticated scalable A MapReduce connector is included with Spectrum Scale, token management system, to provide data consistency while enabling applications to access file data from IBM Platform™ allowing multiple application nodes concurrent access to files Symphony or Hadoop MapReduce applications. The MapReduce connector can be used independently from the When creating a Spectrum Scale file system, raw storage devices cluster architecture, giving you the flexibility to optimize for are assigned to the file system as Network Shared Disks (NSD). your environment. Once an NSD is defined, all of the nodes in the Spectrum Scale cluster can access the disk, using a local disk connection or the OpenStack includes Cinder drivers for Spectrum Scale, NSD network protocol for shipping data over a TCP/IP or enhancing support for virtual machine (VM) storage. The InfiniBand connection. OpenStack Object Store project, known as Swift, offers cloud storage software that enables customers to store and retrieve Spectrum Scale token (distributed lock) management coordi- large volumes of object data with a simple application program- nates access to NSDs, helping ensure the consistency of file ming interface (API). To learn more about how Spectrum Scale system data and metadata when different nodes access the same integrates with Swift, documented in an IBM Redbook®, visit: file. Token management responsibility is dynamically allocated www.redbooks.ibm.com/abstracts/redp5113.html?Open among designated manager nodes in the cluster. Spectrum Scale can assign one or more nodes to act as token managers for a In addition to standard interfaces, Spectrum Scale offers a single file system, allowing greater scalability for large numbers unique set of extended interfaces that can be used to provide of files with high transaction workloads. In the event of a node advanced application functionality. Using these extended inter- failure, token management responsibility is transparently moved faces, an application can determine the storage pool placement to another node. of a file, create a file clone and manage quotas. IBM Platform Computing 5 All data stored in a Spectrum Scale file system is striped across Spectrum Scale administration tools simplify cluster-wide tasks. all storage devices within a storage pool—whether the pool con- A single command can perform a file system function across tains 2 or 2,000 storage devices. When storage devices are added the entire cluster and most can be issued from any node in the to a storage pool, existing file data can be redistributed across cluster. Optionally, you can designate a group of administration the new storage to improve performance. Data redistribution

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    20 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us