Using SAS® 9 and Red Hat's Global File System 2 (GFS2)

SAS Institute, August 2013

As SAS Grid usage grows, our Red Hat Enterprise Linux (RHEL) users are looking for a clustered file system for their RHEL grid nodes. This short paper discusses what is available, starting with Red Hat's cluster file system, Global File System 2 (GFS2), which is part of the Red Hat Resilient Storage Add-On.

Note that Red Hat requires that it conduct an architecture review before a RHEL cluster is implemented. Please work with your Red Hat account team to arrange this.

GFS2 works very well with SAS. However, to get the performance enhancements that Red Hat has made to GFS2, you need to run RHEL 6.4 with the errata patches released through mid-May 2013. These errata fix the tuned service and address a concern with irqbalance. Do not use GFS2 with any version of RHEL earlier than RHEL 6.4 with the mid-May errata, because of various performance issues. Using GFS2 with any version of RHEL 5 results in severe functional problems, and these problems may also exist in early versions of RHEL 6.

In addition to running the release above, some other tuning is required when you set up the clustered file system for your SAS Grid environment. The penalty for not applying these tuning guidelines is very costly downtime due to unacceptable performance.

It is strongly recommended that SAS WORK directories be placed on a GFS2 file system separate from the permanent SAS data file space, to avoid fragmentation in the permanent SAS data file space.

Use the tuned tool with the 'enterprise-storage' profile.

Transparent Huge Pages currently affects performance by no more than 1-2%.

The changes for a GFS2 grid are in addition to what is recommended for standalone systems, so the standalone recommendations should already have been applied.
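As a rough illustration of the tuned recommendation above, the following shell sketch switches a node to the 'enterprise-storage' profile and then asks tuned which profile is active. The `DRY_RUN` switch and the `run` and `apply_tuned_profile` helpers are assumptions introduced for this example and are not part of the paper; with the default `DRY_RUN=1` the script only prints what it would do.

```shell
#!/bin/sh
# Sketch only: helper names and the DRY_RUN switch are hypothetical.
# DRY_RUN=1 (the default) prints commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

apply_tuned_profile() {
    # $1 = tuned profile name, e.g. enterprise-storage
    run tuned-adm profile "$1"
    # Report the active profile so the change can be confirmed.
    run tuned-adm active
}

apply_tuned_profile enterprise-storage
```

Setting `DRY_RUN=0` and running the script as root on each grid node would apply the profile for real; checking the output of `tuned-adm active` afterwards is a cheap way to confirm that every node ended up on the same profile.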
To see the standalone RHEL system recommendations, please review this paper: http://support.sas.com/resources/papers/proceedings11/72480_RHEL6_Tuning_Tips.pdf

GFS2 is limited to 16 systems in a cluster.

Use the deadline I/O scheduler and set the dirty page ratio to 40 by setting vm.dirty_ratio to 40. These settings greatly improved the performance of the workloads.

Improve the behavior of the Distributed Lock Manager (DLM) by making the following changes:

    echo 16384 > /sys/kernel/config/dlm/cluster/dirtbl_size
    echo 16384 > /sys/kernel/config/dlm/cluster/rsbtbl_size
    echo 16384 > /sys/kernel/config/dlm/cluster/lkbtbl_size

Instructions on how to apply these tuning parameters can be found at http://www.redhat.com/resourcelibrary/datasheets/rhel-sas-deployments

Use lvchange -r <a value appropriate for the workload> to set read-ahead on the logical volume that holds the file system. Ideally this has already been done as part of the standard RHEL tuning guidelines.

With RHEL 6.4 and the mid-May 2013 errata applied, SAS found that several different workloads performed very well on GFS2.

If you are interested in seeing what other clustered file systems are available on RHEL systems, please review this paper: http://support.sas.com/resources/papers/proceedings13/484-2013.pdf. Additional tuning guidelines can be found in this SAS Support Note: http://support.sas.com/kb/42/197.html

Please note that Red Hat requires that it conduct an architecture review before a RHEL cluster is implemented. Please work with your Red Hat account team to arrange this.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

Copyright © 2013 SAS Institute Inc., Cary, NC, USA. All rights reserved.
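The DLM, dirty page ratio, and I/O scheduler settings described earlier could be collected into one script along the following lines. This is a minimal sketch under stated assumptions: the `DRY_RUN` switch, the helper functions, and the `BLOCK_DEV` placeholder (`sdX`) are hypothetical and chosen only for illustration, while the DLM paths and the value 16384 are the ones given in the paper. With the default `DRY_RUN=1` nothing is written.

```shell
#!/bin/sh
# Sketch only: helper names, DRY_RUN and BLOCK_DEV are hypothetical.
# DRY_RUN=1 (the default) prints each write instead of performing it.
DRY_RUN="${DRY_RUN:-1}"
DLM_DIR="/sys/kernel/config/dlm/cluster"
# Placeholder device name; substitute the block device backing GFS2.
BLOCK_DEV="${BLOCK_DEV:-sdX}"

write_value() {
    # $1 = value to write, $2 = target file
    if [ "$DRY_RUN" = "1" ]; then
        echo "would write $1 to $2"
    else
        echo "$1" > "$2"
    fi
}

tune_dlm_tables() {
    # $1 = size for the three DLM hash tables named in the paper
    for name in dirtbl_size rsbtbl_size lkbtbl_size; do
        write_value "$1" "$DLM_DIR/$name"
    done
}

tune_vm_and_scheduler() {
    # Dirty page ratio of 40 and the deadline I/O scheduler
    write_value 40 /proc/sys/vm/dirty_ratio
    write_value deadline "/sys/block/$BLOCK_DEV/queue/scheduler"
}

tune_dlm_tables 16384
tune_vm_and_scheduler
```

Values written this way do not survive a reboot, so a real deployment would need to reapply them at startup; the Red Hat document linked above is the place to check how and when they should be applied.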
