Lustre 1.8 Operations Manual
Part No. 821-0035-12
Lustre manual version: Lustre_1.8_man_v1.4
June 2011

Copyright © 2007-2010 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.

U.S. Government Rights - Commercial software. Government users are subject to the Sun Microsystems, Inc. standard license agreement and applicable provisions of the FAR and its supplements.

Sun, Sun Microsystems, the Sun logo and Lustre are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd.

Products covered by and information contained in this service manual are controlled by U.S. Export Control laws and may be subject to the export or import laws in other countries. Nuclear, missile, chemical biological weapons or nuclear maritime end uses or end users, whether direct or indirect, are strictly prohibited. Export or reexport to countries subject to U.S. embargo or to entities identified on U.S. export exclusion lists, including, but not limited to, the denied persons and specially designated nationals lists is strictly prohibited.

DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license and obtain more information about Creative Commons licensing, visit Creative Commons Attribution-Share Alike 3.0 United States or send a letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco, California 94105, USA.

Contents

Preface

Part I Lustre Architecture

1. Introduction to Lustre
    1.1 Introducing the Lustre File System
        1.1.1 Lustre Key Features
    1.2 Lustre Components
        1.2.1 Lustre Networking (LNET)
        1.2.2 Management Server (MGS)
    1.3 Lustre Systems
    1.4 Files in the Lustre File System
        1.4.1 Lustre File System and Striping
        1.4.2 Lustre Storage
            1.4.2.1 OSS Storage
            1.4.2.2 MDS Storage
        1.4.3 Lustre System Capacity
    1.5 Lustre Configurations
    1.6 Lustre Networking
    1.7 Lustre Failover and Rolling Upgrades

2. Understanding Lustre Networking
    2.1 Introduction to LNET
    2.2 Supported Network Types
    2.3 Designing Your Lustre Network
        2.3.1 Identify All Lustre Networks
        2.3.2 Identify Nodes to Route Between Networks
        2.3.3 Identify Network Interfaces to Include/Exclude from LNET
        2.3.4 Determine Cluster-wide Module Configuration
        2.3.5 Determine Appropriate Mount Parameters for Clients
    2.4 Configuring LNET
        2.4.1 Module Parameters
            2.4.1.1 Using Usocklnd
            2.4.1.2 OFED InfiniBand Options
        2.4.2 Module Parameters - Routing
            2.4.2.1 LNET Routers
        2.4.3 Downed Routers
    2.5 Starting and Stopping LNET
        2.5.1 Starting LNET
            2.5.1.1 Starting Clients
        2.5.2 Stopping LNET

Part II Lustre Administration

3. Installing Lustre
    3.1 Preparing to Install Lustre
        3.1.1 Supported Operating System, Platform and Interconnect
        3.1.2 Required Lustre Software
        3.1.3 Required Tools and Utilities
        3.1.4 (Optional) High-Availability Software
        3.1.5 Debugging Tools
        3.1.6 Environmental Requirements
        3.1.7 Memory Requirements
            3.1.7.1 MDS Memory Requirements
            3.1.7.2 OSS Memory Requirements
    3.2 Installing Lustre from RPMs
    3.3 Installing Lustre from Source Code
        3.3.1 Patching the Kernel
            3.3.1.1 Introducing the Quilt Utility
            3.3.1.2 Get the Lustre Source and Unpatched Kernel
            3.3.1.3 Patch the Kernel
        3.3.2 Create and Install the Lustre Packages
        3.3.3 Installing Lustre with a Third-Party Network Stack

4. Configuring Lustre
    4.1 Configuring the Lustre File System
            4.1.0.1 Simple Lustre Configuration Example
            4.1.0.2 Module Setup
        4.1.1 Scaling the Lustre File System
    4.2 Additional Lustre Configuration
    4.3 Basic Lustre Administration
        4.3.1 Specifying the File System Name
        4.3.2 Starting up Lustre
        4.3.3 Mounting a Server
        4.3.4 Unmounting a Server
        4.3.5 Working with Inactive OSTs
        4.3.6 Finding Nodes in the Lustre File System
        4.3.7 Mounting a Server Without Lustre Service
        4.3.8 Specifying Failout/Failover Mode for OSTs
        4.3.9 Running Multiple Lustre File Systems
        4.3.10 Setting and Retrieving Lustre Parameters
            4.3.10.1 Setting Parameters with mkfs.lustre
            4.3.10.2 Setting Parameters with tunefs.lustre
            4.3.10.3 Setting Parameters with lctl
            4.3.10.4 Reporting Current Parameter Values
        4.3.11 Regenerating the Lustre Configuration Logs
        4.3.12 Changing a Server NID
        4.3.13 Removing and Restoring OSTs
            4.3.13.1 Removing an OST from the File System
            4.3.13.2 Restoring an OST in the File System
        4.3.14 Aborting Recovery
        4.3.15 Determining Which Machine is Serving an OST
    4.4 More Complex Configurations
        4.4.1 Failover
    4.5 Operational Scenarios
        4.5.1 Unmounting a Server (without Failover)
        4.5.2 Unmounting a Server (with Failover)
        4.5.3 Changing the Address of a Failover Node

5. Service Tags
    5.1 Introduction to Service Tags
    5.2 Using Service Tags
        5.2.1 Installing Service Tags
        5.2.2 Discovering and Registering Lustre Components
        5.2.3 Information Registered with Sun

6. Configuring Lustre - Examples
    6.1 Simple TCP Network
        6.1.1 Lustre with Combined MGS/MDT
            6.1.1.1 Installation Summary
            6.1.1.2 Configuration Generation and Application
        6.1.2 Lustre with Separate MGS and MDT
            6.1.2.1 Installation Summary
            6.1.2.2 Configuration Generation and Application
            6.1.2.3 Configuring Lustre with a CSV File

7. More Complicated Configurations
    7.1 Multihomed Servers
        7.1.1 Modprobe.conf
        7.1.2 Start Servers
        7.1.3 Start Clients
    7.2 Elan to TCP Routing
        7.2.1 Modprobe.conf
        7.2.2 Start servers
        7.2.3 Start clients
    7.3 Load Balancing with InfiniBand
        7.3.1 Setting Up modprobe.conf for Load Balancing
    7.4 Multi-Rail Configurations with LNET

8. Failover
    8.1 What is Failover?
        8.1.1 Failover Capabilities
        8.1.2 Types of Failover Configurations
    8.2 Failover Functionality in Lustre
        8.2.1 MDT Failover Configuration (Active/Passive)
        8.2.2 OST Failover Configuration (Active/Active)
        8.2.3 Lustre Failover and MMP
            8.2.3.1 Working with MMP
    8.3 Configuring and Using Heartbeat with Lustre Failover
        8.3.1 Creating a Failover Environment
            8.3.1.1 Power Management Software
            8.3.1.2 Power Equipment
        8.3.2 Setting up the Heartbeat Software
            8.3.2.1 Installing Heartbeat
            8.3.2.2 Configuring Heartbeat
            8.3.2.3 (Optional) Migrating a Heartbeat Configuration (v1 to v2)
        8.3.3 Working with Heartbeat
            8.3.3.1 Starting Heartbeat
            8.3.3.2 Switching Resources Between Nodes

9. Configuring Quotas
    9.1 Working with Quotas
        9.1.1 Enabling Disk Quotas
            9.1.1.1 Administrative and Operational Quotas
        9.1.2 Creating Quota Files and Quota Administration
        9.1.3 Quota Allocation
        9.1.4 Known Issues with Quotas
            9.1.4.1 Granted Cache and Quota Limits
            9.1.4.2 Quota Limits
            9.1.4.3 Quota File Formats
        9.1.5 Lustre Quota Statistics
            9.1.5.1 Interpreting Quota Statistics

10.