9 Shades of Lustre 04/2019 sbuisson@whamcloud.com Lustre Features Review

Lustre features review

►Metadata operations & optimizations
  • DNE
  • Lazy size on MDT
►File striping extensions
  • PFL
  • FLR use case
  • DoM use case
►Lustre network aspects
  • Multi-Rail
  • Dynamic Peer Discovery
  • UDSP
  • LNet Health

whamcloud.com

[Overview diagram: Lustre metadata performance scale out, ls -l speed improvement, Simplifying network configuration, Choosing the correct striping, Increasing network bandwidth, Strengthening data availability, Small file performance improvement]

Lustre metadata scale out

►Lustre initial design
►DNE
  • aka Distributed Namespace Environment
  • phase 1 introduced with Lustre 2.4 (May 2013)
►DNE phase 1 benefits:
  • support up to 256 MDTs
  • additional MDTs on additional MDSes
►DNE phase 1 limitations:
  • remote dir assigned to a single MDT
  • only remote directory creation/unlink are allowed
  • no migration tool to move between MDTs (needs ‘mv’)
  • synchronous cross-MDT operations
►DNE phase 2
  • introduced with Lustre 2.8 (March 2016)
  • spread a single directory across multiple MDTs
    ⇒ allow striped dir in addition to remote dir
    ⇒ much more flexible than phase 1
[Diagram: a striped directory split into Dir shard 0, Dir shard 1, Dir shard 2 and Dir shard 3, holding fileA, fileB, fileC and fileD]
►DNE phase 2 addresses phase 1 limitations:
  • rename and link ops supported
  • tool to migrate directories from one MDT to another
  • asynchronous cross-MDT operations
    ⇒ more user-friendly
►DNE phase 2 benefits:
  • scale size and performance of large directories
  • simple load balancing across MDTs
►How to use:
    lfs mkdir -c mdt_count /mount_point/new_directory
    lfs mkdir -i mdt_index /mount_point/new_directory

Improving DNE usability

►DNE phase 3
  • introduction began with Lustre 2.12 (December 2018)
  • directory restriping from single-MDT to striped directories
►How to migrate the contents of a large directory from its current default location to MDT0001 and MDT0003:
    lfs migrate -m 1,3 /mount_point/largedir
►How to pick least full MDT:
    lfs mkdir -i -1 /mount_point/new_dir
►DNE phase 3 continuation in 2.13+
  • automatically create new remote directory on "best" MDT with mkdir()
    o simplifies use of multiple MDTs without striping all directories
    o similar to OST usage
  • automatic directory restriping
    o avoid explicit striping at create
    o start with one-stripe directory
    o add extra stripes as directory grows
[Diagram: Master, +4 dir stripes, +12 directory stripes]

Addressing ‘ls -l’ “slowness”

►Retrieving file size is not that simple in initial approach:
  • MDT holds some file metadata: ctime, mtime, owner, etc.
  • BUT IOs to files managed directly between clients and OSTs
  • need to send requests to all OSTs having a file’s objects to compute its size
    ⇒ “slow” ‘ls -l’
►Lazy Size on MDS (LSOM)
  • introduction began with Lustre 2.12 (December 2018)
  • lazy means ‘not real-time’
    o lazy size saved as an extended attribute on MDT
    o lazy size updated on file close/truncate
  • useful for policy engines that can read this extended attribute
    o RobinHood
    o Stratagem
    o ‘lfs find’ in 2.13 able to use LSOM to avoid OST RPCs
  • at this stage, Lustre client is not directly LSOM-aware.
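
As a hedged illustration of the DNE and LSOM items above, the following dry-run sketch strings the slide commands together. It only prints the commands by default, so it can be tried without a Lustre client. The helper names (`run`, `make_striped_dir`, `migrate_dir`, `compare_sizes`), the stripe count of 4 and the `fileA` path are hypothetical. `lfs getdirstripe` and `lfs getsom -s` do not appear on the slides; they are assumed to be available in the installed release (`lfs getsom` from 2.12 onward).

```shell
#!/bin/sh
# Dry-run sketch: prints the commands by default (DRY_RUN=1) so it can be
# read and tried on a machine without a Lustre client.
# Set DRY_RUN=0 on a real client to execute them instead.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a directory striped over $1 MDTs (DNE phase 2), then show its layout.
# Assumption: lfs getdirstripe is used here only to inspect the result.
make_striped_dir() {
    mdt_count="$1"
    dir="$2"
    run lfs mkdir -c "$mdt_count" "$dir"
    run lfs getdirstripe "$dir"
}

# Migrate an existing directory to the given MDT indices (DNE phase 3).
migrate_dir() {
    mdt_list="$1"
    dir="$2"
    run lfs migrate -m "$mdt_list" "$dir"
}

# Show the size seen through the regular stat() path next to the lazy size
# stored on the MDT. Assumption: lfs getsom -s prints only the LSOM size.
compare_sizes() {
    file="$1"
    run stat -c %s "$file"
    run lfs getsom -s "$file"
}

make_striped_dir 4 /mount_point/new_directory
migrate_dir 1,3 /mount_point/largedir
compare_sizes /mount_point/largedir/fileA
```

Because the lazy size is only updated on close or truncate, the two values printed by `compare_sizes` on a real system may differ for a file that is still open for writing.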
►‘ls -l’ performance with experimental LSOM-aware client

Choosing the right striping

►Striping is a convenient way to parallelize bandwidth to OSTs
►… but involving more components adds overhead
  • large stripe count to increase performance?
  • or small stripe count to limit overhead?
►… and users do not really understand striping
  • vast majority of files use default striping
  • no correlation between file size and striping
►Progressive File Layout (PFL) for improved performance and usability
  • introduced with Lustre 2.10 (July 2017)
  • file layout is described by a series of components
  • each component has its own stripe count, size, OST pool, etc.
[Diagram: a file layout made of Component 0 (up to 32M), Component 1 (32M to 1G) and Component 2 (1G to EOF), each mapped to its own objects]
►Progressive File Layout (PFL) benefits
  • simplify Lustre usage for novice users
  • reasonable performance for a variety of I/O patterns
    o low overhead for small files
    o high bandwidth for large files
  • stepping stone to more features
►How to use PFL
    lfs setstripe [--component-end|-E end1] [STRIPE_OPTIONS] [--component-end|-E end2] [STRIPE_OPTIONS] ... filename

Performance benefit from PFL, as seen by end-users

►Multiple clients accessing the same shared file
  • IOR, 32 clients, 512 threads
Source: ORNL presentation at LUG 2016 http://cdn.opensfs.org/wp-content/uploads/2016/04/LUG2016D3_Evaluating-Progressive-File-Layouts_Mohr.pdf

OST space balance with PFL

  [0, 1MB) stripe_count=1
  [1MB, 64MB) stripe_count=4
  [64MB, 128GB) stripe_count=16
  [128GB, EOF) stripe_count=48
Source: ORNL presentation at LUG 2016 http://cdn.opensfs.org/wp-content/uploads/2016/04/LUG2016D3_Evaluating-Progressive-File-Layouts_Mohr.pdf

Strengthening data availability

►Software, network, hardware all contribute to unavailability of Lustre data
  • Lustre at the top of a deep software/hardware stack, depends on all components working
  • needs availability better than individual hardware and software components
  • needs more robustness against data loss/corruption
►File Level Redundancy (FLR) provides significant value and functionality
  • introduction began with Lustre 2.11 (April 2018)
  • based on PFL feature
  • file’s data no longer in a single location
    o replicas can be created on multiple OSTs
[Diagram: Replica 1, Object j (PREFERRED); Replica 2, Object k]
►Multiple benefits from FLR, e.g.:
  • higher availability for server/network failure
    o finally better than HA failover
  • robustness against data loss/corruption
    o mirror, or M+N erasure coding for stripes (Lustre 2.14)
  • increased read speed for widely shared files
    o mirror input data across many OSTs
►How to use FLR
    lfs mirror create <--mirror-count|-N[mirror_count] [setstripe_options]> ... <filename|directory>

Improving small file performance

►Small File I/O Concerns
  • small file data on a single OST
    o no benefit from multiple OSTs in parallel
  • random I/O patterns
    o more latency sensitive
    o slows down concurrent streaming I/O
  • data is small
    o no read-ahead possible
    o more RPCs for the same amount of data
►Data-on-MDT (DoM) small file performance
  • introduced with Lustre 2.11 (April 2018)
  • based on PFL feature
  • stores small file data directly on the MDT
  • DoM files grow on OSTs after the MDT size limit is reached
[Diagram: client, MDS and OSS exchanges for open(O_RDWR|O_TRUNC), stat() and truncate(), without DoM and with DoM]
►Data-on-MDT (DoM) benefits
  • separates large and small I/O data streams
  • file size is immediately available
►Please keep in mind
  • increases RPC pressure on MDS
  • need for more storage space on MDTs
►DoM performance for sub-DoM size
[Chart: 8k reads, throughput in MB/s (0 to 900) versus number of clients (1, 2, 4, 8, 16), for DoM, Stripe=-1 (8) and Stripe=1]
►DoM is completely configurable
    o decide which files to store on MDT
    o decide which size to store on MDT
►How to use DoM
    lfs setstripe --component-end|-E end1 --layout|-L mdt [--component-end|-E end2 [STRIPE_OPTIONS] ...] <filename>

Increasing network bandwidth

►Need for high network bandwidth
  • big Lustre clients with a lot of memory and NUMA architecture
  • big Lustre servers with a lot of storage (many OSTs, very large OSTs with DCR, …)
►Possible solutions?
  • Adding faster interfaces
    ⇒ implies replacing much or all of the network
  • Adding more interfaces to the nodes
    ⇒ requires a redesign of
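
Looking back at the striping slides, the PFL, DoM and FLR synopses can be made concrete with a small sketch. It only prints commands and never runs them. `pfl_cmd` is a hypothetical helper that turns `END:COUNT` pairs into an `lfs setstripe` line; the first call reproduces the layout listed on the OST space balance slide. The file names, the 1M DoM component, the 4-stripe remainder and the mirror count of 2 are illustrative assumptions, not recommendations from the deck.

```shell
#!/bin/sh
# Build (and only print) an lfs setstripe command for a PFL layout.
# Usage: pfl_cmd FILE END:COUNT [END:COUNT ...]
pfl_cmd() {
    file="$1"
    shift
    cmd="lfs setstripe"
    for pair in "$@"; do
        end="${pair%%:*}"     # component end, e.g. 1M, 128G, or -1 for EOF
        count="${pair#*:}"    # stripe count used inside that component
        cmd="$cmd -E $end -c $count"
    done
    echo "$cmd $file"
}

# PFL: the four components listed on the OST space balance slide.
pfl_cmd /mount_point/shared_file 1M:1 64M:4 128G:16 -1:48

# DoM: first 1M of the file stored on the MDT, the rest striped over 4 OSTs
# (both values are assumptions chosen for illustration).
echo "lfs setstripe -E 1M -L mdt -E -1 -c 4 /mount_point/small_file"

# FLR: create a file with two mirrors (mirror count is an assumption).
echo "lfs mirror create -N2 /mount_point/mirrored_file"
```

On a real client, the resulting layouts could be inspected with `lfs getstripe` after running the printed commands.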
