OrangeFS Overview Tutorial

Walt Ligon, Clemson University

Tutorial Goals

• Brief History
• Architecture Overview
• User Interfaces
• Important Features
• Installation/Configuration

Brief History

• Brief History ← Starting here!
• Architecture Overview
• User Interfaces
• Important Features
• Installation/Configuration

A Brief History

• First there was PVFS
  – V0: Aric Blumer (Clemson), 1993, PVM User's Meeting, Oak Ridge
  – V1: Rob Ross (Clemson), 1994 – 2000
  – V2: Rob Ross (Argonne), Phil Carns (Clemson), et al., 2000 – present

PVFS Funding

• NASA
• NSF
  – PACI
  – HECURA
• DOE
  – Office of Science
  – SciDAC
• Government agencies

PVFS Partnerships

• Clemson U
• Ohio Supercomputer Center
• Argonne NL
• Northwestern U
• Carnegie Mellon U
• Acxiom
• U of Heidelberg
• Ames NL
• Sandia NL
• U of Michigan
• U of Oregon
• Ohio St. U

Emergence of Orange

• Project started in 2007
  – Develop PVFS for "non-traditional" uses
    • Very large numbers of small files
    • Smaller accesses
    • Much more metadata
  – Robust security features
  – Improved resilience
• Began to see opportunities for a broader area
  – Big Data management
  – Clouds
  – Enterprise

Today and Beyond

• Clemson reclaims primary site
  – As of 2.8.4, began using the name "OrangeFS"
  – Omnibond offers commercial support
  – Currently on 2.8.6
  – Version 2.9.0 soon to be released with new features
  – 2.10 is the internal development version for …
    • OrangeFS 3.0

What to Expect in 3.0

• Totally redesigned object model
  – Replication
  – Migration
  – Dynamic configuration
  – Hierarchical Storage Management (tiers)
  – Metadata support for external devices (archive)
  – Rule-based security model
  – Rule-based policies

Architecture Overview

• Brief History
• Architecture Overview ← You are here!
• User Interfaces
• Important Features
• Installation/Configuration

Architecture Overview

• OrangeFS is designed for parallel machines
  – Client/server architecture
  – Expects multiple clients and servers
• Two major software components
  – Server daemon
  – Client library
• Two auxiliary components
  – Client core daemon
  – Kernel module

User Level

• Everything runs at the user level on clients and servers
  – The exception is the kernel module, which is really just a shim (more details later)
• A server daemon runs on each server node and serves requests sent by the client library, which is linked into client code on compute nodes

Networking

• Clients and servers communicate using an interface called "BMI"
  – BMI supports TCP/IP, InfiniBand, MX, and Portals
  – The modular design allows addition of new protocols
  – Requests use the PVFS protocol, similar to that of NFS but expanded to support a number of high-performance features

Server-to-Server Communication

[Diagram: applications and middleware on client nodes communicate over the network with multiple servers]

• Traditional metadata operation: a create request causes the client to communicate with all servers, O(p)
• Scalable metadata operation: a create request communicates with a single server, which in turn communicates with the other servers using a tree-based protocol, O(log p)
• For example, with roughly 1,000 servers the tree-based approach needs on the order of 10 communication rounds instead of 1,000 individual requests

Storage

• The OrangeFS server interacts with storage on the host using an interface layer named "trove"
  – All storage objects are referenced with a "handle"
  – Storage objects consist of two components
    • Bytestreams: sequences of data
    • Key/value pairs: data items that are accessed by key
  – As currently implemented …
    • Bytestreams are stored in the local file system (data)
    • Key/value pairs are stored in Berkeley DB (metadata)

File Model

• Every file consists of two or more objects
  – The metadata object contains
    • Traditional file metadata (owner, permissions, etc.)
    • List of data object handles
    • FS-specific items (layout, distribution, parameters)
    • User-defined attributes
  – Data objects contain
    • Contents of the file
    • Attributes specific to that object (usually not visible)

Directories

• Work just like files, except
  – Data objects contain directory entries rather than data
  – Directory entries hold an entry name and the handle of that entry's metadata object (like an inode number)
  – An extensible hash function selects which data object contains each entry
  – Mechanisms based on GIGA+ manage consistency as the directory grows or shrinks

Distributed Directories

[Diagram: extensible hashing distributes directory entries DirEnt1 – DirEnt6 across Server0 – Server3]

Policies

• Objects of any type can reside on any server
  – Random selection (the default) is used to balance load
  – The user can control various things
    • Which servers hold data
    • Which servers hold metadata
    • How many servers a given file or directory is spread across
    • How those servers are selected
    • How file data is distributed to the servers
  – Parameters can be set in the configuration file, many on a directory, or for a specific file (see the sketch below)
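As a rough illustration of per-directory control, a directory's striping could be adjusted from the shell. The extended-attribute names and paths below are assumptions, not taken from the slides; verify them against the documentation for your version.

# Hypothetical example: set a per-directory hint via an extended attribute
# on a mounted OrangeFS volume. Attribute name and paths are assumptions.
setfattr -n user.pvfs2.num_dfiles -v 4 /mnt/orangefs/projects   # spread new files over 4 data servers
getfattr -d /mnt/orangefs/projects                              # inspect which hints are currently set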

User Interfaces

• Brief History
• Architecture Overview
• User Interfaces ← Halfway!
• Important Features
• Installation/Configuration

User Interfaces

• System Interface
• VFS Interface
• Direct Access Library
• MPI-IO
• Windows Client
• Web Services Module
• Utilities (a few examples are sketched below)
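As a small illustration of the command-line utilities, a few common operations look like the following. The mount point and file names are placeholders.

# Illustrative use of the OrangeFS/PVFS command-line utilities;
# mount point and file names are placeholders.
pvfs2-ping -m /mnt/orangefs                      # verify the file system's servers are reachable
pvfs2-ls /mnt/orangefs                           # list a directory without going through the kernel
pvfs2-cp /tmp/input.dat /mnt/orangefs/input.dat  # copy a file into the file system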

The System Interface

• Low-level interface of the client library
  – PVFS_sys_lookup()
  – PVFS_sys_getattr()
  – PVFS_sys_setattr()
  – PVFS_sys_io()
• Based on the PVFS request protocol
• Designed to have other interfaces built on top of it
• Provides access to all of the features
• NOT a POSIX-like interface

The VFS Interface

• The kernel module allows OrangeFS volumes to be mounted and used like any other (see the mount sketch below)
  – Limited mmap support
  – No client-side cache
  – A few semantic issues
• Must run the client_core daemon
  – Reads requests from the kernel, then calls the library
• The most common and convenient interface
• BSD and OS X support via FUSE
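A minimal sketch of bringing up the VFS interface on a client. The module location, install prefix, server name, and file system name are illustrative, not values from the slides.

# Load the kernel module, start the client core daemon, and mount the
# volume. Paths, server name, and file system name are illustrative.
insmod /opt/orangefs/lib/modules/$(uname -r)/kernel/fs/pvfs2/pvfs2.ko
/opt/orangefs/sbin/pvfs2-client -p /opt/orangefs/sbin/pvfs2-client-core
mount -t pvfs2 tcp://server01:3334/orangefs /mnt/orangefs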

The Direct Library

• Interposition library for file-related stdio and Linux system calls
• Links programs directly to the client library
• Can preload the shared library and run programs without relinking (sketched below)
• Faster, lower latency, and more efficient than VFS
• Configurable user-level client cache
• Best option for serious applications
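A minimal sketch of the preload approach, assuming the direct access library is installed as libofs.so under the install prefix. The library path, and whether additional libraries must be preloaded alongside it, depend on your installation.

# Run an unmodified, dynamically linked program through the direct
# interface by preloading the client library. Path is an assumption.
LD_PRELOAD=/opt/orangefs/lib/libofs.so ./my_io_app /mnt/orangefs/results.dat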

Direct Access Library

[Diagram: an application using the kernel / client core path and an application using the direct library, both reaching the PVFS client library]

• Implements:
  – POSIX system calls
  – Stdio library calls
• Parallel extensions
  – Noncontiguous I/O
  – Non-blocking I/O
• MPI-IO library

Direct Interface Client Caching

[Diagram: client application → direct interface → client cache → file system servers]

• The Direct Interface enables multi-process coherent client caching for a single client

MPI-IO

• Most MPI libraries provide ROMIO, which has support for direct access to the client library
• A number of specific optimizations for MPI programs, especially for collective I/O
  – Aggregators, data sieving, data type support
• Probably the best overall interface, but only for MPI programs

Windows Client

[Diagram: Windows applications reach OrangeFS servers through direct access or a Dokan drive mapping, via the OrangeFS API and the PVFS protocol]

• Supports Windows 32/64-bit
• Server 2008, R2, Vista, 7

Web Services Module

• WebDAV
• S3
• REST

WebDAV

[Diagram: Apache server with mod_dav and DAVOrangeFS → OrangeFS API → PVFS protocol → OrangeFS server]

• Supports the DAV protocol and tested with (insert reference test run – check with mike)
• Supports DAV cooperative locking in metadata

S3

[Diagram: Apache server with ModS3 → OrangeFS API → PVFS protocol → OrangeFS server]

Admin REST Interface

[Diagram: Dojo Toolkit UI → REST → Apache server with Mod_OrangeFSREST → OrangeFS API → PVFS protocol → OrangeFS server]

Important Features

• Brief History
• Architecture Overview
• User Interfaces
• Important Features ← Over the hump!
• Installation/Configuration

Important Features

• Already touched on
  – Parallel Files (1.0)
  – Stateless, User-level Implementation (2.0)
  – Distributed Metadata (2.0)
  – Extended Attributes (2.0)
  – Configurable Parameters (2.0)
  – Distributed Directories (2.9)
• Non-Contiguous I/O (1.0)
• SSD Metadata Storage (2.8.5)
• Replicate On Immutable (2.8.5)
• Capability-based Security (2.9)

Non-Contiguous I/O

[Diagram: access patterns that are noncontiguous in memory, noncontiguous in the file, and noncontiguous in both memory and file]

• Noncontiguous I/O operations are common in computational science applications
• Most PFSs available today implement a POSIX-like interface (open, write, close)
• POSIX noncontiguous support is poor:
  – readv/writev are only good for data that is noncontiguous in memory
  – POSIX listio requires matching sizes in memory and file
• Better interfaces allow for better scalability

SSD Metadata Storage

[Diagram: server stores data on disk and metadata on SSD]

• Writing metadata to SSD
  – Improves performance
  – Maintains reliability

Replicate On Immutable

[Diagram: client file I/O to a primary server holding metadata and data, with data replicated to a second server]

• First step in the replication roadmap
• Replicate data to provide resiliency
  – Initially, replicate on immutable
  – Client reads fail over to the replicated file if the primary is unavailable

Capability-Based Security

[Diagram: the client presents a certificate or credential (OpenSSL PKI) to the metadata servers, receives a signed capability, and presents that signed capability with each I/O request to the I/O servers]

Future Features

• Attribute-Based Search
• Hierarchical Data Management

Attribute-Based Search

[Diagram: client issues parallel key/value queries to metadata servers and file accesses to data servers]

• Client tags files with keys/values
• Keys/values are indexed on the metadata servers
• Clients query for files based on keys/values
• Queries return file handles, with options for filename and path

Hierarchical Data Management

[Diagram: OrangeFS as intermediate storage linking users, HPC scratch, NFS, archive storage, and remote systems such as TeraGrid, OSG, and GPFS]

Installation/Configuration

• Brief History
• Architecture Overview
• User Interfaces
• Important Features
• Installation/Configuration ← Last one!

Installing OrangeFS

• Distributed as a tarball
  – You need
    • GNU development environment (GCC, GNU Make, Autoconf, flex, bison)
    • Berkeley DB V4 (4.8.32 or later)
    • Kernel headers (for the kernel module)
    • A few libraries (OpenSSL, etc.)

Building OrangeFS

# ./configure --prefix= --with-kernel= --with-db=
# make
# make install
# make kmod
# make kmod_install
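For illustration only, the configure line might be filled in as follows. The install prefix, kernel source path, and Berkeley DB location are placeholders, not values from the slides.

# Hypothetical paths; substitute your own install prefix, kernel build
# tree, and Berkeley DB installation.
./configure --prefix=/opt/orangefs \
            --with-kernel=/lib/modules/$(uname -r)/build \
            --with-db=/opt/db4
make && make install
make kmod && make kmod_install   # builds and installs the kernel module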

Configuring OrangeFS

• Servers (a setup sketch follows this list)
  – Many installations have dedicated servers
    • Multi-core
    • Large RAM
    • RAID subsystems
    • Faster network connections
  – Others have every node as both client and server
    • Checkpoint scratch space
    • User-level file system
    • Smaller, more adaptable systems
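A minimal sketch of standing up a server with the stock helper tools. The config file location and install prefix are placeholders, and flags should be checked against your version's documentation.

# Generate a server configuration file (interactive), create the storage
# space, then start the server daemon. Paths are illustrative.
/opt/orangefs/bin/pvfs2-genconfig /etc/orangefs/orangefs.conf
/opt/orangefs/sbin/pvfs2-server /etc/orangefs/orangefs.conf -f   # -f creates the storage space
/opt/orangefs/sbin/pvfs2-server /etc/orangefs/orangefs.conf      # normal startup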

Server Storage Modules

• AltIO
  – General-purpose
  – Uses Linux AIO to implement reads/writes
  – Typically utilizes the Linux page cache
• DirectIO
  – Designed for dedicated storage with RAID
  – Uses Linux threads and direct I/O
  – Bypasses the Linux page cache

Parameters to Consider

• Default number of I/O servers
• Default strip size
• RAID parameters
  – Cache behavior
  – Block sizes
• Local file system layout (alignment)

Metadata Servers

• As many or as few as you need
  – Every server can be a metadata server
    • Metadata access load is spread across servers
  – Dedicated metadata servers
    • Keep metadata traffic away from the data servers
    • Custom-configured for metadata
  – Depends on workload and data center resources

Clients

• At minimum, each client needs libpvfs2
  – Direct access lib (libofs)
  – MPI libraries
  – Precompiled programs
  – Web services
• Most clients will want the VFS interface
  – pvfs2-client-core
  – The pvfs2 kernel module
• Need a mount string pointing to a server (fstab); an example entry is sketched below
  – With many clients, spread the load across servers
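A sketch of an fstab entry for the kernel interface. The server name, port, file system name, and mount point are illustrative.

# /etc/fstab entry for an OrangeFS/PVFS2 volume (all names illustrative).
# Format: <protocol>://<server>:<port>/<fs name>  <mount point>  pvfs2  <options>
tcp://server01:3334/orangefs  /mnt/orangefs  pvfs2  defaults,noauto  0 0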

Learning More

• www.orangefs.org web site
  – Releases
  – Documentation
  – Wiki
• [email protected]
  – Support for users
• [email protected]
  – Support for developers

Commercial Support

• www.orangefs.com & www.omnibond.com
  – Professional support & development team

QUESTIONS & ANSWERS