Manual Pages (Man) for OCFS2 1.8 Release

OCFS2(7)                    OCFS2 Manual Pages                    OCFS2(7)

NAME
       OCFS2 - A Shared-Disk Cluster File System for Linux

INTRODUCTION
       OCFS2 is a file system. It allows users to store and retrieve data.
       The data is stored in files that are organized in a hierarchical
       directory tree. It is a POSIX compliant file system that supports
       the standard interfaces and the behavioral semantics as spelled out
       by that specification.

       It is also a shared disk cluster file system, one that allows
       multiple nodes to access the same disk at the same time. This is
       where the fun begins as allowing a file system to be accessible on
       multiple nodes opens a can of worms. What if the nodes are of
       different architectures? What if a node dies while writing to the
       file system? What data consistency can one expect if processes on
       two nodes are reading and writing concurrently? What if one node
       removes a file while it is still being used on another node?

       Unlike most shared file systems where the answer is fuzzy, the
       answer in OCFS2 is very well defined. It behaves on all nodes
       exactly like a local file system. If a file is removed, the
       directory entry is removed but the inode is kept as long as it is
       in use across the cluster. When the last user closes the
       descriptor, the inode is marked for deletion.

       The data consistency model follows the same principle. It works as
       if the two processes that are running on two different nodes are
       running on the same node. A read on a node gets the last write
       irrespective of the IO mode used. The modes can be buffered,
       direct, asynchronous, splice or memory mapped IOs. It is fully
       cache coherent.

       Take for example the REFLINK feature that allows a user to create
       multiple writeable snapshots of a file. This feature, like all
       others, is fully cluster-aware. A file being written to on multiple
       nodes can be safely reflinked on another. The snapshot created is a
       point-in-time image of the file that includes both the file data
       and all its attributes (including extended attributes).
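
       The removal semantics described above (the directory entry goes
       away at once, the inode survives until the last close) are the same
       ones a local POSIX file system provides. The following is a minimal
       Python sketch of that behavior on a single node, assuming a POSIX
       host. The helper name read_after_unlink is hypothetical and is not
       part of OCFS2 or its tools.

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> bytes:
    """Write payload to a temp file, unlink it while open, read it back.

    Illustrative helper (hypothetical name); it only exercises local
    POSIX semantics on one node.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)               # the directory entry is removed here
        os.lseek(fd, 0, os.SEEK_SET)  # the open descriptor still reaches the inode
        return os.read(fd, len(payload))
    finally:
        os.close(fd)                  # after the last close the inode can be freed
        if os.path.exists(path):      # clean up if we failed before the unlink
            os.unlink(path)
```

       According to the text above, OCFS2 keeps the same guarantee when
       the unlink and the open descriptor are on different nodes. This
       sketch does not test that multi-node case.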

       It is a journaling file system. When a node dies, a surviving node
       transparently replays the journal of the dead node. This ensures
       that the file system metadata is always consistent. It also
       defaults to ordered data journaling to ensure the file data is
       flushed to disk before the journal commit, to remove the small
       possibility of stale data appearing in files after a crash.

       It is architecture and endian neutral. It allows concurrent mounts
       on nodes with different processors like x86, x86_64, IA64 and
       PPC64. It handles little and big endian, 32-bit and 64-bit
       architectures.

       It is feature rich. It supports indexed directories, metadata
       checksums, extended attributes, POSIX ACLs, quotas, REFLINKs,
       sparse files, unwritten extents and inline-data.

       It is fully integrated with the mainline Linux kernel. The file
       system was merged into Linux kernel 2.6.16 in early 2006.

       It is quickly installed. It is available with almost all Linux
       distributions. The file system is on-disk compatible across all of
       them.

       It is modular. The file system can be configured to operate with
       other cluster stacks like Pacemaker and CMAN along with its own
       stack, O2CB.

       It is easily configured. The O2CB cluster stack configuration
       involves editing two files, one for cluster layout and the other
       for cluster timeouts.

       It is very efficient. The file system consumes very little
       resources. It is used to store virtual machine images in limited
       memory environments like Xen and KVM.

Version 1.8.2                     January 2012

       In summary, OCFS2 is an efficient, easily configured, modular,
       quickly installed, fully integrated and compatible, feature-rich,
       architecture and endian neutral, cache coherent, ordered data
       journaling, POSIX-compliant, shared disk cluster file system.

OVERVIEW
       OCFS2 is a general-purpose shared-disk cluster file system for
       Linux capable of providing both high performance and high
       availability.

       As it provides local file system semantics, it can be used with
       almost all applications. Cluster-aware applications can make use of
       cache-coherent parallel I/Os from multiple nodes to scale out
       applications easily. Other applications can make use of the
       clustering facilities to fail over running applications in the
       event of a node failure.

       The notable features of the file system are:

       Tunable Block size
              The file system supports block sizes of 512, 1K, 2K and 4K
              bytes. 4KB is almost always recommended. This feature is
              available in all releases of the file system.

       Tunable Cluster size
              A cluster size is also referred to as an allocation unit.
              The file system supports cluster sizes of 4K, 8K, 16K, 32K,
              64K, 128K, 256K, 512K and 1M bytes. For most use cases, 4KB
              is recommended. However, a larger value is recommended for
              volumes hosting mostly very large files like database files,
              virtual machine images, etc. A large cluster size allows the
              file system to store large files more efficiently. This
              feature is available in all releases of the file system.

       Endian and Architecture neutral
              The file system can be mounted concurrently on nodes having
              different architectures, such as 32-bit, 64-bit,
              little-endian (x86, x86_64, ia64) and big-endian (ppc64,
              s390x). This feature is available in all releases of the
              file system.

       Buffered, Direct, Asynchronous, Splice and Memory Mapped I/O modes
              The file system supports all modes of I/O for maximum
              flexibility and performance. It also supports cluster-wide
              shared writeable mmap(2). The support for buffered, direct
              and asynchronous I/O is available in all releases. The
              support for splice I/O was added in Linux kernel 2.6.20 and
              for shared writeable mmap(2) in 2.6.23.

       Multiple Cluster Stacks
              The file system includes a flexible framework to allow it to
              function with userspace cluster stacks like Pacemaker (pcmk)
              and CMAN (cman), its own in-kernel cluster stack o2cb and no
              cluster stack. The support for o2cb cluster stack is
              available in all releases.
              The support for no cluster stack, or local mount, was added
              in Linux kernel 2.6.20. The support for userspace cluster
              stack was added in Linux kernel 2.6.26.

       Journaling
              The file system supports both ordered (default) and
              writeback data journaling modes to provide file system
              consistency in the event of power failure or system crash.
              It uses JBD2 in Linux kernel 2.6.28 and later. It used JBD
              in earlier kernels.

       Extent-based Allocations
              The file system allocates and tracks space in ranges of
              clusters. This is unlike block based file systems that have
              to track each and every block. This feature allows the file
              system to be very efficient when dealing with both large
              volumes and large files. This feature is available in all
              releases of the file system.

       Sparse files
              Sparse files are files with holes. With this feature, the
              file system delays allocating space until a write is issued
              to a cluster. This feature was added in Linux kernel 2.6.22
              and requires enabling on-disk feature sparse.

       Unwritten Extents
              An unwritten extent is also referred to as user
              pre-allocation. It allows an application to request a range
              of clusters to be allocated, but not initialized, within a
              file. Pre-allocation allows the file system to optimize the
              data layout with fewer, larger extents. It also provides a
              performance boost, delaying initialization until the user
              writes to the clusters. This feature was added in Linux
              kernel 2.6.23 and requires enabling on-disk feature
              unwritten.

       Hole Punching
              Hole punching allows an application to remove arbitrary
              allocated regions within a file. Creating holes,
              essentially. This is more efficient than zeroing the same
              extents. This feature is especially useful in virtualized
              environments as it allows a block discard in a guest file
              system to be converted to a hole punch in the host file
              system thus allowing users to reduce disk space usage.
              This feature was added in Linux kernel 2.6.23 and requires
              enabling on-disk features sparse and unwritten.

       Inline-data
              Inline data is also referred to as data-in-inode as it
              allows storing small files and directories in the inode
              block. This not only saves space but also has a positive
              impact on cold-cache directory and file operations. The data
              is transparently moved out to an extent when it no longer
              fits inside the inode block. This feature was added in Linux
              kernel 2.6.24 and requires enabling on-disk feature
              inline-data.

       REFLINK
              REFLINK is also referred to as fast copy. It allows users to
              atomically (and instantly) copy regular files. In other
              words, create multiple writeable snapshots of regular files.
              It is called REFLINK because it looks and feels more like a
              (hard) link(2) than a traditional snapshot. Like a link, it
              is a regular user operation, subject to the security
              attributes of the inode being reflinked and not to the super
              user privileges typically required to create a snapshot.
              Like a link, it operates within a file system. But unlike a
              link, it links the inodes at the data extent level allowing
              each reflinked inode to grow independently as and when
              written to. Up to four billion inodes can share a data
              extent. This feature was added in Linux kernel 2.6.32 and
              requires enabling on-disk feature refcount.

       Allocation Reservation
              File contiguity plays an important role in file system
              performance. When a file is fragmented on disk, reading and
              writing to the file involves many seeks, leading to lower
              throughput. Contiguous files, on the other hand, minimize
              seeks, allowing the disks to perform IO at the maximum rate.
              With allocation reservation, the file system reserves a
              window in the bitmap for all extending files allowing each
              to grow as contiguously as possible.
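
       The cluster size trade-off described under Tunable Cluster size can
       be illustrated with simple rounding arithmetic. The Python sketch
       below is a rough model only: it assumes every file occupies a whole
       number of clusters and ignores inline-data, holes in sparse files
       and metadata. The names CLUSTER_SIZES, clusters_needed and
       allocated_bytes are hypothetical and do not come from the OCFS2
       tools.

```python
# Cluster sizes listed in the text: 4K, 8K, ..., 512K, 1M bytes.
CLUSTER_SIZES = [4096 << i for i in range(9)]


def clusters_needed(file_size: int, cluster_size: int) -> int:
    """Number of allocation units needed to hold file_size bytes.

    Simplified model (assumption): sizes are rounded up to whole clusters.
    """
    if cluster_size not in CLUSTER_SIZES:
        raise ValueError(f"unsupported cluster size: {cluster_size}")
    if file_size < 0:
        raise ValueError("file size must be non-negative")
    # Integer ceiling division.
    return (file_size + cluster_size - 1) // cluster_size


def allocated_bytes(file_size: int, cluster_size: int) -> int:
    """Bytes of disk space consumed under the same simplified model."""
    return clusters_needed(file_size, cluster_size) * cluster_size
```

       Under this model a 10 GiB virtual machine image spans 2,621,440
       clusters at a 4K cluster size but only 10,240 at 1M, while a
       one-byte file occupies 4 KiB in the first case and 1 MiB in the
       second. This is consistent with the recommendation of 4KB for most
       use cases and a larger value for volumes hosting mostly very large
       files.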
