One Big PDF Volume
Total Page:16
File Type:pdf, Size:1020Kb
Proceedings of the Linux Symposium July 14–16th, 2014 Ottawa, Ontario Canada Contents Btr-Diff: An Innovative Approach to Differentiate BtrFs Snapshots 7 N. Mandliwala, S. Pimpale, N. P. Singh, G. Phatangare Leveraging MPST in Linux with Application Guidance to Achieve Power and Performance Goals 13 M.R. Jantz, K.A. Doshi, P.A. Kulkarni, H. Yun CPU Time Jitter Based Non-Physical True Random Number Generator 23 S. Müller Policy-extendable LMK filter framework for embedded system 49 K. Baik, J. Kim, D. Kim Scalable Tools for Non-Intrusive Performance Debugging of Parallel Linux Workloads 63 R. Schöne, J. Schuchart, T. Ilsche, D. Hackenberg Veloces: An Efficient I/O Scheduler for Solid State Devices 77 V.R. Damle, A.N. Palnitkar, S.D. Rangnekar, O.D. Pawar, S.A. Pimpale, N.O. Mandliwala Dmdedup: Device Mapper Target for Data Deduplication 83 V. Tarasov, D. Jain, G. Kuenning, S. Mandal, K. Palanisami, P. Shilane, S. Trehan, E. Zadok SkyPat: C++ Performance Analysis and Testing Framework 97 P.H. Chang, K.H. Kuo, D.Y. Tsai, K. Chen, Luba W.L. Tang Computationally Efficient Multiplexing of Events on Hardware Counters 101 R.V. Lim, D. Carrillo-Cisneros, W. Alkowaileet, I.D. Scherson The maxwell(8) random number generator 111 S. Harris Popcorn: a replicated-kernel OS based on Linux 123 A. Barbalace, B. Ravindran, D. Katz Conference Organizers Andrew J. Hutton, Linux Symposium Emilie Moreau, Linux Symposium Proceedings Committee Ralph Siemsen With thanks to John W. Lockhart, Red Hat Robyn Bergeron Authors retain copyright to all submitted papers, but have granted unlimited redistribution rights to all as a condition of submission. Btr-Diff: An Innovative Approach to Differentiate BtrFs Snapshots Nafisa Mandliwala Swapnil Pimpale Pune Institute of Computer Technology PICT LUG [email protected] [email protected] Narendra Pal Singh Ganesh Phatangare Pune Institute of Computer Technology PICT LUG [email protected] [email protected] Abstract 1 Introduction Efficient storage and fast retrieval of data has always In current times, where data is critical to every organi- been of utmost importance. The BtrFs file system is zation, its appropriate storage and management is what a copy-on-write (COW) based B-tree file system that brings in value. Our approach aims at taking advantage has an built-in support for snapshots and is considered of the BtrFs architectural features that are made avail- a potential replacement for the EXT4 file system. It able by design. This facilitates speedy tree traversal to is designed specifically to address the need to scale, detect changes in snapshots. The send ioctl runs the tree manage and administer large storage configurations of comparison algorithm in kernel space using the on-disk Linux systems. Snapshots are useful to have local on- metadata format (rather than the abstract stat format line “copies” of the file system that can be referred back exported to the user space), which includes the ability to to, or to implement a form of deduplication, or for taking recognize when entire sub-trees can be skipped for com- a full backup of the file system. The ability to compare parison. Since the whole comparison algorithm runs in these snapshots becomes crucial for system administra- kernel space, the algorithm is clearly superior over exist- tors as well as end users. ing user space snapshot management tools such as Snap- 1 The existing snapshot management tools perform direc- per . tory based comparison on block level in user space. This Snapper uses diff algorithm with a few more optimiza- approach is generic and is not suitable for B-tree based tions to avoid comparing files that have not changed. file systems that are designed to cater to large storage. This approach requires all of the metadata for the two Simply traversing the directory structure is slow and trees being compared to be read. The most I/O inten- only gets slower as the file system grows. With the sive part is not comparing the files but generating the BtrFs send/receive mechanism, the filesystem can be in- list of changed files. It needs to list all the files in the structed to calculate the set of changes made between tree and stat them to see if they have changed between the two snapshots and serialize them to a file. the snapshots. The performance of such an algorithm Our objective is to leverage the send part in the kernel degrades drastically as changes to the file system grow. to implement a new mechanism to list all the files that This is mainly caused because Snapper deploys an al- have been added, removed, changed or had their meta- gorithm that is not specifically designed to run on COW data changed in some way. The proposed solution takes based file systems. advantage of the BtrFs B-tree structure and its power- ful snapshot capabilities to speed up the tree traversal The rest of the paper is organized as follows: Section2 and detect changes in snapshots based on inode values. explains the BtrFs internal architecture, associated data In addition, our tool can also detect changes between a structures, and features. Working of the send-receive snapshot and an explicitly mentioned parent. This lends code and the diff commands used, are discussed in itself for daily incremental backups of the file system, 1The description takes into consideration, the older version of and can very easily be integrated with existing snapshot Snapper. The newer version, however, does use the send-receive management tools. code for diff generation • 7 • 8 • Btr-Diff: An Innovative Approach to Differentiate BtrFs Snapshots Figure 1: The Terminal Node Structure (src: BtrFs De- sign Wiki) Figure 2: BtrFs Data Structures (src: BtrFs Design Wiki) Section3. Section4 covers the design of the pro- posed solution, the algorithm used and its implementa- tion. This is followed by benchmarking and its analysis is present in the tree etc. The key, which defines the in Section5. Section6 states applications of the send- order in the tree, has the fields: objectid, type and receive code and diff generation. Section7 lists the offset. Each subvolume has its own set of object ids. possible feature additions to Btr-diff. Finally, Section8 The type field contains the kind of item it is, of which, summarizes the conclusions of the paper and is followed the prominent ones are inode_item, inode_ref, xattr_ by references. item, orphan_item, dir_log_item, dir_item, dir_ index, extent_data, root_item, root_ref, extent_ item, extent_data_ref, dev_item, chunk_item etc. 2 BtrFs File System The offset field is dependent on the kind of item. The Btrfs file system is scalable to a maximum file/file A typical leaf node consists of several items. offset and system size up to 16 exabytes. Although it exposes a size tell us where to find the item in the leaf (relative to plethora of features w.r.t. scalability, data integrity, data the start of the data area). The internal nodes contain compression, SSD optimizations etc, this paper focuses [key,blockpointer] pairs whereas the leaf nodes con- only on the ones relevant to snapshot storage: compari- tain a block header, array of fix sized items and the data son and retrieval. field. Items and data grow towards each other. Typical data structures are shown in Figure2. The file system uses the ‘copy-on-write’ B-tree as its generic data structure. B-trees are used as they pro- vide logarithmic time for common operations like inser- 2.2 Subvolumes and Snapshots tion, deletion, sequential access and search. This COW- friendly B-tree in which the leaf node linkages are ab- sent was originally proposed by Ohad Rodeh. In such The BtrFs subvolume is a named B-tree that holds files trees, writes are never made in-place, instead, the modi- and directories and is the smallest unit that can be snap- fied data is written to a different location, and the corre- shotted. A point-in-time copy of such a subvolume is sponding metadata is updated. BtrFs, thus, has built-in called a snapshot. A reference count is associated with support for snapshots, which are point-in-time copies of every subvolume which is incremented on snapshot cre- entire subvolumes, and rollbacks. ation. A snapshot stores the state of a subvolume and can be created almost instantly using the following com- mand, 2.1 Data Structures btrfs subvolume snapshot [-r] <source> [<dest>/]<name> As seen in Figure1, the BtrFs architecture consists of three data structures internally; namely blockheader, Such a snapshot occupies no disk space at creation. key and item. The blockheader contains checksums, Snapshots can be used to backup subvolumes that can file system specific uuid, the level at which the block be used later for restore or recovery operations. 2014 Linux Symposium • 9 3 Send-Receive Code subvolume ID corresponding to the snapshot of inter- est. It then calculates changes between the two given Snapshots are primarily used for data backup and recov- snapshots. The command sends the subvolume speci- ery. Considering the size of file system wide snapshots, fied by <subvol> to stdout. By default, this will send detecting how the file system has changed between two the whole subvolume. The following are some more op- given snapshots, manually, is a tedious task. Thus devel- tions for output generation: oping a clean mechanism to showcase differences be- tween two snapshots becomes extremely important in • The operation can take a list of snapshot / subvol- snapshot management. ume IDs and generate a combined file for all of The diff command is used for differentiating any them. The parent snapshot can be specified explic- two files, but it uses text based2 search which is time itly.