Design and Implementation of an Open-Source Deduplication Platform for Research
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Achieving Superior Manageability, Efficiency, and Data Protection With
An Oracle White Paper December 2010 Achieving Superior Manageability, Efficiency, and Data Protection with Oracle’s Sun ZFS Storage Software Achieving Superior Manageability, Efficiency, and Data Protection with Oracle’s Sun ZFS Storage Software Introduction ......................................................................................... 2 Oracle’s Sun ZFS Storage Software ................................................... 3 Simplifying Storage Deployment and Management ............................ 3 Browser User Interface (BUI) ......................................................... 3 Built-in Networking and Security ..................................................... 4 Transparent Optimization with Hybrid Storage Pools ...................... 4 Shadow Data Migration ................................................................... 5 Third-party Confirmation of Management Efficiency ....................... 6 Improving Performance with Real-time Storage Profiling .................... 7 Increasing Storage Efficiency .............................................................. 8 Data Compression .......................................................................... 8 Data Deduplication .......................................................................... 9 Thin Provisioning ............................................................................ 9 Space-efficient Snapshots and Clones ......................................... 10 Reducing Risk with Industry-leading Data Protection ........................ 10 Self-Healing -
FY 2011 Annual Report
We're software roadies. Software Freedom Conservancy is a public charity that acts as a non-profit home for dozens of Free, Libre, and Open Source Software (FLOSS) projects. Conservancy©s charitable mission is to help improve, develop, and defend FLOSS, and we do that by providing business, legal, and administrative services to our member projects. We have the honor of working with member projects comprised of, in our humble opinion, many of the best software developers in the world. Some of our member projects develop system software so ubiquitous that it permeates virtually every part of our society©s electronics-driven lifestyle. Other member projects are redefining how software will be written and even how computer science will be taught to the next generation of developers. Still others find their niche by solving a small-but- persistent problem better than anyone else ± and attract a cult following of users because of it. And, best of all, all of our member projects release their software under a license that allows the public to study, use, improve, and share the source code. Conservancy provides all of our ªrock starº member projects with a comprehensive suite of services, and then we get out of their way to let them do what they do best: write great software for the public©s benefit. Our structure. Conservancy acts as a fiscal sponsor to our member projects. We©ve engaged the leadership of each member project©s developer community and executed a fiscal sponsorship agreement that allows us to adopt that project as an official part of Conservancy©s corporate structure. -
Linux Journal | August 2014 | Issue
™ SPONSORED BY Since 1994: The Original Magazine of the Linux Community AUGUST 2014 | ISSUE 244 | www.linuxjournal.com PROGRAMMING HOW-TO: + OpenGL Build, Develop Programming and Validate Creation of RPMs USE VAGRANT Sysadmin Cloud for an Easier Troubleshooting Development with dhclient Workflow Tips for PROMISE Becoming a THEORY Web Developer An In-Depth A Rundown Look of Linux for Recreation V WATCH: ISSUE OVERVIEW LJ244-Aug2014.indd 1 7/23/14 6:56 PM Get the automation platform that makes it easy to: Build Infrastructure Deploy Applications Manage In your data center or in the cloud. getchef.com LJ244-Aug2014.indd 2 7/23/14 11:41 AM Are you tiredtiered of of dealing dealing with with proprietary proprietary storage? storage? ® 9%2Ä4MHÆDCÄ2SNQ@FD ZFS Unified Storage zStax StorCore from Silicon - From modest data storage needs to a multi-tiered production storage environment, zStax StorCore zStax StorCore 64 zStax StorCore 104 The zStax StorCore 64 utilizes the latest in The zStax StorCore 104 is the flagship of the dual-processor Intel® Xeon® platforms and fast zStax product line. With its highly available SAS SSDs for caching. The zStax StorCore 64 configurations and scalable architecture, the platform is perfect for: zStax StorCore 104 platform is ideal for: VPDOOPHGLXPRIILFHILOHVHUYHUV EDFNHQGVWRUDJHIRUYLUWXDOL]HGHQYLURQPHQWV VWUHDPLQJYLGHRKRVWV PLVVLRQFULWLFDOGDWDEDVHDSSOLFDWLRQV VPDOOGDWDDUFKLYHV DOZD\VDYDLODEOHDFWLYHDUFKLYHV TalkTalk with with an anexpert expert today: today: 866-352-1173 866-352-1173 - http://www.siliconmechanics.com/zstax LJ244-Aug2014.indd 3 7/23/14 11:41 AM AUGUST 2014 CONTENTS ISSUE 244 PROGRAMMING FEATURES 64 Vagrant 74 An Introduction to How to use Vagrant to create a OpenGL Programming much easier development workflow. -
Study of File System Evolution
Study of File System Evolution Swaminathan Sundararaman, Sriram Subramanian Department of Computer Science University of Wisconsin {swami, srirams} @cs.wisc.edu Abstract File systems have traditionally been a major area of file systems are typically developed and maintained by research and development. This is evident from the several programmer across the globe. At any point in existence of over 50 file systems of varying popularity time, for a file system, there are three to six active in the current version of the Linux kernel. They developers, ten to fifteen patch contributors but a single represent a complex subsystem of the kernel, with each maintainer. These people communicate through file system employing different strategies for tackling individual file system mailing lists [14, 16, 18] various issues. Although there are many file systems in submitting proposals for new features, enhancements, Linux, there has been no prior work (to the best of our reporting bugs, submitting and reviewing patches for knowledge) on understanding how file systems evolve. known bugs. The problems with the open source We believe that such information would be useful to the development approach is that all communication is file system community allowing developers to learn buried in the mailing list archives and aren’t easily from previous experiences. accessible to others. As a result when new file systems are developed they do not leverage past experience and This paper looks at six file systems (Ext2, Ext3, Ext4, could end up re-inventing the wheel. To make things JFS, ReiserFS, and XFS) from a historical perspective worse, people could typically end up doing the same (between kernel versions 1.0 to 2.6) to get an insight on mistakes as done in other file systems. -
ECE 598 – Advanced Operating Systems Lecture 19
ECE 598 { Advanced Operating Systems Lecture 19 Vince Weaver http://web.eece.maine.edu/~vweaver [email protected] 7 April 2016 Announcements • Homework #7 was due • Homework #8 will be posted 1 Why use FAT over ext2? • FAT simpler, easy to code • FAT supported on all major OSes • ext2 faster, more robust filename and permissions 2 btrfs • B-tree fs (similar to a binary tree, but with pages full of leaves) • overwrite filesystem (overwite on modify) vs CoW • Copy on write. When write to a file, old data not overwritten. Since old data not over-written, crash recovery better Eventually old data garbage collected • Data in extents 3 • Copy-on-write • Forest of trees: { sub-volumes { extent-allocation { checksum tree { chunk device { reloc • On-line defragmentation • On-line volume growth 4 • Built-in RAID • Transparent compression • Snapshots • Checksums on data and meta-data • De-duplication • Cloning { can make an exact snapshot of file, copy-on- write different than link, different inodles but same blocks 5 Embedded • Designed to be small, simple, read-only? • romfs { 32 byte header (magic, size, checksum,name) { Repeating files (pointer to next [0 if none]), info, size, checksum, file name, file data • cramfs 6 ZFS Advanced OS from Sun/Oracle. Similar in idea to btrfs indirect still, not extent based? 7 ReFS Resilient FS, Microsoft's answer to brtfs and zfs 8 Networked File Systems • Allow a centralized file server to export a filesystem to multiple clients. • Provide file level access, not just raw blocks (NBD) • Clustered filesystems also exist, where multiple servers work in conjunction. -
Ext4 File System and Crash Consistency
1 Ext4 file system and crash consistency Changwoo Min 2 Summary of last lectures • Tools: building, exploring, and debugging Linux kernel • Core kernel infrastructure • Process management & scheduling • Interrupt & interrupt handler • Kernel synchronization • Memory management • Virtual file system • Page cache and page fault 3 Today: ext4 file system and crash consistency • File system in Linux kernel • Design considerations of a file system • History of file system • On-disk structure of Ext4 • File operations • Crash consistency 4 File system in Linux kernel User space application (ex: cp) User-space Syscalls: open, read, write, etc. Kernel-space VFS: Virtual File System Filesystems ext4 FAT32 JFFS2 Block layer Hardware Embedded Hard disk USB drive flash 5 What is a file system fundamentally? int main(int argc, char *argv[]) { int fd; char buffer[4096]; struct stat_buf; DIR *dir; struct dirent *entry; /* 1. Path name -> inode mapping */ fd = open("/home/lkp/hello.c" , O_RDONLY); /* 2. File offset -> disk block address mapping */ pread(fd, buffer, sizeof(buffer), 0); /* 3. File meta data operation */ fstat(fd, &stat_buf); printf("file size = %d\n", stat_buf.st_size); /* 4. Directory operation */ dir = opendir("/home"); entry = readdir(dir); printf("dir = %s\n", entry->d_name); return 0; } 6 Why do we care EXT4 file system? • Most widely-deployed file system • Default file system of major Linux distributions • File system used in Google data center • Default file system of Android kernel • Follows the traditional file system design 7 History of file system design 8 UFS (Unix File System) • The original UNIX file system • Design by Dennis Ritche and Ken Thompson (1974) • The first Linux file system (ext) and Minix FS has a similar layout 9 UFS (Unix File System) • Performance problem of UFS (and the first Linux file system) • Especially, long seek time between an inode and data block 10 FFS (Fast File System) • The file system of BSD UNIX • Designed by Marshall Kirk McKusick, et al. -
W4118: Linux File Systems
W4118: Linux file systems Instructor: Junfeng Yang References: Modern Operating Systems (3rd edition), Operating Systems Concepts (8th edition), previous W4118, and OS at MIT, Stanford, and UWisc File systems in Linux Linux Second Extended File System (Ext2) . What is the EXT2 on-disk layout? . What is the EXT2 directory structure? Linux Third Extended File System (Ext3) . What is the file system consistency problem? . How to solve the consistency problem using journaling? Virtual File System (VFS) . What is VFS? . What are the key data structures of Linux VFS? 1 Ext2 “Standard” Linux File System . Was the most commonly used before ext3 came out Uses FFS like layout . Each FS is composed of identical block groups . Allocation is designed to improve locality inodes contain pointers (32 bits) to blocks . Direct, Indirect, Double Indirect, Triple Indirect . Maximum file size: 4.1TB (4K Blocks) . Maximum file system size: 16TB (4K Blocks) On-disk structures defined in include/linux/ext2_fs.h 2 Ext2 Disk Layout Files in the same directory are stored in the same block group Files in different directories are spread among the block groups Picture from Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639 3 Block Addressing in Ext2 Twelve “direct” blocks Data Data BlockData Inode Block Block BLKSIZE/4 Indirect Data Data Blocks BlockData Block Data (BLKSIZE/4)2 Indirect Block Data BlockData Blocks Block Double Block Indirect Indirect Blocks Data Data Data (BLKSIZE/4)3 BlockData Data Indirect Block BlockData Block Block Triple Double Blocks Block Indirect Indirect Data Indirect Data BlockData Blocks Block Block Picture from Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. -
Solid State Copier DP-250-BD-XL Or DVD
® SOLE SOURCE PROVIDER OF EVIDENCE GRADE PRODUCTS EVIDENCE - SURVEILLANCE - DIGITAL & ANALOG RECORDING PRODUCTS & EQUIPMENT Solid State Copier DP-250-BD-XL or DVD Flash to Flash, Flash to Optical and Optical to Flash Media copier designed for secure, high speed transfer of data without requiring a computer or network connectivity. Ruggedized design for both desktop and portable use. ✓ Perfect for copying Original Evidence - Stand Alone operation, No computer or network connection available or required. .No Edit functions ✓ Secure Transfer, perfect for data, pictures, video and audio ✓ High Speed Capability (16x BD/24x DVD/48x CD) ✓ Ruggedized Aluminum Case designed for Desktop and Portable operation. ✓ Two Models available: ✓ DP-250-DVD (DVD/CD) ✓ DP-250-BD-XL (Blu-Ray, BD-XL/DVD/CD) Key Features (DP-250) Stand Alone operation, no computer required, LCD display of current function selected. Secure Transfer o No Hard Drive or Memory retention of the copied or original data when powered off. o No edit functions, No connectivity to a Network or external computer. o Original date and time transferred to the copy. Supports most open format USB Devices. Supports all types of Flash Cards formatted in FAT16, FAT32, exFAT, ext2, ext3, ext4, HFS, HFS+. Burn Speed Settable. (16X BD/24X DVD/48X CD) Supports Flash to Flash, Flash to Disc and Disc to Flash Copying. Allows data appending to a USB/Flash device without erasing the existing data content. Supports Selective file copying, Multi-Session and Disc Spanning. (Flash to Optical) Copy, Copy & Compare, Copy Select & Multi-Session. (Flash to Optical) Rugged Aluminum case for Desktop and Portable use. -
Linux Journal | February 2016 | Issue
™ A LOOK AT KDE’s KStars Astronomy Program Since 1994: The Original Magazine of the Linux Community FEBRUARY 2016 | ISSUE 262 | www.linuxjournal.com + Programming Working with Command How-Tos Arguments in Your Program a Shell Scripts BeagleBone Interview: Katerina Black Barone-Adesi on to Help Brew Beer Developing the Snabb Switch Network Write a Toolkit Short Script to Solve a WATCH: ISSUE Math Puzzle OVERVIEW V LJ262-February2016.indd 1 1/21/16 5:26 PM NEW! Agile Improve Product Business Development Processes with an Enterprise Practical books Author: Ted Schmidt Job Scheduler for the most technical Sponsor: IBM Author: Mike Diehl Sponsor: people on the planet. Skybot Finding Your DIY Way: Mapping Commerce Site Your Network Author: to Improve Reuven M. Lerner Manageability GEEK GUIDES Sponsor: GeoTrust Author: Bill Childers Sponsor: InterMapper Combating Get in the Infrastructure Fast Lane Sprawl with NVMe Author: Author: Bill Childers Mike Diehl Sponsor: Sponsor: Puppet Labs Silicon Mechanics & Intel Download books for free with a Take Control Linux in simple one-time registration. of Growing the Time Redis NoSQL of Malware http://geekguide.linuxjournal.com Server Clusters Author: Author: Federico Kereki Reuven M. Lerner Sponsor: Sponsor: IBM Bit9 + Carbon Black LJ262-February2016.indd 2 1/21/16 5:26 PM NEW! Agile Improve Product Business Development Processes with an Enterprise Practical books Author: Ted Schmidt Job Scheduler for the most technical Sponsor: IBM Author: Mike Diehl Sponsor: people on the planet. Skybot Finding Your DIY Way: Mapping Commerce Site Your Network Author: to Improve Reuven M. Lerner Manageability GEEK GUIDES Sponsor: GeoTrust Author: Bill Childers Sponsor: InterMapper Combating Get in the Infrastructure Fast Lane Sprawl with NVMe Author: Author: Bill Childers Mike Diehl Sponsor: Sponsor: Puppet Labs Silicon Mechanics & Intel Download books for free with a Take Control Linux in simple one-time registration. -
Netbackup ™ Enterprise Server and Server 8.0 - 8.X.X OS Software Compatibility List Created on September 08, 2021
Veritas NetBackup ™ Enterprise Server and Server 8.0 - 8.x.x OS Software Compatibility List Created on September 08, 2021 Click here for the HTML version of this document. <https://download.veritas.com/resources/content/live/OSVC/100046000/100046611/en_US/nbu_80_scl.html> Copyright © 2021 Veritas Technologies LLC. All rights reserved. Veritas, the Veritas Logo, and NetBackup are trademarks or registered trademarks of Veritas Technologies LLC in the U.S. and other countries. Other names may be trademarks of their respective owners. Veritas NetBackup ™ Enterprise Server and Server 8.0 - 8.x.x OS Software Compatibility List 2021-09-08 Introduction This Software Compatibility List (SCL) document contains information for Veritas NetBackup 8.0 through 8.x.x. It covers NetBackup Server (which includes Enterprise Server and Server), Client, Bare Metal Restore (BMR), Clustered Master Server Compatibility and Storage Stacks, Deduplication, File System Compatibility, NetBackup OpsCenter, NetBackup Access Control (NBAC), SAN Media Server/SAN Client/FT Media Server, Virtual System Compatibility and NetBackup Self Service Support. It is divided into bookmarks on the left that can be expanded. IPV6 and Dual Stack environments are supported from NetBackup 8.1.1 onwards with few limitations, refer technote for additional information <http://www.veritas.com/docs/100041420> For information about certain NetBackup features, functionality, 3rd-party product integration, Veritas product integration, applications, databases, and OS platforms that Veritas intends to replace with newer and improved functionality, or in some cases, discontinue without replacement, please see the widget titled "NetBackup Future Platform and Feature Plans" at <https://sort.veritas.com/netbackup> Reference Article <https://www.veritas.com/docs/100040093> for links to all other NetBackup compatibility lists. -
Latency-Aware, Inline Data Deduplication for Primary Storage
iDedup: Latency-aware, inline data deduplication for primary storage Kiran Srinivasan, Tim Bisson, Garth Goodson, Kaladhar Voruganti NetApp, Inc. fskiran, tbisson, goodson, [email protected] Abstract systems exist that deduplicate inline with client requests for latency sensitive primary workloads. All prior dedu- Deduplication technologies are increasingly being de- plication work focuses on either: i) throughput sensitive ployed to reduce cost and increase space-efficiency in archival and backup systems [8, 9, 15, 21, 26, 39, 41]; corporate data centers. However, prior research has not or ii) latency sensitive primary systems that deduplicate applied deduplication techniques inline to the request data offline during idle time [1, 11, 16]; or iii) file sys- path for latency sensitive, primary workloads. This is tems with inline deduplication, but agnostic to perfor- primarily due to the extra latency these techniques intro- mance [3, 36]. This paper introduces two novel insights duce. Inherently, deduplicating data on disk causes frag- that enable latency-aware, inline, primary deduplication. mentation that increases seeks for subsequent sequential Many primary storage workloads (e.g., email, user di- reads of the same data, thus, increasing latency. In addi- rectories, databases) are currently unable to leverage the tion, deduplicating data requires extra disk IOs to access benefits of deduplication, due to the associated latency on-disk deduplication metadata. In this paper, we pro- costs. Since offline deduplication systems impact la- -
The Linux Device File-System
The Linux Device File-System Richard Gooch EMC Corporation [email protected] Abstract 1 Introduction All Unix systems provide access to hardware via de- vice drivers. These drivers need to provide entry points for user-space applications and system tools to access the hardware. Following the \everything is a file” philosophy of Unix, these entry points are ex- posed in the file name-space, and are called \device The Device File-System (devfs) provides a power- special files” or \device nodes". ful new device management mechanism for Linux. Unlike other existing and proposed device manage- This paper discusses how these device nodes are cre- ment schemes, it is powerful, flexible, scalable and ated and managed in conventional Unix systems and efficient. the limitations this scheme imposes. An alternative mechanism is then presented. It is an alternative to conventional disc-based char- acter and block special devices. Kernel device drivers can register devices by name rather than de- vice numbers, and these device entries will appear in the file-system automatically. 1.1 Device numbers Devfs provides an immediate benefit to system ad- ministrators, as it implements a device naming scheme which is more convenient for large systems Conventional Unix systems have the concept of a (providing a topology-based name-space) and small \device number". Each instance of a driver and systems (via a device-class based name-space) alike. hardware component is assigned a unique device number. Within the kernel, this device number is Device driver authors can benefit from devfs by used to refer to the hardware and driver instance.