A Five-Year Study of File-System Metadata
Recommended publications
A Large-Scale Study of File-System Contents
John R. Douceur and William J. Bolosky, Microsoft Research, Redmond, WA 98052, {johndo, bolosky}@microsoft.com

ABSTRACT: We collect and analyze a snapshot of data from 10,568 file systems of 4801 Windows personal computers in a commercial environment. The file systems contain 140 million files totaling 10.5 TB of data. We develop analytical approximations for distributions of file size, file age, file functional lifetime, directory size, and directory depth, and we compare them to previously derived distributions. We find that file and directory sizes are fairly consistent across file systems, but file lifetimes and file-name extensions vary with the job function of the user. We also found that file-name extension is a good predictor of file size but a poor predictor of file age or lifetime, that most large files are composed of records sized in powers of two, and that file systems are only half full on average.

File-system designers require usage data to test hypotheses [8, 10], to drive simulations [6, 15, 17, 29], to validate benchmarks [33], and to stimulate insights that inspire new features [22]. File-system access requirements have been quantified by a number of empirical studies of dynamic trace data [e.g. 1, 3, 7, 8, 10, 14, 23, 24, 26]. However, the details of applications' and users' storage ...
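To make the flavor of such a metadata scan concrete, here is a minimal sketch (Python is our choice here, not the authors' tooling) that walks a directory tree and histograms file sizes into power-of-two buckets, the kind of raw data behind the paper's size distributions. Everything in it is illustrative; the study used its own snapshot collector on Windows machines.

    import os
    from collections import Counter

    def size_histogram(root):
        # Count files per power-of-two size bucket under `root`.
        # Bucket k holds sizes in [2^(k-1), 2^k); bucket 0 is empty files.
        buckets = Counter()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                try:
                    size = os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    continue  # unreadable or vanished file; skip it
                buckets[size.bit_length()] += 1
        return buckets

    if __name__ == "__main__":
        for k, count in sorted(size_histogram(".").items()):
            print(f"up to {2**k:>12} bytes: {count} files")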
Dealing with Document Size Limits
Introduction. The Electronic Case Filing system will not accept PDF documents larger than ten megabytes (MB). If the document size is less than 10 MB, it can be filed electronically just as it is. If it is larger than 10 MB, it will need to be divided into two or more documents, with each document being less than 10 MB.

Word Processing Documents. Documents created with a word processing program (such as WordPerfect or Microsoft Word) and correctly converted to PDF will generally be smaller than a scanned document. Because of variances in software, usage, and content, it is difficult to estimate the number of pages that would constitute 10 MB. (Note: see "Verifying File Size" below and, for larger documents, "Splitting PDF Documents into Multiple Documents" below; a sketch of such a split follows this excerpt.)

Scanned Documents. Although the judges' Filing Preferences indicate a preference for conversion of documents rather than scanning, it will be necessary to scan some documents for filing, e.g., evidentiary attachments must be scanned. Here are some things to remember:
• Documents scanned to PDF are generally much larger than those converted through a word processor.
• While embedded fonts may be necessary for special situations, e.g., trademark, they will increase the file size.
• If graphs or color photos are included, just a few pages can easily exceed the 10 MB limit.
Here are some guidelines:
• The court's standard scanner resolution is 300 dots per inch (DPI). Avoid using higher resolutions, as this will create much larger file sizes.
• Normally, the output should be set to black and white.
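For the splitting step, here is a minimal sketch using the pypdf library (pip install pypdf). The file names and the pages-per-part value are assumptions for illustration; since PDF size depends on content rather than page count, each part is re-checked with os.path.getsize and may need a smaller split.

    import os
    from pypdf import PdfReader, PdfWriter

    LIMIT = 10 * 1024 * 1024  # the 10 MB ECF ceiling

    def split_pdf(path, pages_per_part=50):
        # Write consecutive page ranges to <name>_part1.pdf, <name>_part2.pdf, ...
        reader = PdfReader(path)
        total = len(reader.pages)
        for part, start in enumerate(range(0, total, pages_per_part), 1):
            writer = PdfWriter()
            for i in range(start, min(start + pages_per_part, total)):
                writer.add_page(reader.pages[i])
            out = f"{os.path.splitext(path)[0]}_part{part}.pdf"
            with open(out, "wb") as f:
                writer.write(f)
            status = "ok" if os.path.getsize(out) < LIMIT else "still too large"
            print(out, os.path.getsize(out), "bytes:", status)

    # "Verifying File Size" before deciding whether to split:
    if os.path.exists("filing.pdf") and os.path.getsize("filing.pdf") > LIMIT:
        split_pdf("filing.pdf")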
Latency-Aware, Inline Data Deduplication for Primary Storage
iDedup: Latency-aware, inline data deduplication for primary storage. Kiran Srinivasan, Tim Bisson, Garth Goodson, Kaladhar Voruganti, NetApp, Inc.

Abstract: Deduplication technologies are increasingly being deployed to reduce cost and increase space-efficiency in corporate data centers. However, prior research has not applied deduplication techniques inline to the request path for latency sensitive, primary workloads. This is primarily due to the extra latency these techniques introduce. Inherently, deduplicating data on disk causes fragmentation that increases seeks for subsequent sequential reads of the same data, thus increasing latency. In addition, deduplicating data requires extra disk IOs to access on-disk deduplication metadata. In this paper, we pro- ...

... systems exist that deduplicate inline with client requests for latency sensitive primary workloads. All prior deduplication work focuses on either: i) throughput sensitive archival and backup systems [8, 9, 15, 21, 26, 39, 41]; or ii) latency sensitive primary systems that deduplicate data offline during idle time [1, 11, 16]; or iii) file systems with inline deduplication, but agnostic to performance [3, 36]. This paper introduces two novel insights that enable latency-aware, inline, primary deduplication. Many primary storage workloads (e.g., email, user directories, databases) are currently unable to leverage the benefits of deduplication, due to the associated latency costs. Since offline deduplication systems impact la- ...
On the Performance Variation in Modern Storage Stacks
Zhen Cao (1), Vasily Tarasov (2), Hari Prasath Raman (1), Dean Hildebrand (2), and Erez Zadok (1); (1) Stony Brook University, (2) IBM Research—Almaden. Appears in the proceedings of the 15th USENIX Conference on File and Storage Technologies (FAST '17).

Abstract: Ensuring stable performance for storage stacks is important, especially with the growth in popularity of hosted services where customers expect QoS guarantees. The same requirement arises from benchmarking settings as well. One would expect that repeated, carefully controlled experiments might yield nearly identical performance results—but we found otherwise. We therefore undertook a study to characterize the amount of variability in benchmarking modern storage stacks. In this paper we report on the techniques used and the results of this study. We conducted many experiments using several popular workloads, file systems, and storage devices—and varied many parameters across the entire ...

... applications on different machines have to compete for heavily shared resources, such as network switches [9]. In this paper we focus on characterizing and analyzing performance variations arising from benchmarking a typical modern storage stack that consists of a file system, a block layer, and storage hardware. Storage stacks have been proven to be a critical contributor to performance variation [18, 33, 40]. Furthermore, among all system components, the storage stack is the cornerstone of data-intensive applications, which become increasingly more important in the big data era [8, 21]. Although our main focus here is reporting and analyzing the variations in benchmarking processes, we believe that our observations pave the way for understanding stability issues in production systems.
Filesystems HOWTO
Filesystems HOWTO. Martin Hinner <[email protected]>, http://martin.hinner.info

Table of Contents:
1. Introduction
2. Volumes
3. DOS FAT 12/16/32, VFAT
4. High Performance FileSystem (HPFS)
5. New Technology FileSystem (NTFS)
6. Extended filesystems (Ext, Ext2, Ext3)
7. Macintosh Hierarchical Filesystem (HFS)
8. ISO 9660 (CD-ROM filesystem)
9. Other filesystems
If You Have Attempted to Upload Your Files and You Receive an Error Or
If you have attempted to upload your files and you receive an error or they simply will not upload to EDJOIN, your files are probably either in an inappropriate format or too large. Please ensure that your document meets the appropriate format and size requirements below.

Acceptable format: .PDF
Size limit: each file must not exceed 1 MB (megabyte), or 1024 KB.
File name: the file name must contain less than 50 characters, including spaces. If the file name contains more than 50 characters or any special characters, you may have trouble attaching your documents and/or the district you are applying to may have trouble viewing them. Please make sure your document title only contains letters and numbers. If the document is multiple pages, you may need to scan multiple sections of the document to maintain the 1 MB file size allowance.

Saving your documents to .PDF format:

If using a PC: Microsoft Word 2007 or later allows you to save a document as .PDF on a PC.
1. Open the document from where you have it saved on your computer.
2. Once the document opens in Microsoft Word, go to the File menu at the top left of your Microsoft Word window and click on Save As.
3. Note where you will be saving the document, go to the bottom of the window, and select PDF (*.pdf) in the Save as type box.
4. Save and close.

If using a Mac: using the print dialog box allows you to save just about any document to .PDF format from a Mac.
1. ...
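A minimal pre-upload check, assuming exactly the rules quoted above (.PDF only, at most 1 MB, a name under 50 characters made of letters, numbers, and spaces); the function and the sample file name are this rewrite's illustration, not part of EDJOIN itself.

    import os
    import re

    MAX_BYTES = 1024 * 1024  # 1 MB / 1024 KB

    def check_attachment(path):
        # Return a list of reasons the file may be rejected; empty means fine.
        problems = []
        name = os.path.basename(path)
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".pdf":
            problems.append("not a .PDF file")
        if len(name) >= 50:
            problems.append("name is 50+ characters (spaces count)")
        if not re.fullmatch(r"[A-Za-z0-9 ]+", stem):
            # assumes spaces are tolerated, per the character-count rule above
            problems.append("name contains characters besides letters and numbers")
        if os.path.getsize(path) > MAX_BYTES:
            problems.append("larger than 1 MB")
        return problems

    for reason in check_attachment("My Resume.pdf"):
        print("upload may fail:", reason)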
Sequence Alignment/Map Format Specification
The SAM/BAM Format Specification Working Group, 3 Jun 2021. The master version of this document can be found at https://github.com/samtools/hts-specs. This printing is version 53752fa from that repository, last modified on the date shown above.

1 The SAM Format Specification. SAM stands for Sequence Alignment/Map format. It is a TAB-delimited text format consisting of a header section, which is optional, and an alignment section. If present, the header must be prior to the alignments. Header lines start with '@', while alignment lines do not. Each alignment line has 11 mandatory fields for essential alignment information, such as mapping position, and a variable number of optional fields for flexible or aligner-specific information. This specification is for version 1.6 of the SAM and BAM formats. Each SAM and BAM file may optionally specify the version being used via the @HD VN tag. For full version history see Appendix B. Unless explicitly specified elsewhere, all fields are encoded using 7-bit US-ASCII (charset ANSI X3.4-1968 as defined in RFC 1345) in using the POSIX/C locale. Regular expressions listed use the POSIX/IEEE Std 1003.1 extended syntax.

1.1 An example. Suppose we have the following alignment, with bases in lowercase clipped from the alignment. Read r001/1 and r001/2 constitute a read pair; r003 is a chimeric read; r004 represents a split alignment.

    Coor     12345678901234  5678901234567890123456789012345
    ref      AGCATGTTAGATAA**GATAGCTGTGCTAGTAGGCAGTCAGCGCCAT
    +r001/1        TTAGATAAAGGATA*CTG
    +r002         aaaAGATAA*GGATA
    +r003       gcctaAGCTAA
    +r004                     ATAGCT..............TCAGC
    -r003                            ttagctTAGGC
    -r001/2                                        CAGCGGCAT

The corresponding SAM format is: ...
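As a reading companion, here is a minimal sketch (Python is assumed here; the spec itself is language-neutral) that splits one alignment line into the 11 mandatory fields the spec names, treating everything after them as optional TAG:TYPE:VALUE fields. The sample line is patterned on the spec's r001/1 example.

    # The 11 mandatory fields, in the order the spec defines them.
    MANDATORY = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ",
                 "CIGAR", "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

    def parse_sam_line(line):
        if line.startswith("@"):              # header lines start with '@'
            return None
        fields = line.rstrip("\n").split("\t")
        record = dict(zip(MANDATORY, fields[:11]))
        record["FLAG"] = int(record["FLAG"])  # bitwise flags
        record["POS"] = int(record["POS"])    # 1-based leftmost mapping position
        record["optional"] = fields[11:]      # TAG:TYPE:VALUE strings, if any
        return record

    rec = parse_sam_line(
        "r001\t99\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*")
    print(rec["QNAME"], rec["POS"], rec["CIGAR"])  # r001 7 8M2I4M1D3M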
CD ROM Drives and Their Handling Under RISC OS Explained
28th April 1995. Support Group Application Note Number 273, Issue 0.4 (draft). Author: DW.

CD ROM Drives and their Handling under RISC OS Explained. This Application Note details the interfacing of, and protocols underlying, CD ROM drives and their handling under RISC OS. Applicable hardware: all Acorn computers running RISC OS 3.1 or greater. Related Application Notes: 266 (Developing CD ROM products for Acorn machines); 269 (File transfer between Acorn and foreign platforms).

Every effort has been made to ensure that the information in this leaflet is true and correct at the time of printing. However, the products described in this leaflet are subject to continuous development and improvements, and Acorn Computers Limited reserves the right to change its specifications at any time. Acorn Computers Limited cannot accept liability for any loss or damage arising from the use of any information or particulars in this leaflet. Acorn, the Acorn Logo, Acorn Risc PC, ECONET, AUN, Pocket Book and ARCHIMEDES are trademarks of Acorn Computers Limited. ARM is a trademark of Advanced RISC Machines Limited. All other trademarks acknowledged. ©1995 Acorn Computers Limited. All rights reserved. Support Group, Acorn Computers Limited, Acorn House, Vision Park, Histon, Cambridge CB4 4AE.

Table of Contents: Introduction; Interfacing CD ROM Drives; Features of CD ROM Drives; CD ROM Formatting Standards ("Coloured Books"); CD ROM Data Storage Standards; CDFS and CD ROM Drivers; Accessing Data on CD ROMs Produced for Non-Acorn Platforms; Troubleshooting; Appendix A: Useful Contacts; Appendix B: PhotoCD Mastering Bureaux; Appendix C: References.
Mac OS X Server
Mac OS X Server, Version 10.4, Technology Overview, August 2006.

Contents: Introduction; New in Version 10.4; Operating System Fundamentals (UNIX-Based Foundation, 64-Bit Computing, Advanced BSD Networking Architecture, Robust Security, Directory Integration, High Availability); Integrated Management Tools (Server Admin, Workgroup Manager); Service Deployment and Administration (Open Directory Server, File and Print Services, Mail Services, Web Hosting, Enterprise Applications, Media Streaming, iChat Server, Software Update Server, NetBoot and NetInstall, Networking and VPN, Distributed Computing); Product Details; Open Source Projects; Additional Resources.

Introduction: Mac OS X Server version 10.4 Tiger gives you everything you need to manage servers in a mixed-platform environment and to configure, deploy, and manage powerful network services. Featuring the renowned Mac OS X interface, Mac OS X Server streamlines your management tasks with applications and utilities that are robust yet easy to use. Apple's award-winning server software brings people and data together in innovative ways. Whether you want to empower users with instant messaging and blogging, gain greater control over email, reduce the cost and hassle of updating software, or build your own distributed supercomputer, Mac OS X Server v10.4 has the tools you need. The power and simplicity of Mac OS X Server are a reflection of Apple's operating sys... The Universal release of Mac OS X Server runs on both Intel- and PowerPC-based Mac desktop and Xserve systems.
Converting Audio – Audio File Size
By the end of this worksheet you should know how to calculate the possible file size of an audio recording. You should now be very familiar with the following concepts:
• BIT: a single zero (0) or one (1), short for Binary digIT
• BYTE: 8 bits
• Analogue: a continuously varying signal
• Digital: a pulse that alternates between OFF (0) and ON (1)
• ADC: Analogue-to-Digital Converter
• Sample rate: how many times per second the ADC tests the incoming analogue signal
• Sample resolution: the number of bits allocated to each sample taken

We only need to add one more snippet of knowledge to calculate the possible file size of an audio recording, and that is the duration of the recording. Imagine that you record 10 seconds of audio, setting the sample rate to 44.1 kHz and using 16 bits per sample. What does this all mean? Well, we know that:
• each sample uses 16 bits
• there are 44,100 samples per second
• this means that we are using 44,100 samples per second × 16 bits = 705,600 bits per second
• since we recorded for 10 seconds, we have 705,600 bits per second × 10 seconds = 7,056,000 bits in total
• there are 8 bits to a byte, so 7,056,000 ÷ 8 = 882,000 bytes
• 1,000 is a kilo, so 882,000 bytes = 882 kilobytes (882 KB)

So, a 10-second recording that is set at 16 bits/44.1 kHz might result in a file of 882 KB.
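The whole calculation fits in a couple of lines; a minimal sketch follows. The channels parameter is an addition for stereo recordings (the worksheet's example is mono), and the result is the raw sample data only, ignoring file-header overhead and compression.

    def audio_size_bytes(sample_rate_hz, bits_per_sample, seconds, channels=1):
        # samples/second x bits/sample x seconds x channels, then bits -> bytes
        bits = sample_rate_hz * bits_per_sample * seconds * channels
        return bits // 8

    size = audio_size_bytes(44_100, 16, 10)    # the worksheet's example
    print(size, "bytes =", size / 1000, "KB")  # 882000 bytes = 882.0 KB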
Porting FUSE to L4re
Großer Beleg: Porting FUSE to L4Re. Florian Pester, 23 May 2013. Technische Universität Dresden, Fakultät Informatik, Institut für Systemarchitektur, Professur Betriebssysteme. Supervising professor: Prof. Dr. rer. nat. Hermann Härtig. Supervising staff member: Dipl.-Inf. Carsten Weinhold.

Declaration: I hereby declare that this thesis is a work of my own, and that only cited sources have been used. Dresden, 23 May 2013, Florian Pester.

Contents:
1. Introduction
2. State of the Art
   2.1. FUSE on Linux
        2.1.1. FUSE Internal Communication
   2.2. The L4Re Virtual File System
   2.3. Libfs
   2.4. Communication and Access Control in L4Re
   2.5. Related Work
        2.5.1. FUSE
        2.5.2. Pass-to-Userspace Framework Filesystem
3. Design
   3.1. FUSE Server parts
4. Implementation
   4.1. Example Request handling
   4.2. FUSE Server
        4.2.1. LibfsServer
        4.2.2. Translator
        4.2.3. Requests
        4.2.4. RequestProvider
        4.2.5. Node Caching
   4.3. Changes to the FUSE library
   4.4. Libfs
   4.5. Block Device Server
   4.6. File systems
5. Evaluation
6. Conclusion and Further Work
A. FUSE operations
B. FUSE library changes
C. Glossary

List of Figures: 2.1. The architecture of FUSE on Linux; 2.2. The architecture of libfs ...
ParaTrac: A Fine-Grained Profiler for Data-Intensive Workflows
Nan Dun (Department of Computer Science), Kenjiro Taura (Department of Information and Communication Engineering), and Akinori Yonezawa (Department of Computer Science), The University of Tokyo, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 113-5686, Japan.

ABSTRACT: The realistic characteristics of data-intensive workflows are critical to optimal workflow orchestration, and profiling is an effective approach to investigate the behaviors of such complex applications. ParaTrac is a fine-grained profiler for data-intensive workflows that uses user-level file system and process tracing techniques. First, ParaTrac enables users to quickly understand the I/O characteristics from the entire application down to specific processes or files by examining low-level I/O profiles. Second, ParaTrac automatically exploits fine-grained data-process interactions in a workflow to help users intuitively and quantitatively investigate realistic execution of data-intensive workflows.

1. INTRODUCTION: With the advance of high performance distributed computing, users are able to execute various data-intensive applications by harnessing massive computing resources [1]. Though workflow management systems have been developed to alleviate the difficulties of planning, scheduling, and executing complex workflows in distributed environments [2–5], optimal workflow management still remains a challenge because of the complexity of applications. Therefore, one important and practical demand is to understand and characterize data-intensive applications to help workflow management systems (WMS) refine their orchestration for ...