Data Reduction

Total Page:16

File Type:pdf, Size:1020Kb

Data Reduction Data Reduction Parallel Storage Systems 2021-06-08 Jun.-Prof. Dr. Michael Kuhn [email protected] Parallel Computing and I/O Institute for Intelligent Cooperating Systems Faculty of Computer Science Otto von Guericke University Magdeburg https://parcio.ovgu.de Outline Review Data Reduction Review Motivation Recomputation Deduplication Compression Advanced Compression Summary Michael Kuhn Data Reduction 1 / 50 Optimizations Review • Why is it important to repeat measurements? 1. Warm up the file system cache 2. Randomize experiments for statistical purposes 3. Eliminate systematic errors 4. Eliminate random errors Michael Kuhn Data Reduction 2 / 50 Optimizations Review • What does the queue depth for asynchronous operations refer to? 1. Size of the operations 2. Number of operations in flight 3. Maximum size of an operation Michael Kuhn Data Reduction 2 / 50 Optimizations Review • What is a context switch? 1. The application switches between two open files 2. The application switches between two I/O operations 3. The operating system switches between two processes 4. The operating system switches between two file systems Michael Kuhn Data Reduction 2 / 50 Optimizations Review • Which is the fastest I/O setup in terms of throughput? 1. One client communicating with one server 2. One client communicating with ten servers 3. Ten clients communicating with one server 4. Ten clients communicating with ten servers Michael Kuhn Data Reduction 2 / 50 Outline Motivation Data Reduction Review Motivation Recomputation Deduplication Compression Advanced Compression Summary Michael Kuhn Data Reduction 3 / 50 Performance Development Motivation • Hardware improves exponentially, but at dierent rates • Storage capacity and throughput are lagging behind computation • Network and memory throughput are even further behind • Transferring and storing data has become a very costly • Data can be produced even more rapidly • Oen impossible to keep all of the data indefinitely • Consequence: Higher investment costs for storage hardware • Leads to less money being available for computation • Alternatively, systems have to become more expensive overall • Storage hardware can be a significant part of total cost of ownership (TCO) • Approximately 20 % of total costs at DKRZ, ≈ e 6,000,000 procurement costs Michael Kuhn Data Reduction 4 / 50 Performance Development... Motivation Computation Storage capacity Storage throughput Network throughput Memory throughput 100,000,000 10,000,000 1,000,000 100,000 10,000 1,000 100 10 Performance improvement Performance 1 0 5 10 15 20 25 30 Years • Computation: 300× every ten years (based on TOP500) • Storage capacity: 100× every 10 years • Storage throughput: 20× every 10 years Michael Kuhn Data Reduction 5 / 50 Example: DKRZ [Kunkel et al., 2014] Motivation 2009 2015 Faktor Performance 150 TF/s 3 PF/s 20x Node Count 264 2,500 9.5x Node Performance 0.6 TF/s 1.2 TF/s 2x Main Memory 20 TB 170 TB 8.5x Storage Capacity 5.6 PB 45 PB 8x Storage Throughput 30 GB/s 400 GB/s 13.3x HDD Count 7,200 8,500 1.2x Archive Capacity 53 PB 335 PB 6.3x Archive Throughput 9.6 GB/s 21 GB/s 2.2x Energy Consumption 1.6 MW 1.4 MW 0.9x Procurement Costs 30 Me 30 Me 1x Michael Kuhn Data Reduction 6 / 50 Example: DKRZ... [Kunkel et al., 2014] Motivation 2020 2025 Exascale (2020) Performance 60 PF/s 1.2 EF/s 1 EF/s Node Count 12,500 31,250 100k–1M Node Performance 4.8 TF/s 38.4 TF/s 1–15 TF/s Main Memory 1.5 PB 12.8 PB 3.6–300 PB Storage Capacity 270 PB 1.6 EB 0.15–18 EB Storage Throughput 2.5 TB/s 15 TB/s 20–300 TB/s HDD Count 10,000 12,000 100k–1M Archive Capacity 1.3 EB 5.4 EB 7.2–600 EB Archive Throughput 57 GB/s 128 GB/s — Energy Consumption 1.4 MW 1.4 MW 20–70 MW Procurement Costs 30 Me 30 Me 200 M$ Michael Kuhn Data Reduction 7 / 50 Approaches Motivation • Storage costs should be kept stable if possible • It is necessary to reduce the amount of data produced and stored • First step: Determine the actual storage costs • It is important to dierentiate dierent types of costs • Cost model for computation, storage and archival [Kunkel et al., 2014] • Analysis of dierent data reduction techniques • Recomputation, deduplication and compression Michael Kuhn Data Reduction 8 / 50 CMIP5 [Kunkel et al., 2014] Motivation • Part of the AR5 of the IPCC (Coupled Model Intercomparison Project Phase 5) • Climate model comparison for a common set of experiments • More than 10.8 million processor hours at DKRZ • 482 runs, simulating a total of 15,280 years • Data volume of more than 640 TB, refined into 55 TB via post-processing • Prototypical low-resolution configuration: • A year takes about 1.5 hours on the 2009 system • A month of simulation accounts for 4 GB of data • Finishes by creating a checkpoint (4 GB), another job restarts from it • Every 10th checkpoint is kept and archived • Data was stored on the file system for almost three years • Archived for 10 years Michael Kuhn Data Reduction 9 / 50 CMIP5... [Kunkel et al., 2014] Motivation CMIP5 System 2009 2015 2020 2025 Computation 10.50 0.55 0.03 0.001 Procurement 45.02 5.60 0.93 0.16 Storage Access 0.09 0.01 0 0 Metadata 0.04 0 0 0 Checkpoint 0 0 0 0 Archival 10.35 1.66 0.41 0.10 Total 66.01 7.82 1.38 0.26 • 2009: Computation costs are roughly the same as archival costs • Storage costs are more than four times higher • Storage and archival are getting (relatively) more expensive Michael Kuhn Data Reduction 10 / 50 HD(CP)2 [Kunkel et al., 2014] Motivation • High Definition Clouds and Precipitation for Climate Prediction • Improve the understanding of cloud and precipitation processes • Analyze their implication for climate prediction • A simulation of Germany with a grid resolution of 416 m • The run on the DKRZ system from 2009 needs 5,260 GB of memory • Simulates 2 hours in a wallclock time of 86 minutes • Model results are written every 30 model minutes • A checkpoint is created when the program terminates • Output has to be kept on the global file system for only one week Michael Kuhn Data Reduction 11 / 50 HD(CP)2... [Kunkel et al., 2014] Motivation HD(CP)2 System 2009 2015 2020 2025 Computation 165.07 8.72 0.44 0.02 Procurement 2.37 0.30 0.05 0.01 Storage Access 0.94 0.07 0.01 0 Metadata 0 0 0 0 Checkpoint 0.33 0.02 0 0 Archival 86.91 13.91 3.48 0.87 Total 255.29 22.99 3.97 0.90 • Higher computation costs than CMIP5 • 2009: Computation costs are roughly two times the archival costs • Lower storage costs because data is archived faster Michael Kuhn Data Reduction 12 / 50 Data Reduction Techniques Motivation • There are dierent concepts to reduce the amount data to store • We will take a closer look at three in particular 1. Recomputing results instead of storing them • Not all results are stored explicitly but recomputed on demand 2. Deduplication to reduce redundancies • Identical blocks of data are only stored once 3. Compression • Data can be compressed within the application, the middleware or the file system Michael Kuhn Data Reduction 13 / 50 Outline Recomputation Data Reduction Review Motivation Recomputation Deduplication Compression Advanced Compression Summary Michael Kuhn Data Reduction 14 / 50 Overview [Kunkel et al., 2014] Recomputation • Do not store all produced data • Data will be analyzed in-situ, that is, at runtime • Requires a careful definition of the analyses • Post-mortem data analysis is impossible • A new analysis requires repeated computation • Re-computation can be attractive • If the costs for keeping data are substantially higher than re-computation costs • Cost of computation is higher than the cost for archiving the data in 2009 • Computational power continues to improve faster than storage technology Michael Kuhn Data Reduction 15 / 50 Analysis [Kunkel et al., 2014] Recomputation • 2015 • Recomputation if data is accessed only once (HD(CP)2) or less than 13 times (CMIP5) • 2020 • HD(CP)2: Recomputation worth it if data is accessed less than eight times • CMIP5: Archival is cheaper if data is accessed more than 44 times • 2025 • Recompute if data is accessed less than 44 (HD(CP)2) or 260 (CMIP5) times Michael Kuhn Data Reduction 16 / 50 Quiz Recomputation • How would you archive your application to be executed in five years? 1. Keep the binaries and rerun it 2. Keep the source code and recompile it 3. Put it into a container/virtual machine Michael Kuhn Data Reduction 17 / 50 Problems: Preserve Binaries [Kunkel et al., 2014] Recomputation • Keep binaries of applications and all their dependencies • Containers and virtual machines have made this much easier • Eectively impossible to execute the application on diering future architectures • x86-64 vs. POWER, big endian vs. little endian • Emulation usually has significant performance impacts • Re-computation on the same supercomputer appears feasible • Keep all dependencies (versioned modules) and link statically Michael Kuhn Data Reduction 18 / 50 Problems: Source Code [Kunkel et al., 2014] Recomputation • All components can be compiled even on dierent hardware architectures • Most likely will require additional eort from developers • Dierent operating systems, compilers etc. could be incompatible • Might still require preserving all dependencies • Changes to minute details could lead to diering results • Dierent processors, network technologies etc. could change results • Can be ignored in some cases as long as results are “statistically equal” Michael Kuhn Data Reduction 19 / 50 Summary Recomputation • Recomputation can be worth it given current performance developments • Computation is developing much faster than storage
Recommended publications
  • Nxadmin CLI Reference Guide Unity Iv Contents
    HYPER-UNIFIED STORAGE nxadmin Command Line Interface Reference Guide NEXSAN | 325 E. Hillcrest Drive, Suite #150 | Thousand Oaks, CA 91360 USA Printed Thursday, July 26, 2018 | www.nexsan.com Copyright © 2010—2018 Nexsan Technologies, Inc. All rights reserved. Trademarks Nexsan® is a trademark or registered trademark of Nexsan Technologies, Inc. The Nexsan logo is a registered trademark of Nexsan Technologies, Inc. All other trademarks and registered trademarks are the property of their respective owners. Patents This product is protected by one or more of the following patents, and other pending patent applications worldwide: United States patents US8,191,841, US8,120,922; United Kingdom patents GB2466535B, GB2467622B, GB2467404B, GB2296798B, GB2297636B About this document Unauthorized use, duplication, or modification of this document in whole or in part without the written consent of Nexsan Corporation is strictly prohibited. Nexsan Technologies, Inc. reserves the right to make changes to this manual, as well as the equipment and software described in this manual, at any time without notice. This manual may contain links to web sites that were current at the time of publication, but have since been moved or become inactive. It may also contain links to sites owned and operated by third parties. Nexsan is not responsible for the content of any such third-party site. Contents Contents Contents iii Chapter 1: Accessing the nxadmin and nxcmd CLIs 15 Connecting to the Unity Storage System using SSH 15 Prerequisite 15 Connecting to the Unity
    [Show full text]
  • Smaller Flight Data Recorders{
    Available online at http://docs.lib.purdue.edu/jate Journal of Aviation Technology and Engineering 2:2 (2013) 45–55 Smaller Flight Data Recorders{ Yair Wiseman and Alon Barkai Bar-Ilan University Abstract Data captured by flight data recorders are generally stored on the system’s embedded hard disk. A common problem is the lack of storage space on the disk. This lack of space for data storage leads to either a constant effort to reduce the space used by data, or to increased costs due to acquisition of additional space, which is not always possible. File compression can solve the problem, but carries with it the potential drawback of the increased overhead required when writing the data to the disk, putting an excessive load on the system and degrading system performance. The author suggests the use of an efficient compressed file system that both compresses data in real time and ensures that there will be minimal impact on the performance of other tasks. Keywords: flight data recorder, data compression, file system Introduction A flight data recorder is a small line-replaceable computer unit employed in aircraft. Its function is recording pilots’ inputs, electronic inputs, sensor positions and instructions sent to any electronic systems on the aircraft. It is unofficially referred to as a "black box". Flight data recorders are designed to be small and thoroughly fabricated to withstand the influence of high speed impacts and extreme temperatures. A flight data recorder from a commercial aircraft can be seen in Figure 1. State-of-the-art high density flash memory devices have permitted the solid state flight data recorder (SSFDR) to be implemented with much larger memory capacity.
    [Show full text]
  • The Pillars of Lossless Compression Algorithms a Road Map and Genealogy Tree
    International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 6 (2018) pp. 3296-3414 © Research India Publications. http://www.ripublication.com The Pillars of Lossless Compression Algorithms a Road Map and Genealogy Tree Evon Abu-Taieh, PhD Information System Technology Faculty, The University of Jordan, Aqaba, Jordan. Abstract tree is presented in the last section of the paper after presenting the 12 main compression algorithms each with a practical This paper presents the pillars of lossless compression example. algorithms, methods and techniques. The paper counted more than 40 compression algorithms. Although each algorithm is The paper first introduces Shannon–Fano code showing its an independent in its own right, still; these algorithms relation to Shannon (1948), Huffman coding (1952), FANO interrelate genealogically and chronologically. The paper then (1949), Run Length Encoding (1967), Peter's Version (1963), presents the genealogy tree suggested by researcher. The tree Enumerative Coding (1973), LIFO (1976), FiFO Pasco (1976), shows the interrelationships between the 40 algorithms. Also, Stream (1979), P-Based FIFO (1981). Two examples are to be the tree showed the chronological order the algorithms came to presented one for Shannon-Fano Code and the other is for life. The time relation shows the cooperation among the Arithmetic Coding. Next, Huffman code is to be presented scientific society and how the amended each other's work. The with simulation example and algorithm. The third is Lempel- paper presents the 12 pillars researched in this paper, and a Ziv-Welch (LZW) Algorithm which hatched more than 24 comparison table is to be developed.
    [Show full text]
  • The Deep Learning Solutions on Lossless Compression Methods for Alleviating Data Load on Iot Nodes in Smart Cities
    sensors Article The Deep Learning Solutions on Lossless Compression Methods for Alleviating Data Load on IoT Nodes in Smart Cities Ammar Nasif *, Zulaiha Ali Othman and Nor Samsiah Sani Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science & Technology, University Kebangsaan Malaysia, Bangi 43600, Malaysia; [email protected] (Z.A.O.); [email protected] (N.S.S.) * Correspondence: [email protected] Abstract: Networking is crucial for smart city projects nowadays, as it offers an environment where people and things are connected. This paper presents a chronology of factors on the development of smart cities, including IoT technologies as network infrastructure. Increasing IoT nodes leads to increasing data flow, which is a potential source of failure for IoT networks. The biggest challenge of IoT networks is that the IoT may have insufficient memory to handle all transaction data within the IoT network. We aim in this paper to propose a potential compression method for reducing IoT network data traffic. Therefore, we investigate various lossless compression algorithms, such as entropy or dictionary-based algorithms, and general compression methods to determine which algorithm or method adheres to the IoT specifications. Furthermore, this study conducts compression experiments using entropy (Huffman, Adaptive Huffman) and Dictionary (LZ77, LZ78) as well as five different types of datasets of the IoT data traffic. Though the above algorithms can alleviate the IoT data traffic, adaptive Huffman gave the best compression algorithm. Therefore, in this paper, Citation: Nasif, A.; Othman, Z.A.; we aim to propose a conceptual compression method for IoT data traffic by improving an adaptive Sani, N.S.
    [Show full text]
  • The Illusion of Space and the Science of Data Compression And
    The Illusion of Space and the Science of Data Compression and Deduplication Bruce Yellin EMC Proven Professional Knowledge Sharing 2010 Bruce Yellin Advisory Technology Consultant EMC Corporation [email protected] Table of Contents What Was Old Is New Again ......................................................................................................... 4 The Business Benefits of Saving Space ....................................................................................... 6 Data Compression Strategies ....................................................................................................... 9 Data Compression Basics ....................................................................................................... 10 Compression Bakeoff .............................................................................................................. 13 Data Deduplication Strategies .................................................................................................... 16 Deduplication - Theory of Operation ....................................................................................... 16 File Level Deduplication - Single Instance Storage ................................................................. 21 Fixed-Block Deduplication ....................................................................................................... 23 Variable-Block Deduplication .................................................................................................. 24 Content-Aware Deduplication
    [Show full text]
  • Data Compression in Solid State Storage
    Data Compression in Solid State Storage John Fryar [email protected] Santa Clara, CA August 2013 1 Acknowledgements This presentation would not have been possible without the counsel, hard work and graciousness of the following individuals and/or organizations: Raymond Savarda Sandgate Technologies Santa Clara, CA August 2013 2 Disclaimers The opinions expressed herein are those of the author and do not necessarily represent those of any other organization or individual unless specifically cited. A thorough attempt to acknowledge all sources has been made. That said, we’re all human… Santa Clara, CA August 2013 3 Learning Objectives At the conclusion of this tutorial the audience will have been exposed to: • The different types of Data Compression • Common Data Compression Algorithms • The Deflate/Inflate (GZIP/GUNZIP) algorithms in detail • Implementation Options (Software/Hardware) • Impacts of design parameters in Performance • SSD benefits and challenges • Resources for Further Study Santa Clara, CA August 2013 4 Agenda • Background, Definitions, & Context • Data Compression Overview • Data Compression Algorithm Survey • Deflate/Inflate (GZIP/GUNZIP) in depth • Software Implementations • HW Implementations • Tradeoffs & Advanced Topics • SSD Benefits and Challenges • Conclusions Santa Clara, CA August 2013 5 Definitions Item Description Comments Open A system which will compress Must strictly adhere to standards System data for use by other entities. on compress / decompress I.E. the compressed data will algorithms exit the system Interoperability among vendors mandated for Open Systems Closed A system which utilizes Can support a limited, optimized System compressed data internally but subset of standard. does not expose compressed Also allows custom algorithms data to the outside world No Interoperability req’d.
    [Show full text]
  • Use Style: Paper Title
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Diponegoro University Institutional Repository On The Implementation of ZFS (Zettabyte File System) Storage System Eko D. Widianto, Agung B. Prasetijo, and Ahmad Ghufroni Department of Computer Engineering, Faculty of Engineering – Diponegoro University Jl. Prof. Soedarto, SH, Tembalang, Semarang, Indonesia E-mail: [email protected]; [email protected]; [email protected] Abstract – Digital data storage is very critical in computer 32 KB, ZFS is 200MB/s faster than EXT3. The throughput of systems. Storage devices used to store data may at any time suffer write process in ZFS is as good as EXT3 for the size of I/O from damage caused by its lifetime, resource failure or factory less than 32 KB, whilst, for the size of I/O larger than 32 KB defects. Such damage may lead to loss of important data. The risk ZFS is 150 MB/s faster compared to EXT3. of data loss in the event of device damage can be minimized by Literature [2] explored the performance, capacity and building a storage system that supports redundancy. The design of storage based on ZFS (Zettabyte File System) aims at building a integrity of ZFS raidz. The tests for its read and write speed is storage system that supports redundancy and data integrity without as follows: in mirror mode using two 4TB hard drive, ZFS has requiring additional RAID controllers. When the system fails on 488 MB/s for read speed and 106MB/s for write speed.
    [Show full text]
  • Mimicking Datasets for Content Generation in Storage Benchmarks
    SDGen: Mimicking Datasets for Content Generation in Storage Benchmarks Raúl Gracia-Tinedo, Universitat Rovira i Virgili; Danny Harnik, Dalit Naor, and Dmitry Sotnikov, IBM Research Haifa; Sivan Toledo and Aviad Zuck, Tel-Aviv University https://www.usenix.org/conference/fast15/technical-sessions/presentation/gracia-tinedo This paper is included in the Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST ’15). February 16–19, 2015 • Santa Clara, CA, USA ISBN 978-1-931971-201 Open access to the Proceedings of the 13th USENIX Conference on File and Storage Technologies is sponsored by USENIX SDGen: Mimicking Datasets for Content Generation in Storage Benchmarks Raul´ Gracia-Tinedo Danny Harnik, Dalit Naor, Dmitry Sotnikov Universitat Rovira i Virgili (Spain) IBM Research-Haifa (Israel) [email protected] dannyh, dalit, dmitrys @il.ibm.com { } Sivan Toledo, Aviad Zuck Tel-Aviv University (Israel) stoledo, aviadzuc @tau.ac.il { } Abstract However, most storage benchmarks do not pay partic- Storage system benchmarks either use samples of pro- ular attention to the contents generated during their exe- prietary data or synthesize artificial data in simple ways cution [35] (see examples in Table 1). For instance, Im- (such as using zeros or random data). However, many pressions [7] implements accurate statistical methods to storage systems behave completely differently on such model the structure of a file system, but the contents of artificial data than they do on real-world data. This is the files are by default zeros or statically generated by third- case with systems that include data reduction techniques, party applications. Another example is OLTPBench [15], such as compression and/or deduplication.
    [Show full text]
  • QZFS: QAT Accelerated Compression in File System for Application
    QZFS: QAT Accelerated Compression in File System for Application Agnostic and Cost Efficient Data Storage Xiaokang Hu and Fuzong Wang, Shanghai Jiao Tong University, Intel Asia-Pacific R&D Ltd.; Weigang Li, Intel Asia-Pacific R&D Ltd.; Jian Li and Haibing Guan, Shanghai Jiao Tong University https://www.usenix.org/conference/atc19/presentation/wang-fuzong This paper is included in the Proceedings of the 2019 USENIX Annual Technical Conference. July 10–12, 2019 • Renton, WA, USA ISBN 978-1-939133-03-8 Open access to the Proceedings of the 2019 USENIX Annual Technical Conference is sponsored by USENIX. QZFS: QAT Accelerated Compression in File System for Application Agnostic and Cost Efficient Data Storage Xiaokang Hu Fuzong Wang∗ Weigang Li Shanghai Jiao Tong University Shanghai Jiao Tong University Intel Asia-Pacific R&D Ltd. Intel Asia-Pacific R&D Ltd. Intel Asia-Pacific R&D Ltd. Jian Li Haibing Guan Shanghai Jiao Tong University Shanghai Jiao Tong University Abstract Hadoop [3], Spark [4] or stream processing job [40] is com- pressed, the data processing performance can be effectively Data compression can not only provide space efficiency with enhanced as the compression not only saves bandwidth but lower Total Cost of Ownership (TCO) but also enhance I/O also decreases the number of read/write operations from/to performance because of the reduced read/write operations. storage systems. However, lossless compression algorithms with high com- It is widely recognized that the benefits of data compression pression ratio (e.g. gzip) inevitably incur high CPU resource come at the expense of computational cost [1,9], especially consumption.
    [Show full text]
  • Data Compression for Climate Data
    Data compression for climate data Article Published Version Creative Commons: Attribution-Noncommercial 3.0 Open access Kuhn, M., Kunkel, J. M. and Ludwig, T. (2016) Data compression for climate data. Supercomputing Frontiers and Innovations, 3 (1). pp. 75-94. ISSN 2313-8734 doi: https://doi.org/10.14529/jsfi160105 Available at http://centaur.reading.ac.uk/77684/ It is advisable to refer to the publisher’s version if you intend to cite from the work. See Guidance on citing . Identification Number/DOI: https://doi.org/10.14529/jsfi160105 <https://doi.org/10.14529/jsfi160105> Publisher: South Urals University All outputs in CentAUR are protected by Intellectual Property Rights law, including copyright law. Copyright and IPR is retained by the creators or other copyright holders. Terms and conditions for use of this material are defined in the End User Agreement . www.reading.ac.uk/centaur CentAUR Central Archive at the University of Reading Reading’s research outputs online DOI: 10.14529/jsfi160105 Data Compression for Climate Data Michael Kuhn1, Julian Kunkel2, Thomas Ludwig2 © The Authors 2016. This paper is published with open access at SuperFri.org The different rates of increase for computational power and storage capabilities of super- computers turn data storage into a technical and economical problem. As storage capabilities are lagging behind, investments and operational costs for storage systems have increased to keep up with the supercomputers’ I/O requirements. One promising approach is to reduce the amount of data that is stored. In this paper, we take a look at the impact of compression on performance and costs of high performance systems.
    [Show full text]
  • High Speed Lossless Image Compression
    High Speed Lossless Image Compression Hendrik Siedelmann12, Alexander Wender1, and Martin Fuchs1 1 University of Stuttgart, Germany 2 Heidelberg University, Germany Abstract. We introduce a simple approach to lossless image compression, which makes use of SIMD vectorization at every processing step to provide very high speed on modern CPUs. This is achieved by basing the compression on delta cod- ing for prediction and bit packing for the actual compression, allowing a tuneable tradeoff between efficiency and speed, via the block size used for bit packing. The maximum achievable speed surpasses main memory bandwidth on the tested CPU, as well as the speed of all previous methods that achieve at least the same coding efficiency. 1 Introduction For applications which need to process large amounts of image data such as high speed video or high resolution light fields, the I/O subsystem can easily become the bottle- neck, even when using a RAID0 or SSDs for storage. This makes compression an at- tractive tool to widen this bottleneck, increasing overall performance. As lossy com- pression incurs signal degradation and may interfere with further processing, we will only consider lossless compression in the following. While dictionary based compres- sion methods like lzo [18] can reach a high bandwidth, the compression ratio of such generic compression methods is quite low compared to dedicated image compression methods. However, research in lossless image compression has mainly been concerned with the maximization of the compression ratio, and even relatively fast image com- pression schemes like jpeg-ls are not able to keep up with the transfer rates of fast I/O configurations.
    [Show full text]
  • Smaller Flight Data Recorders{
    Available online at http://docs.lib.purdue.edu/jate Journal of Aviation Technology and Engineering 2:2 (2013) 45–55 Smaller Flight Data Recorders{ Yair Wiseman and Alon Barkai Bar-Ilan University Abstract Data captured by flight data recorders are generally stored on the system’s embedded hard disk. A common problem is the lack of storage space on the disk. This lack of space for data storage leads to either a constant effort to reduce the space used by data, or to increased costs due to acquisition of additional space, which is not always possible. File compression can solve the problem, but carries with it the potential drawback of the increased overhead required when writing the data to the disk, putting an excessive load on the system and degrading system performance. The author suggests the use of an efficient compressed file system that both compresses data in real time and ensures that there will be minimal impact on the performance of other tasks. Keywords: flight data recorder, data compression, file system Introduction A flight data recorder is a small line-replaceable computer unit employed in aircraft. Its function is recording pilots’ inputs, electronic inputs, sensor positions and instructions sent to any electronic systems on the aircraft. It is unofficially referred to as a "black box". Flight data recorders are designed to be small and thoroughly fabricated to withstand the influence of high speed impacts and extreme temperatures. A flight data recorder from a commercial aircraft can be seen in Figure 1. State-of-the-art high density flash memory devices have permitted the solid state flight data recorder (SSFDR) to be implemented with much larger memory capacity.
    [Show full text]