A benchmark for water column data compression

ing. Bouke Grevelt

August 27, 2017, 65 pages

Supervisor: dr. ir. A.L. Varbanescu Host organisation: Qps B.V., http://www.qps.nl, Zeist, +31 (0)30 69 41 200 Contact: Jonathan Beaudoin, PhD, Fredericton, +1 506 454 4487, [email protected]

Universiteit van Amsterdam, Faculteit der Natuurwetenschappen, Wiskunde en Informatica
Master Software Engineering
http://www.software-engineering-amsterdam.nl

Contents

Abstract
Preface
1 Introduction
  1.1 Problem definition
  1.2 Approach
  1.3 Document structure
2 Related Work
  2.1 Water column data compression
  2.2 Benchmarking & test benches
  2.3 Corpus creation
3 Water Column Corpus
  3.1 Existing corpora
    3.1.1 Calgary corpus
    3.1.2 Canterbury corpus
    3.1.3 Ekushe-Khul
    3.1.4 Silesia corpus
    3.1.5 Prague Corpus
  3.2 Corpus creation method
  3.3 Results
4 Metrics
  4.1 Generic compression metrics
  4.2 Real-time compression
  4.3 Processing
  4.4 Cost reduction
  4.5 Overview
5 Test bench
  5.1 Requirements
  5.2 Structural design
    5.2.1 Test execution logic
    5.2.2 Input files
  5.3 Compression algorithm
  5.4 Compressed file / decompressed file
  5.5 Results
    5.5.1 Configuration file
  5.6 Behavioral design
    5.6.1 Test execution
    5.6.2 Metrics computation
6 Benchmark
  6.1 Publication
  6.2 Rules
7 Using the test bench
  7.1 Installing the test bench
  7.2 Adding a new algorithm to the test bench
    7.2.1 The algorithm to add
    7.2.2 Building a shared library
    7.2.3 Exposing the algorithm in Python
    7.2.4 Implementing the interface
    7.2.5 Add the algorithm to the test bench configuration
  7.3 Configuring the test bench
  7.4 Running the test bench
8 Results and Analysis
  8.1 Algorithms
  8.2 Generic compression metrics
    8.2.1 Deviation in (de)compression time
    8.2.2 LZMA compression time
    8.2.3 Jpg2k compression ratio
    8.2.4 Random access decompression compared to full file decompression
    8.2.5 Jpg2k performance
  8.3 The processing metric
  8.4 The real-time metric
  8.5 The cost reduction metric
9 Conclusion & Future work
  9.1 Threats to validity
    9.1.1 File properties considered in corpus selection
    9.1.2 Restriction of candidate files for corpus selection
    9.1.3 Hardware differences between acquisition and processing
  9.2 Future work
    9.2.1 Water Column Corpus inclusion criteria
    9.2.2 Improve performance
    9.2.3 Data generation
    9.2.4 Lossy compression
    9.2.5 Hardware differences
  9.3 Community involvement
Bibliography
A Water column data
B Generic Water column Format
  B.1 Purpose
  B.2 Structure
  B.3 Conversion
C Test bench Configuration file

Abstract

Multibeam echosounders are devices that use acoustic pulses to measure the topology of the sea floor. Modern multibeam echosounders make the raw data that is used to determine the sea floor topology available to the user. This data is referred to as water column data. The scientific community has identified many uses for water column data to improve and augment current hydrographic applications. Due to its large size compared to other hydrographic data, surveyors often choose not to store water column data. Compression of water column data has been identified as a possible solution to this problem and multiple compression methods have been proposed. As there is currently no clear definition of how to measure the performance of water column compression algorithms, it is unclear what the current state of the art is. In this work, we show that benchmarking can be used to evaluate the performance of water column compression algorithms. We present the Water Column Corpus, a methodologically selected, representative set of water column data files in the public domain to be used as the input data for the benchmark, together with a set of metrics used to measure compression algorithm performance. A test bench was developed and published in the public domain to compute the presented metrics for the files in the Water Column Corpus. Four generic compression algorithms and one water column specific compression algorithm are included in the test bench. The results clearly distinguish the different algorithms based on their performance.

Preface

This thesis describes my research project at Quality Positioning Services (QPS). QPS has been my employer since before I started the Software Engineering program at the University of Amsterdam (UvA).

Acknowledgements

First and foremost I need to thank Ana Lucia Varbanescu for her inexhaustible guidance and support throughout this whole endeavor. Thanks to Jonathan Beaudoin for always finding the time for a chat or a review and of course for coming up with the topic. Thanks to Duko van der Schaaf for supporting me when I wanted to work less to go back to school. Thanks to Duncan Mallace & Tom Weber for helping me find specific water column data. Thanks to QPS for supporting me throughout the program and with the thesis especially. But most of all thanks to Charlotte for putting up with me for two years of stress, complaints and preoccupation.

Chapter 1

Introduction

QPS B.V., hereafter referred to as 'the host organization', is a company that builds and sells hydrographic software. One of the main purposes of this software is to gather and display information on the depth and form of the seafloor. This is referred to as bathymetric data. A number of different types of devices are capable of gathering bathymetric data. The type of device most commonly used for this purpose is the multibeam echosounder (MBES). This device emits sound waves towards the seafloor and determines the location of the seafloor based on the received echo. The technique is similar to ultrasound imaging in the medical field. Many modern multibeam echosounders make the raw data that was used to compute the bathymetry available to the user. This data is referred to as water column data1.

Figure 1.1: Water column data in grey with seafloor detections in green and red

The scientific community has identified opportunities to use water column data for a number of purposes, including:
• Improved noise recognition [Cla06]
• Improved bottom detection [Cla06]
• Improved ability to estimate minimum clearance over wrecks [Cla06], [CRBW14]
• Study of marine life [CRBW14]
• Study of sub-aquatic gas vents [CRBW14]
• Study of suspended sediment [CRBW14]
These novel applications provide opportunities for the host organization to improve and expand their software suite.

A problem with water column data is its volume. With data rates of several gigabytes per hour [Bea10], storage requirements drastically increase when water column data is to be stored in addition

1 More information on water column data can be found in appendix A.

to the data more commonly stored during hydrographic acquisition. Beaudoin states that "A ten-fold increase in data storage requirements is not uncommon" [Bea10]. Moszynski et al. state that "the size of water column data can easily exceed 95% of all data collected by a multibeam system" [MCKL13]. The additional costs incurred by the high volume of water column data are prohibitive to the collection of this data [MCKL13][Bea10][APR+16]. The host organization believes that reducing the volume of water column data will make the collection of the data less prohibitive and thus allow end users to benefit from future water column data related innovations. The problem of water column data size has been noted in the scientific community. Applying compression to the data is brought forward as a possible solution to this problem. Beulens et al. [BWSP06] note that the data size is one of the challenges in water column data processing and state that "making use of a data compression scheme is the preferred approach" to solving this issue [BWSP06]. Moszynski et al. [MCKL13], Beaudoin [Bea10] and Amblas et al. [APR+16] have proposed algorithms for water column compression.

1.1 Problem definition

The field of water column compression currently consists of three publications, by Beaudoin [Bea10], Moszynski et al. [MCKL13], and Amblas et al. [APR+16]. The authors of all three publications present results that show that their proposed algorithm outperforms some general purpose compression algorithm. As all of the publications use different sets of input files and none include a direct comparison against any of the other water column specific compression algorithms, it is unclear what the current state of the art in water column compression is. Furthermore, because none of the publications discuss their motivation for the selection of the input data used in the experiment, the scope of the results is unclear. Therefore, the research question addressed in this thesis is: 'How to evaluate the performance of lossless water column data compression algorithms?'

1.2 Approach

According to Sim et al. [SEH03], the lack of emphasis on validation and comparison with other solutions observed in the water column compression field is typical for immature scientific communities. The authors state that benchmarking can have a positive effect on the maturity of a scientific community as the creation of a benchmark "requires a community to examine their understanding of the field, come to an agreement on what are the key problems, and encapsulate this knowledge in an evaluation" [SEH03, p. 1]. In this work, we present our vision for a benchmark for water column data compression based on the three components of a benchmark according to Sim et al. [SEH03]: motivation for the comparison, task sample, and performance measures. The motivation for comparison is the desire to know the current state of the art in water column compression (as discussed in section 1.1). The task sample consists of a set of input files that are selected using an empirical method based on the method for corpus creation presented by Arnold & Bell [AB97]. Performance measures are selected based on a literature review of related work, further augmented with new measures which we believe are relevant to the domain. The positive effect of benchmarking discussed by Sim et al. [SEH03] depends on a collaborative effort to create a benchmark within that community. We therefore invite members of the community to respond, contribute and collaborate on this work in order to get to a community driven benchmark for water column compression. To facilitate contributions from the scientific community, we provide not only the definition of the benchmark but also a test bench for the evaluation of the benchmark. This test bench facilitates computation of the performance measures over the task sample. This work focuses solely on the lossless compression of water column data. Although the domain may benefit from lossy compression, the lack of widely used water column processing applications makes it hard to determine what types and which amount of loss are acceptable. Possible future work on the evaluation of lossy compression algorithms is described in section 9.2.4.

1.3 Document structure

In chapter 2 we review related work and its impact on our approach. In chapter 3 we present the Water Column Corpus, a set of water column files to be used as the input for compression algorithms, selected to be representative of the real-world data water column compression algorithms may encounter. The Water Column Corpus is what Sim et al. [SEH03] refer to as the 'task sample' of the benchmark. Chapter 4 presents the metrics that are to be computed for each algorithm in the benchmark. The metrics are what Sim et al. [SEH03] refer to as the 'performance measures'. Chapter 5 presents the design of a test bench that computes the metrics specified in chapter 4 over a collection of input files. Next to the three components of a benchmark presented by Sim et al. [SEH03] (motivation for the comparison, task sample and performance measures), we present two more components that we believe to be important for a benchmark in chapter 6: a set of rules that algorithms to be included in the benchmark should adhere to, and the method of benchmark result publication. As part of this work, the design described in chapter 5 has been implemented. Chapter 7 describes how this test bench can be used. This includes running the test bench and adding a new algorithm to the set of algorithms used by the test bench. In chapter 8 the results of running the test bench are presented and analyzed. Chapter 9 contains the conclusion, threats to validity, and future work.

Chapter 2

Related Work

In this chapter, we present relevant existing work, roughly divided into sections based on three different research areas.

2.1 Water column data compression

In [MCKL13], Moszynski et al. describe a method to compress MBES data (which includes water column data) based on Huffman coding. They propose two adaptations to standard Huffman coding to improve performance:
• Create a Huffman tree once for each message type instead of for every message. The authors assume that probabilities will be the same among different messages of the same type.
• Encode water column amplitude data in its true resolution instead of the file format's resolution, which is often too high.
The average compression ratio is approximately 1:3 and outperforms the general purpose compression methods ZIP, 7-ZIP and RAR in both compression rate and compression time. There are some exceptions, specifically when the settings of the multibeam echosounder are changed during the survey. In that case, the Huffman tree is no longer optimal (since it was created for the first packet) and the compression ratio drops. The authors use two files for validation. Both originate from the same multibeam echosounder (a Reson 7125) and contain "relatively flat and homogenous bathymetry" [MCKL13, p. 81]. The results obtained this way may not be representative for other systems and other types of survey (e.g., fish schools, wrecks, or gas plumes). The authors note that their proposed solution (including file format) is specifically suited for large files as "Reading the structure of the compressed file and retrieving the information such as the total number of datagrams, original size of particular datagrams and their types is also possible without the need of decoding the whole compressed dataset." [MCKL13, p. 81]. However, decompression times, either for a single record or the complete file, are not part of the results.
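To illustrate the first adaptation, the sketch below (our own illustration, not the authors' implementation) builds a Huffman code from the first datagram of each type and reuses it for subsequent datagrams of that type; a real codec would additionally need an escape mechanism for symbols that do not occur in that first datagram.

```python
import heapq
from collections import Counter

def build_huffman_code(data: bytes) -> dict:
    """Return a symbol -> bitstring mapping built from the byte frequencies in data."""
    freq = Counter(data)
    # Heap entries: (frequency, tie breaker, {symbol: code_so_far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: only one distinct symbol in the data
        return {sym: "0" for sym in heap[0][2]}
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}   # left branch
        merged.update({s: "1" + c for s, c in c2.items()})  # right branch
        counter += 1
        heapq.heappush(heap, (f1 + f2, counter, merged))
    return heap[0][2]

codes_per_type = {}  # datagram type -> Huffman code table, built once per type

def encode_datagram(dgram_type: int, payload: bytes) -> str:
    # The code table is created only for the first datagram of each type; the
    # assumption (as in [MCKL13]) is that symbol probabilities are stable per type.
    if dgram_type not in codes_per_type:
        codes_per_type[dgram_type] = build_huffman_code(payload)
    table = codes_per_type[dgram_type]
    # Symbols unseen in the first datagram of this type would need an escape code.
    return "".join(table[b] for b in payload)
```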

In [Bea10], the author uses the JasPer [AK00] implementation of the JPEG2000 algorithm to compress the amplitude data in water column data. Lossless encoding leads to a compression rate of approximately 1:1.5. Compression rates of up to 1:10 are attainable when lossy compression is used, while still "yielding very little loss of resolution or introduction of artifacts" [Bea10, p. 14]. The author uses data from a single type of multibeam echosounder (a Kongsberg EM3002) that was collected over a wreck. The results obtained this way may not be representative for other types of

systems or surveys (e.g., fish schools, homogeneous bathymetry, or gas plumes). The source of the data is of specific interest as the proposed algorithm only compresses amplitude samples. For Kongsberg data, this constitutes the majority of the water column data. For other data formats, however, it does not. The Reson s7k water column encoding, for instance, may contain phase data, which typically has approximately the same size as the amplitude data. The performance of the proposed algorithm on data in the Kongsberg encoding may therefore not be representative of the algorithm's performance on other file formats. The author does not include information on compression time or decompression time for the algorithm.

In [APR+16], the authors apply FAPEC, a compression algorithm initially developed for space data communications, to water column data. FAPEC "includes mechanisms for the efficient compression of sequences with repeated values" [APR+16, p. 47]. The proposed algorithm uses a pre-processing pipeline tailored to MBES water column data. The test results show a compression rate of approximately 1:1.7. The algorithm was considerably faster than any of the other general purpose compression techniques that were evaluated (Rar & 7Zip, among others), with FAPEC being more than twice as fast as the runner up. The authors use two files for validation. Both originate from the same type of multibeam echosounder (a Kongsberg EM710) and correspond to "a relatively smooth an homogeneous bathymetry" [APR+16, p. 47], but one of the files includes a wreck. The results may not be representative for other types of systems or other types of surveys (e.g., fish schools or gas plumes).

Summarizing: the results presented in these works are incomparable and may not be representative for the complete domain. Our benchmark should provide representative and comparable results for all included algorithms.

2.2 Benchmarking & test benches

In [DHL15], the authors present a benchmark framework for data compression. The intended algorithms are lightweight memory compression algorithms for databases. A benchmark is defined by the user as a combination of data generation parameters and the algorithms that should be included in the benchmark. The authors place specific emphasis on the performance of the framework. They aim to get maximum performance by reducing redundant actions in the framework's execution. The framework uses standardized interfaces (not otherwise described in the work) that algorithms should conform to for easy inclusion in the framework. As we want our test bench to have high performance, we should prevent redundant actions. Whether or not that requires an approach similar to the 'sophisticated approach' described by the authors will have to become clear during the research. The benchmark specification file offers an easy interface for the user to select which algorithms to include in a run. The use of standardized interfaces for the algorithms makes it easy to include new algorithms into the framework. Both features would be valuable in our test bench as well.

In [Swa08], the author proposes a single measure for the efficiency of data compression and storage. This measure is based on compression ratio, compression time, and decompression time. As many algorithms are asymmetric in compression and decompression performance, such a measure requires information on the expected ratio of compression and decompression actions for a single file. The single measure for the efficiency of data compression presented is a cost measure. The author defines measures for the profits of storing data and the costs of storing data. The overall efficiency measure is the ratio of these two measures. We want the benchmark for water column compression to include a single metric representative of the algorithm's performance. A cost measure would appear to be a logical choice for such a measure. For water column compression, the ratio of compression and decompression is interesting as well, both because the data may be decompressed multiple times (if processed on multiple systems), and because (part of) the compressed data may not be decompressed at all (if the data was stored only in

case it was needed). Water column compression adds some complexity compared to data storage in the sense that the user of the compression algorithm is likely to want to compress the data as it is generated. This means that not only the relation between compression time and decompression time is important, but also the relation between compression time and generation time.

In [GVI+14], the authors are faced with a problem similar to the one addressed in this work: a relatively immature field (graph processing) which does not have a clear method for performance analysis. As a step towards the goal of creating a benchmark suite, seven challenges for creating such a suite are presented:
• Evaluation process: How to design the benchmark process in such a way that a 'fair' comparison of platforms can be attained. This relates to rules and/or definitions for data format, workflows, multi-tenancy and tuning.
• Selection and design of performance metrics: Which metrics are of interest and how these can be normalized to directly compare runs on different hardware.

• Dataset selection: How to select a dataset that is representative, but also able to stress bottlenecks of the platforms.
• Algorithm coverage: How to select a representative, reduced set of graph-processing algorithms.
• Scalability: How to make the benchmark deal with platforms ranging from super-computer to small-business scale.
• Portability: How to balance the required features of the benchmark suite against the amount of work it takes to make a platform "benchmark ready".

• Reporting of results: Ideally the benchmark produces a single metric that represents the performance of the platform. The authors believe that such a metric will be hard or impossible to find, as no platform can offer the best performance over the whole dataset, even when only a single metric is evaluated.
As the fields of water column compression and graph processing differ significantly, not all challenges apply to a benchmark for water column compression. We specifically believe that algorithm coverage and scalability are not relevant in our domain. The former because the water column compression domain consists of a single type of algorithm (the compression algorithm), and the latter because we believe that there will not be much diversity in the types of platforms that will run water column compression algorithms. Portability can be an important factor in the adoption of the benchmark: if the work required for the inclusion of an algorithm into the benchmark is too large, it is likely that the benchmark will not be adopted by algorithm implementers. The authors suggest that benchmarking graph-processing algorithms requires re-implementation of the algorithm in the domain of the benchmark. For the water column data compression benchmark, we want to look into methods that do not require such re-implementations.

In [CHI+15], a benchmark is introduced for graph-processing platforms (based on the vision detailed in ”Benchmarking Graph-Processing platforms: A Vision” [GVI+14]).

The benchmark uses a choke-point based design; the problems that challenge the current technology are collected and described early in the design process [Bon14]. These choke-points are identified by experts in the community. The benchmark uses both generated and real-world input data. We believe that the field of water column compression is not mature enough for the choke-points in water column compression to be identified. Therefore a choke-point based benchmark, although an interesting concept for future work, is not feasible at this time. The combination of real-world data and generated data could be beneficial for the water column data compression benchmark. Using real-world data would increase the credibility of the benchmark. Data generation could be used to show how certain factors of the input data affect the different compression algorithms. This work unfortunately contains no reference to the 'normalized metrics' referred to in the preceding publication [GVI+14].

In [Pow01], the author describes the creation of a test bench for compression algorithms that compresses a number of widely known corpora, among which the Calgary and Canterbury corpora. The test bench is employed by the author to maintain a benchmark for generic compression algorithms. The test bench is periodically run and the results are published on a website. The downside of this method is that only algorithms that have been built for UNIX can be included in the test bench. Also, the algorithms to be included in the benchmark need to be made available to the maintainer of the benchmark. The author states that it is impossible to make sure that different compression tests are run under the same conditions due to differences in processor speed and system resources. The author claims to counter this by reporting all speed measurements relative to the speed measurement on the same file by the UNIX compress utility. It is unclear if the normalization employed by the author is meant to normalize for hardware differences (e.g. different platforms running the test bench) or differences in the state of the system between different test runs. However, we believe that reporting performance relative to the performance of another algorithm is not a good approach for either, because it assumes that both algorithms scale with system changes in the same way. This is not necessarily the case. Differences in parallelization, for instance, are likely to result in different scaling behavior as the CPU load changes between runs.

Summarizing: the primary difference between our benchmark and those presented in this section is the difference in domain. Water column data compression performance evaluation requires metrics specific to the domain, such as the ability of an algorithm to compress the data at the rate at which it is generated. Similar to the test benches described in this section, our test bench should be designed with a focus on performance, portability and ease of use.

2.3 Corpus creation

In [AB97], the authors look into the reliability of empirical evaluations for lossless compression (of text). They indicate that the 'Calgary corpus', which was commonly used for that purpose at the time, had become outdated, and voice a concern that new compression algorithms may be tailored to that corpus. The goal of a corpus of files is to facilitate a one-to-one comparison of compression methods by creating a small, publicly available sample of 'likely' files for the compression purpose. A corpus should conform to the following criteria:

• Be representative of the files that are likely to be used for the compression. • Be widely available and in the public domain. • Not be larger than necessary.

• Perceived to be valid and useful (otherwise it will not be used). This can be attained by including widely used files and publishing the inclusion procedure for the corpus.
• Actually be valid and useful. The performance of compression algorithms on files in the corpus should reflect the typical performance of the algorithms.
The authors created a new corpus for file compression using the following steps:
1. Select a large group of files that are relevant for inclusion in the corpus (all public domain).
2. Divide the files into groups according to their type (e.g. HTML, C code, English text).

3. Select a small number of representative files per group:
(a) Create a scatter plot of the file size before and after compression of the files using a number of compression algorithms.
(b) Fit a straight line to the points using ordinary regression techniques.
(c) For each group, pick the file that is closest to the fitted line.
The authors note that due to the large deviation between files, the absolute compression ratio of the corpus may not be representative, but the relative compression ratio is representative.

In [IR08], Islam and Rajon propose a corpus for evaluating compression algorithms for Bengali text. The authors claim that the necessity of such a corpus arises because results on corpora in English have little significance for text in Bengali. The corpus was created in a way similar to that used by Arnold and Bell [AB97]. Like them, the authors start out with a candidate data set that is categorized. Islam and Rajon pre-process their candidate files to remove all non-Bengali text and convert the files to Unicode encoding before compression. They use type-to-token ratio to select the files for inclusion in the corpus rather than compression ratio. They select two files for each category: the one with the lowest TTR and the one with the highest TTR. Although the authors show that there is a relation between TTR and compression ratio, they do not explain why they chose TTR over compression ratio. Most interesting is their choice to select files with the lowest and highest TTR instead of the approach used by Arnold and Bell, who use linear regression to find the most representative file in a category. Selecting the files with the lowest and highest TTR would appear to select the most unrepresentative files instead.

In his master's thesis, Řezníček evaluates existing corpora for the evaluation of compression methods and constructs a new one [Jak10]. The author believes a new corpus to be necessary to overcome a number of problems in the Calgary, Canterbury, and large Canterbury corpora, namely the lack of large files (in the Calgary and Canterbury corpora), over-representation of English text, lack of files that are a concatenation of large projects, and the lack of medical images and databases. The methodology for creating this corpus is very similar to the methodology proposed by Arnold and Bell, but includes a method to update the corpus. The method to update the corpus is of specific interest, as the rest of the methodology is practically the same as that used by Arnold & Bell [AB97]. It is very likely that a corpus of water column data will become outdated at some point, because the corpus should be representative of real-world data, and the real-world data continually changes because of advances in technology. We do see a potential problem in the method presented by Řezníček: any modification to the corpus should be versioned, but as there is no central agent responsible for versioning, nor a centralized location that contains all versions of the corpus, the updater of the corpus has no means to determine what the latest version of the corpus is. This induces the risk of multiple versions of the corpus having the same version number.

Summarizing: the methodology used for the creation of corpora is either not described or based on that presented by Arnold & Bell [AB97] with slight modifications. New corpora are proposed not because authors disagree with the methodology used for previous corpora, but because the previous corpora have become outdated.

Chapter 3

Water Column Corpus

In order for a test bench to evaluate the performance of compression algorithms, it needs data to compress. In order for the test bench to have credibility, the data should be representative of the data the algorithms encounter during normal operation. At the same time, the volume of data used by the test bench should be as low as possible to guarantee swift execution. The test bench can generate its own input data, use a set of input files, or use a combination of both. The advantage of data generation is that it provides the opportunity to change specific properties of the input data in isolation. This allows the user of the test bench to review the effects of these individual properties of the data on compression algorithm performance. Our test bench, in its first version, includes a corpus of real-world data to be used as input for the algorithms: the Water Column Corpus. A data generator for water column data does not currently exist. Based on our analyses of water column data, a generator will be developed and included in a future release of the test bench. In this chapter, we review the methods used for the creation of other corpora, present the method that we have used for the Water Column Corpus, and finally present the Water Column Corpus itself.

3.1 Existing corpora

Although we are the first to define a corpus for water column compression, a number of corpora have been defined for general purpose compression. In this section, we will look at existing corpora and the method used to create them.

3.1.1 Calgary corpus

Probably the most widely known corpus for lossless data compression is the Calgary corpus, created by Bell et al. for evaluating a number of lossless compression algorithms [BCW90]. The corpus consists of 14 files of 9 distinct types. Bell et al. provide no information on the method used for selection of the files other than that the file types are 'common on computer systems' [BCW90, p. 583]. The Calgary corpus was made publicly available and has been used frequently for lossless compression evaluation in the 1990s [Jak10, p. 44].

3.1.2 Canterbury corpus

The Canterbury corpus was published in 1997 by Arnold & Bell [AB97] as a reaction to issues in the Calgary corpus. The authors argue that the corpus had become outdated and was not constructed methodically, and indicate a concern that new compression algorithms may be tailored to the content of the corpus. The authors argue that a proper corpus should consist of a small sample of likely files. However, determining the likeliness of data is precisely the problem in compression algorithm development. Because of this, files for empirical validation are often selected haphazardly from files that are available. The authors claim that this way of working reduces repeatability and validity. They present

a methodology for creating a corpus of test data for compression methods based on the following criteria:
• The corpus should be representative of the files that are likely to be encountered in real-world applications.
• The corpus should be widely available and in the public domain.
• The corpus should not be larger than necessary.
• The corpus should be perceived to be valid and useful. This can be attained by including widely used files and publishing the inclusion procedure for the corpus.
• The corpus should actually be valid and useful. The performance of compression algorithms on files in the corpus should reflect the typical performance of the algorithms.
Based on these criteria, a method for creating a corpus is presented that consists of the following steps:
1. Select a large number of candidate files from the public domain.
2. Divide the files into groups according to their type.
3. Compress all files.

4. Use linear regression to determine the correlation between compressed and uncompressed file size.
5. Choose the file that is closest to the regression line in each category to be included in the corpus.

3.1.3 Ekushe-Khul

Islam & Rajon propose a corpus for evaluating compression algorithms for Bengali text [IR08]. The authors claim that the necessity of such a corpus arises because results on corpora in English have little significance for text in Bengali. The method used for corpus creation is similar to that used for the Canterbury corpus. Islam & Rajon make the following changes to the methodology presented by Arnold & Bell:
• Prior to compression, all files are stripped of non-Bengali text and converted to Unicode.
• File selection is based on type-to-token ratio rather than compression ratio.
• For each group, the files with the highest and lowest type-to-token ratio are selected, whereas Arnold & Bell select the file with the compression rate closest to the regression line.
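The type-to-token ratio itself is simple to compute. The following sketch is our own illustration of the statistic Islam & Rajon use for selection; the whitespace tokenization is a simplifying assumption, not their exact pre-processing:

```python
def type_to_token_ratio(text: str) -> float:
    """TTR: number of distinct tokens divided by the total number of tokens."""
    tokens = text.split()  # simplification: whitespace tokenization
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```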

3.1.4 Silesia corpus

As part of his dissertation, Sebastian Deorowicz presents the Silesia corpus [Deo03] for lossless compression. The author believes a new corpus to be necessary to overcome a number of problems in the Calgary, Canterbury, and large Canterbury corpora, namely the lack of large files (in the Calgary and Canterbury corpora), over-representation of English text, lack of files that are a concatenation of large projects, and the lack of medical images and databases. Furthermore, the author states that the Canterbury corpus is specifically faulty because it contains a file for which the "specific structure causes different algorithms to achieve strange results (...) the differences in the compression ratio on this file are large enough to dominate the overall average corpus ratio" [Deo03, p. 92]. The Silesia corpus is meant to be used in combination with the Calgary corpus. Therefore, the Silesia corpus contains files of the types missing from that corpus. The author provides no insight into the method used for selecting files to be included in the corpus.

3.1.5 Prague Corpus

In his master's thesis, Jakub Řezníček presents the Prague Corpus [Jak10]. The author includes a verbose methodology (similar to the one used for the Canterbury corpus) and instructions on how to update the corpus. The method for corpus creation contains the following steps:

• Define file types (e.g. image, database, binary) that are frequently used in practice. These will be the groups used for the corpus.
• Collect candidate files that are real (not randomly generated) and can be placed in the public domain. Use as many sources as possible to prevent similarity.
• Decompress internally compressed files (such as PDFs).
• Divide the candidate files into groups.
• Subgroup the candidate files based on file extension.
• Compress all files using several (at least three) compression programs using different algorithms.
• For each subgroup that contains at least 15 files, and optionally for each complete group:
  – Determine the correlation between uncompressed and compressed file sizes using linear regression for each compression method.
  – Select the file closest to the regression line for each compression method.
  – From the selected files, select the one with the lowest compression ratio for inclusion in the corpus.

If one wants to update the Prague corpus, the same method should be followed. The contributor may then choose to either make changes to, or completely replace the corpus. The old corpus should always be kept available. The new corpus should be documented in a report (an optional template is provided) and the new corpus should be published on the Internet.

3.2 Corpus creation method

In this section, we present a method for the creation of a corpus of water column data. The method is based on that proposed by Arnold & Bell for the Canterbury corpus [AB97]. Although a number of new corpora have been suggested to replace the Calgary corpus, most of them still use the method presented by Arnold & Bell (with slight adaptations) for constructing the new corpus. Those that do not, do not report any method at all for selecting files. The only remark we have found that could be considered criticism of the method presented by Arnold & Bell is that of Deorowicz on the inclusion of the XML file that has such different compression characteristics across different compression algorithms that it dominates the overall corpus ratio [Deo03, p. 92]. Deorowicz, however, provides no evidence for this statement and we have not been able to find similar statements in literature or a web search. Like the authors of other corpora, we have adapted the method proposed by Arnold & Bell for our purposes. The following adaptations have been made to the original method:
• As there is little water column data in the public domain, it is not feasible to collect a sufficiently large sample of files that is in the public domain. Instead, we select candidate files regardless of whether they are in the public domain. Once a file has been selected for inclusion in the corpus, we request permission from the owner to put the file in the public domain. If the owner does not grant permission, the file is removed from the candidate files and a new file is selected.
• Although Arnold & Bell describe the selection process of representative files for the categories in detail, they provide little insight into the categorization of candidate files. We believe that the chosen categories serve two purposes: stating which types of files are important in the field

and reducing the search scope for candidate files. As we start with a limited set of data, there is no need to limit our search scope. The purpose of categorization in our case is thus only to state which types of water column data are important in the domain. We take a number of categories from those used by the "Shallow survey conference": an important conference in the domain at which multibeam echosounder manufacturers are asked to gather data over various types of objects. From this data the categories survey, water seep, wreck, and target have been taken. We have added two categories that we believe to be important in the domain: fish schools and gas vents. Figure 3.1 provides example data for each of the included categories.
• Files that contain water column data often also contain other types of maritime data. Also, one format supports data from multiple devices in one file whereas the other formats do not. In order to get comparable files, the files need to be processed to remove all non-water column data and to split files that contain data from different devices.
Because the current state of the art in water column compression cannot be identified (which is part of the problem we are addressing with the test bench proposed in this thesis), we cannot use the state of the art water column compression algorithm to determine the compression ratio for the files. Instead, we use the Lempel-Ziv-Markov chain algorithm (LZMA), as this algorithm (used by the 7-Zip archiver and often referred to simply as '7-Zip') is used by all authors of water column compression algorithms to compare their algorithm's performance against, and either has the best compression ratio of all the algorithms compared against [MCKL13], [APR+16], or is the only algorithm the author compares against [Bea10]. After these adaptations, our process for creating a corpus of water column data comprises the following steps:
1. Gather a set of files that contain water column data (either proprietary or in the public domain).
2. Process the files to remove all non-water column data and split files when needed.
3. Divide the files into groups according to their category.
4. Compress all files using the LZMA algorithm.
5. Use linear regression to determine the correlation between compressed and uncompressed file size for each group.
6. Choose the file that is closest to the regression line for each group.
7. Ask the owner of the file for permission to put the file in the public domain.
8. If permission is granted, the file is selected for inclusion in the corpus. If not, the file is removed from the set of candidate files and the process is repeated from step 4.
We strongly agree with Řezníček [Jak10] that it is very likely for any corpus to become outdated, and as such it requires a method for updating and/or replacing the current version. We propose a high-level approach so as not to restrict the solution space for any future iteration: the Water Column Corpus should always have a maintainer. The maintainer is responsible for the publication of the Water Column Corpus. The publication includes versioning of the corpus. Publications should include the methodology used for corpus selection (or a reference to a previous version of the corpus if the same methodology is used) and the corpus itself. All versions of the Water Column Corpus should be published at a single location.
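Steps 5 and 6 of this process can be sketched as follows. This is our own illustration (the tuple layout and helper name are hypothetical), using ordinary least squares to fit compressed size against original size and selecting, per group, the file closest to the fitted line:

```python
import numpy as np

def select_representative(files):
    """files: list of (name, original_size, compressed_size) tuples for one group."""
    x = np.array([f[1] for f in files], dtype=float)  # original sizes
    y = np.array([f[2] for f in files], dtype=float)  # LZMA-compressed sizes
    slope, intercept = np.polyfit(x, y, 1)            # ordinary linear regression
    residuals = np.abs(y - (slope * x + intercept))   # vertical distance to the line
    return files[int(np.argmin(residuals))][0]        # file closest to the fitted line
```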

3.3 Results

Files containing water column data were gathered from the host organization’s intranet, a number of industry specialists, and the scientific community. After processing, this yielded a set of 1066 candidate files with a total size of just over 600GB.

Figure 3.1: Water column data categories included in the corpus: (a) Survey, (b) Gas vent, (c) Fish, (d) Target, (e) Water seep, (f) Wreck

The data was grouped based on meta-data when available. If this was not possible, we used visual inspection to determine the category of the data. As this is a slow process, the timeline of the project did not allow us to classify all of the candidate files. For maximum efficiency, we focused on the categorization of the larger sets of files. 768 out of 1066 files were categorized. An overview of the categories and the number of candidate files assigned to each can be found in table 3.1.

Category     N     Water column contains
Fish         8     A school of fish
Vent         15    A stream of bubbles
Survey       415   Little else than the return from the sea floor
Target       142   A small (<2 m) object
Wreck        86    A sunken object (e.g., the wreck of a ship)
Water seep   92    A seep of fresh water into salt water
Total        768

Table 3.1: Categorized candidate files

All candidate files were compressed using LZMA. We made use of the open source LZMA SDK1 and its python binding pylzma2. All files were compressed using the default settings of the algorithm. For each group, a scatter plot was created of the original file size versus the compressed file size. Using linear regression, a line was fitted to the data to determine the representative compression rate for the group. The results can be found in figure 3.2, which shows distinct groups with a linear relation for all categories except the wreck category, which appears to contain two groups. Closer inspection showed that a set of files had been included that did contain wreck data, but for which the surveyed area was relatively large compared to the wreck. As a result, only a small part of the water column data actually contained a wreck. Therefore, the data was not representative of the category and the files were removed from the candidate files. Table 3.2 shows the results of the categorization effort. Correlation and deviation numbers are very similar to those presented by Arnold and Bell. Table 3.3 shows the files that were selected to be included in the corpus. We obtained permission to put all these files in the public domain, so there was no need to reiterate the process for any of the categories.

1 http://www.7-zip.org/sdk.html
2 https://pypi.python.org/pypi/pylzma

Figure 3.2: Original vs compressed size of 768 categorized candidate files
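For reference, the per-file measurement described above amounts to something like the following sketch, assuming the pylzma binding's compress() with its default settings; the helper name and input handling are illustrative, not the test bench's actual code:

```python
import pylzma  # python binding of the LZMA SDK referenced above

def compressed_sizes(paths):
    """Return (original_size, compressed_size) per file, using pylzma's defaults."""
    sizes = []
    for path in paths:
        with open(path, "rb") as fh:
            data = fh.read()
        sizes.append((len(data), len(pylzma.compress(data))))
    return sizes
```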

Bin          N     Average compression rate   Standard deviation   Correlation   Best match ratio
Fish         8     0.7223                     0.01                 0.9991        0.7195
Survey       415   0.5479                     0.11                 0.9812        0.5477
Target       142   0.6546                     0.02                 0.9966        0.6541
Vent         15    0.4774                     0.10                 0.9947        0.4833
Water seep   92    0.2621                     0.05                 0.993         0.2622
Wreck        58    0.5041                     0.04                 0.9949        0.5053

Table 3.2: Categorized compression properties

Category     Best match                                      Size (MB)
Fish         20060922 232920 7125 (400kHz) filtered.s7k      4293.3
Survey       0003 20140910 094146 True North.wcd             1241.4
Target       20140724 105715 filtered.s7k                    1023.2
Vent         0006 20090402 210804 ShipName.wcd               363.2
Water seep   0046 20110309 215521 IKATERE.wcd                551.7
Wreck        0012 - Bathy plus WCD 0 from qinsy db.wcd       40.4

Table 3.3: Selected files for the corpus

To validate the corpus, we reviewed the average compression ratios of different compression methods

on both the complete set of candidate files and the corpus. The results are shown in table 3.4. There is some deviation in the average compression ratios between the candidate files and the corpus. This is similar to the results obtained by Arnold and Bell [AB97, p. 207] and likely due to the relatively high deviation in compression ratios for files in a group (as shown in table 3.2). This means that our corpus, like the Canterbury corpus, is unreliable for absolute performance estimates. Fortunately, like the Canterbury corpus, the relative compression ratios are very similar: ZIP has a compression ratio that is 118.8% of the compression ratio of LZMA for the complete set of candidate files, and 118.6% for the corpus. Similarly, the compression ratio of BZ2 is 102.5% of that of LZMA for the complete set of candidate files and 102.4% for the corpus. Therefore the corpus is reliable for relative compression performance estimation.

Algorithm   Average compression ratio (candidate files)   Average compression ratio (Water Column Corpus)
LZMA        0.5113                                        0.5287
ZIP         0.6078                                        0.6268
BZ2         0.5241                                        0.5413

Table 3.4: Average compression ratios of candidate files and corpus
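These relative figures follow directly from table 3.4; for example, on the corpus: 0.6268 / 0.5287 ≈ 1.186 for ZIP relative to LZMA and 0.5413 / 0.5287 ≈ 1.024 for BZ2 relative to LZMA, matching the 118.6% and 102.4% quoted above.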

Chapter 4

Metrics

In this chapter, we present the metrics that will be reported by the test bench. The test bench will include both generic compression metrics, which are presented in the first section, and water column data specific metrics, which are presented in the subsequent sections. Specifics on the implementation of metric computation in the test bench can be found in chapter 5.

4.1 Generic compression metrics

We studied publications on water column compression, generic compression evaluation, and recent publications on lossless compression algorithms to see which metrics are reported. The reported metrics were: compression ratio (by all publications), and compression time and/or decompression time (by most). All of these metrics will be included in the test bench:

CR = Sc / So

where
CR is the compression ratio.
Sc is the size of the compressed data.
So is the size of the original data.

Tc The time required to compress the data.

Td The time required to decompress the compressed data.

4.2 Real-time compression

An important feature of a compression algorithm for water column data is its ability to compress data at the rate it is generated by the multibeam echosounder during data acquisition. Inability to compress the data 'on the fly' means that the data will have to be stored uncompressed first and compressed later. This separate compression step takes up time and thus induces cost. The ability of an algorithm to perform real-time compression depends not only on the performance of the algorithm, but also on the environment in which it is executed. As compression will be one of many tasks executed by hydrographic acquisition software in parallel, the algorithm will have access to limited resources. The test bench includes the 'real-time compression' metric: the ratio of the average water column record generation interval to the average water column record compression time:

RTC = Trga / Trca

where
Trga is the average water column record generation interval.
Trca is the average water column record compression time.

The average water column generation interval can be computed from the generation time included in the water column record:

Trga = (Twcl − Twcf) / (N − 1)

where
Trga is the average water column generation interval.
Twcf is the generation time of the first water column record.
Twcl is the generation time of the last water column record.
N is the number of water column records.

The average water column record compression time depends on the performance of the compression algorithm in the real-time environment:

Trca = Tc(e) / N

where
Trca is the average water column record compression time.
N is the number of water column records in the file.
Tc(e) is the compression time of the file in computation environment e.

The computation environment is defined by two user parameters: the memory and the CPU capacity available to the algorithm during compression. More details on how the test bench emulates the real-time environment can be found in section 5.6.2. RTC has a range of (0, ∞), where:
• RTC = 1 indicates that on average the compression of water column records requires the same time as it takes for the hardware to generate them.
• RTC < 1 indicates that the compression of water column records on average takes more time than it takes for the hardware to generate them; real-time compression is not possible.
• RTC > 1 indicates that on average water column records can be compressed in less time than it takes for the hardware to generate them; real-time compression is possible.
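A minimal sketch of how RTC can be computed from the quantities defined above (our own illustration; the timestamps and the measured compression time are assumed to come from the test bench's emulated acquisition environment, see section 5.6.2):

```python
def real_time_compression(record_times, compression_time_s):
    """record_times: generation timestamps (seconds) of the water column records;
    compression_time_s: Tc(e), the file compression time in the emulated environment."""
    n = len(record_times)
    t_rga = (record_times[-1] - record_times[0]) / (n - 1)  # average generation interval
    t_rca = compression_time_s / n                          # average per-record compression time
    return t_rga / t_rca                                     # RTC > 1: real-time compression is possible

# Example: 500 records generated over 100 s and compressed in 80 s
# gives RTC = (100 / 499) / (80 / 500) ≈ 1.25.
```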

4.3 Processing

After data has been stored during acquisition, it is transferred to another department where it is processed. Using water column compression adds overhead to the processing phase as data needs to be decompressed prior to its processing. Although there is a theoretical advantage of decreased

disk space requirements due to compression, we believe that these cost savings are negligible when compared to the overhead in processing time. The processing metric is included to indicate the overhead induced in processing due to the application of water column data compression. It is defined as the product of the decompression time required for processing and the number of times the file will be decompressed in the processing phase:

P = Tdp ∗ Ndp

where Tdp is the decompression time for processing. Ndp is the number of decompressions.

The number of decompressions Ndp on a single workstation can be reduced by storing the decompressed data in between different processing sessions. As it is common for hydrographic processing software to offer up disk space in favor of processing speed, we assume that processing software will follow this same trend for water column decompression and thus that the number of processing sessions on a single machine has no influence on the number of decompressions. We have observed that support for concurrent data access is uncommon in hydrographic processing software, while it is common for multiple people to work on the same data. This means that the data is decompressed on each workstation. Ndp is therefore normally equal to the number of workstations working on processing the same data. The decompression time for processing, Tdp, depends on the performance of the compression algorithm and on the number of records that are required to be decompressed for processing. The latter depends strongly on the reason for collecting water column data. If water column data is recorded with the intention of improving bathymetric data, it is likely to be stored for the entire survey, but only used in locations where bathymetric data appears to be faulty. In other words, only a small part of the stored water column data will need to be processed (and thus decompressed). Conversely, if water column data is gathered with the intention of investigating a specific feature (e.g. a wreck or marine life), it is likely that water column data will only be stored at the location of that feature. In this situation, it is likely that all the stored water column records will be processed (and thus decompressed). We assume that the processing software selects the best decompression method for the data: full file decompression if all records are required during processing, or individual record decompression if only a limited subset of records is required for processing. The decompression time is therefore defined as:

Tdp = min(Td, Tdpartial)

where Tdp is the time required for decompression in processing. Td is the time required for full file decompression.

Tdpartial is the time required for partial decompression of the required water column records

The partial decompression time depends on the ratio of water column records the user expects to decompress during processing:

23 Tdpartial = N ∗ Rp ∗ Tdra

where

Tdpartial is the partial decompression time.
N is the number of water column records.
Rp is the ratio of the water column records that the user expects to process to the number of water column records in the file (a user parameter).

Tdra is the average random access decompression time per record.

The average random access decompression time is calculated by the test bench by randomly selecting a number of records for decompression and averaging their decompression times.
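The processing metric can then be computed as in the following sketch (our own illustration; the argument names are hypothetical):

```python
def processing_metric(t_d, t_dra, n, r_p, n_dp):
    """t_d: full file decompression time; t_dra: average random access decompression
    time per record; n: number of records; r_p: processing ratio (user parameter);
    n_dp: number of decompressions (user parameter)."""
    t_d_partial = n * r_p * t_dra       # Tdpartial: decompress only the required records
    t_dp = min(t_d, t_d_partial)         # assume the processing software picks the cheaper option
    return t_dp * n_dp                   # P = Tdp * Ndp
```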

4.4 Cost reduction

As discussed in section 1.1, the cost incurred by handling and storing water column data is prohibitive to its collection. This cost includes one-time costs such as the cost of transfer of the data from the survey vessel to the processing station on shore over expensive satellite links and acquisition of storage space, as well as continuous costs such as maintenance on the storage space for the data. We refer to the sum of these costs as the total cost of data ownership. Employing water column compression will decrease the volume of data and thus the total cost of ownership of the data, but it also incurs cost because of time lost on compression and decompression. The cost metric shows the reduction in cost realized by the application of a specific compression algorithm:

C = Rco − Cc − Cd

where Rco is the reduction in cost of ownership. Cc is the cost incurred by compression. Cd is the cost incurred by decompression.

The reduction of the cost of ownership depends on the data reduction realized by the compression algorithm and the cost of ownership of data. The latter is a user parameter:

Rco = Rd ∗ Co

where Rco is the reduction in cost of ownership. Rd is the data reduction resulting from the application of the compression algorithm. Co is the cost of ownership of data.

The data reduction is computed from the compression ratio and the file size:

24 Rd = So ∗ (1 − CR)

where Rd is data reduction resulting from the application of the compression algorithm. So is the size of the original, uncompressed file. CR is the compression ratio for the file.

The cost incurred by compression depends on the ability of the algorithm to perform real-time compression. If the data can be compressed in real-time, no additional cost is incurred by compression. If the algorithm cannot process the data in real-time, we assume that the data is compressed after acquisition. Compression is likely to occur on the ship (to reduce the amount of data that needs to be sent to shore) and thus takes up time that could otherwise have been spent on acquisition. We refer to this time as ship time. The equation for the cost of compression is thus:

Cc = Tc ∗ Cst if RTC < 1, otherwise Cc = 0

where Cc is the compression cost. Tc is the compression time for the complete file. RTC is the real-time compression metric. Cst is the cost of ship time (a user parameter).

The cost of decompression for processing is the product of the time spent on decompressing the data we need for processing (i.e. the processing metric presented in section 4.3) and the cost of processing time:

Cd = P ∗ Cpt

where P is the processing metric. Cpt is the cost of processing time (a user parameter).
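The following minimal Python sketch combines the cost equations above into a single computation; the function name, argument names, and units are illustrative assumptions rather than test bench code.

def cost_reduction(s_o, cr, c_o, t_c, rtc, c_st, p, c_pt):
    # s_o  : So, size of the original, uncompressed file (e.g. in MB)
    # cr   : CR, compression ratio (compressed size / original size)
    # c_o  : Co, cost of ownership of data per unit of size (user parameter)
    # t_c  : Tc, compression time for the complete file (hours)
    # rtc  : RTC, the real-time compression metric
    # c_st : Cst, cost of ship time per hour (user parameter)
    # p    : P, the processing metric (hours)
    # c_pt : Cpt, cost of processing time per hour (user parameter)
    r_d = s_o * (1 - cr)                  # Rd = So * (1 - CR)
    r_co = r_d * c_o                      # Rco = Rd * Co
    c_c = t_c * c_st if rtc < 1 else 0.0  # Cc: only incurred without real-time compression
    c_d = p * c_pt                        # Cd = P * Cpt
    return r_co - c_c - c_d               # C = Rco - Cc - Cd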

4.5 Overview

In this chapter, we proposed the following metrics to be included in the test bench:

Metric                  Symbol  Range      Better when
Compression ratio       CR      (0, ∞)     Lower
Compression time        Tc      (0, ∞)     Lower
Decompression time      Td      (0, ∞)     Lower
Real-time compression   RTC     (0, ∞)     Higher
Processing overhead     P       (0, ∞)     Lower
Cost reduction          C       (−∞, ∞)    Higher

Table 4.1: Metrics

In order to compute these metrics, the user must supply the following parameters to the test bench:
• Memory available to the algorithm during acquisition.
• Processor capacity (as a percentage) available to the algorithm during acquisition.
• The number of times a file is decompressed for processing.
• Processing ratio.
• The cost of processing time.
• The cost of ship time.
• The cost of ownership of data.
Additional information on how to determine the right values for these parameters can be found in table 7.1 in section 7.3.

Chapter 5

Test bench

In this chapter, we present a test bench for water column compression algorithms based on the input files and metrics presented in chapters 3 and 4, respectively. The purpose of the test bench is both to facilitate the computation of metrics for the benchmark and to enforce that all benchmark results are computed in the same manner.

5.1 Requirements

The requirements for the test bench are based on the requirements for benchmarks presented by Sim et al. [SEH03]. As this work is meant as a first step towards a benchmark for water column compression and we expect it to change based on input from the (scientific) community, we have added changeability as a requirement.
• Accessibility: The test bench (including the set of input files) should be in the public domain.
• Affordability: The work to make an algorithm available to the test bench should not take more than a day. A complete test run should not take more than a day.
• Relevance: The results of the test bench should be representative of real-world applications.
• Solvability: The test bench should not use (artificial) input data that is incompressible.
• Portability: The test bench should be able to handle algorithms written in many programming languages.
• Scalability: The test bench should support algorithms at different levels of maturity.
• Changeability: The test bench should support changing the set of input files and the set of metrics.

5.2 Structural design

This section describes the structural design of the test bench based on the requirements presented in the previous section. For portability, the test bench is implemented in Python, which means it runs, without requiring compilation, on any platform for which a Python interpreter is available. Figure 5.1 shows the components of the test bench and their relation. Each component will be discussed in detail in the following sections.

Figure 5.1: structural test bench design

5.2.1 Test execution logic
The test execution logic is the code that runs when the test bench is executed. It governs the program flow of the test bench. Detailed information on the program flow can be found in section 5.6.1. More details on the way metrics are computed can be found in section 5.6.2.

5.2.2 Input files
The Water Column Corpus presented in chapter 3 is the default set of input files. However, the test bench can be configured to use any set of input files by editing the configuration file.

Input file format
Water column data files come in many different encodings, as it is common for MBES manufacturers to use their own, proprietary encoding. If the test bench supported files in their native format, the responsibility of decoding the data would lie with the compression algorithm. We see two problems with this approach. First, it makes the test bench hard to extend with new file formats, as that would require an update of all the compression algorithms. Second, there is a risk of algorithms supporting only a single encoding1. The inability of compression algorithms to compress the same files would make the test bench useless for comparing algorithm performance. In order to overcome these problems, we present a generalized file format in which the test bench's default input files are encoded. The goals for this format are:
• Be generic, so that the data from any vendor encoding can be converted to this format.
• Be as close as possible to the original file in its content.

• Contain all the information essential to water column data.
• Contain all the data needed by the test bench.

1This may already be the case for current publications on water column compression: Moszynski et al. use an application that reads files in the ’s7k’ format [MCKL13, p. 81]. Amblas et al. [APR+16] & Beaudoin [Bea10] do not specifically mention which formats are supported, but both only include data from Kongsberg systems in their input data sets.

Based on these goals, we defined the generic water-column format (GWF). Details of the file format can be found in appendix B. In order for the data to be as generic as possible, we include as few fields as possible while still making sure that all the data needed for compression and the test bench is included. For each water column record we include the fields identifier, generation time, and number of beams, and for each beam we include the amplitude samples and phase samples. In order to retain as much similarity with the original file as possible, we do not perform conversion on the sample data (which constitutes the majority, >90%, of the data). Instead, the file contains a field indicating the format in which the samples are stored. Furthermore, all data that does not fit in one of the defined fields is stored as 'generic data'. Both the water column record header and each individual beam contain a field in which generic data can be stored. Because of these precautions, conversion from a vendor specific format to the generic water-column format can be achieved largely by reordering the data in the original file. More details and examples of such a conversion can be found in appendix B. We do not expect that there is a correlation between the compressibility of a file and its structure. Therefore, the performance of a water column data compression algorithm should not differ when processing an input file in the GWF file format compared to processing the same data in its original format. However, we have not conducted extensive research to support this claim, and users of the test bench may be concerned that results obtained on files in the GWF file format are not representative of an algorithm's real-life performance. Therefore, the test bench supports input files both in the GWF file format and in all the original file formats of the Water Column Corpus (.all, .wcd and .s7k).

5.3 Compression algorithm

The compression algorithm is the algorithm for which we want to evaluate the performance. In order for the algorithm to be used by the test bench, it should be made available as a Python module that supports the interface shown in figure 5.2. Compression algorithms are often written in C ([APR+16], [Bea10]) or C++ ([MCKL13]) for speed. Algorithms written in these programming languages can easily be made available as a Python module by compiling them as shared libraries and making the libraries available in Python using the ctypes module2. There are many other methods to make code written in other languages available to the test bench3.

Figure 5.2: algorithm interface (operations: Init(parameters : string) : int; Compress(inputPath : string, outputPath : string) : int; Decompress(inputPath : string, outputPath : string) : int; Decompress(inputPath : string, outputPath : string, recordID) : int)

The init operation is used to set algorithm parameters that are used in (de)compression. The parameters are specific to the algorithm that is used. For example, generic compression algorithms often include a parameter to favor either speed or compression rate. The test bench passes parameters in JSON4 notation as they have been configured in the configuration file of the test bench. As the parameters are specific to the algorithm, all parameters and their possible values should be supplied by the algorithm manufacturer. The compress and decompress operations both take the path of an input file (the original file for the compress function, the compressed file for the decompress function) and the path to an output file. The algorithm is responsible for creating the output file at the specified location and writing the (un)compressed data to that file. The interface specifies an operation to decompress the entire file and an operation to decompress a single record from the input file. All operations should return an integer which is negative in case of failure (algorithm specific return values may be used) and zero in case of success.

2https://docs.python.org/3/library/ctypes.html 3https://wiki.python.org/moin/IntegratingPythonWithOtherLanguages 4http://www.json.org

5.4 Compressed file / decompressed file

The compressed and decompressed files are transient files that are used to store the output of the compress and decompress functions, respectively. The algorithm under test is responsible for the creation of the files and the test bench is responsible for the removal of the files when they are no longer necessary. The files are used by the test bench for the calculation of the compression ratio metric.

5.5 Results

The results of the test bench are stored in an SQLite database. The results consist of all the metrics computed during the test run. The database has a simple layout with columns for the input file name, the name of the algorithm, and one column for each computed metric. Refer to figure 7.2 for an example.
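As an illustration, the results database can be inspected with Python's built-in sqlite3 module. The table and column names in this sketch ('results', 'input_file', 'algorithm', 'compression_ratio') are assumptions for the sake of the example; check the database produced by your version of the test bench for the actual schema.

import sqlite3

# Open the database produced by a test run and print the compression ratio per
# algorithm for one corpus file.
with sqlite3.connect('results.sqlite') as connection:
    rows = connection.execute(
        "SELECT algorithm, compression_ratio FROM results WHERE input_file = ?",
        ('Wreck',))
    for algorithm, ratio in rows:
        print(f'{algorithm}: {ratio:.2f}')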

5.5.1 Configuration file
The configuration file contains the configuration for the test bench. This includes:
• The paths to all the input files.
• The path to the Python module(s) that contain the compression algorithm(s).
• The compression algorithm parameters.
• The metric parameters.
The configuration file uses the JSON format for readability. An example of a configuration file can be found in appendix C. Section 7.3 provides more details on the adaptation of the configuration file.

5.6 Behavioral design

In this section we present the behavioral design of the test bench. Section 5.6.1 describes the program flow of a test run. Section 5.6.2 describes how the metrics proposed in chapter 4 are computed.

5.6.1 Test execution
A run of the test bench computes the metrics proposed in chapter 4 for one or more compression algorithms using one or more input files. Both the algorithms and the input files to use are specified by the user in the configuration file. Figure 5.3 shows the execution flow of a test run of the test bench as a flow chart. The gray boxes in the flow chart's processes indicate (parts of) metrics that are calculated in that step. The test bench iterates over all the input files. For each file, the test bench starts by reading metadata from the file. This metadata is necessary for the calculation of metrics later on (the file size, the number of records in the file, and the timespan of the file) and for determining which records will be used for random access decompression. Once the metadata for the file has been read, the following steps are executed for each algorithm:

1. The algorithm is instructed to compress and subsequently decompress the input file, yielding measurements for the compression ratio, compression time, and decompression time metrics.
2. The algorithm is instructed to decompress a single record from the compressed file for each record that was selected for random access decompression.

3. The test bench emulates the real time scenario in which limited resources are available to the algorithm (this process is described in detail in section 5.6.2) and subsequently instructs the algorithm to compress the input file again to compute the average record compression time in a low resource environment. After the compression step completes, all resources allocated for the real time scenario are released.

When these actions have been completed for all input files and all algorithms, the metrics that require user parameters are computed from the previously calculated values and the user defined parameters. Calculating these metrics outside of the file and algorithm loops allows for fast re-computation of these metrics from previous results when only the parameters have been changed.

5.6.2 Metrics computation
Chapter 4 presents the metrics that are computed by the test bench, but does not address the question of how these metrics should be computed. The computation of the metrics CR, P and C is obvious from their definition. The computation of the metrics Tc, Td and RTC, however, is less obvious. In this section, we discuss the computation of these metrics.

Time measurement
Except for the compression ratio, the computation of all metrics depends on time measurement. It is important to discern which time we want to measure for each metric. Compression time (Tc) and decompression time (Td) are metrics that indicate the performance of the algorithm. Therefore we want to exclude anything that is not related to algorithm performance. Examples of such activities are the operating system executing other tasks, and the time spent waiting on disk access to complete. For the real-time compression (RTC) metric, on the other hand, we explicitly want to include the influence of other processes that claim CPU time in parallel to the execution of the compression algorithm. The test bench is written in the Python programming language, which offers a number of different methods to get timing information. We have tested these methods to find the right method to use for both scenarios. The most important observation we made is that not every method behaves according to the documentation. Specifically, we found a number of methods for which the results, contrary to the documentation, differ significantly when executed on different operating systems.

Figure 5.3: test bench execution flow (flow chart: for each input file, read the file metadata and select water column record IDs for random access decompression; for each algorithm, compress, decompress, random access decompress, then compress again while emulating the acquisition environment; finally, compute the parameterized metrics RTC, P, and C)

Based on the results of our investigation, process_time from the time module is used for measurements where time spent on other processes and I/O should be ignored, and perf_counter from the time module is used when wall clock time should be measured. For optimal resolution, we advise running the test bench on Linux.
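As an illustration, the following sketch shows how the two timing functions could be combined around a single compression call; the helper is not part of the test bench, and the callable passed in stands for an algorithm's compress function.

import time

def timed_call(compress, input_path, output_path):
    # process_time excludes time spent on other processes and on I/O waits and is
    # suited to the compression/decompression time metrics; perf_counter measures
    # wall clock time and is suited to measurements where the influence of other
    # processes should be included.
    cpu_start, wall_start = time.process_time(), time.perf_counter()
    compress(input_path, output_path)
    return time.process_time() - cpu_start, time.perf_counter() - wall_start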

Real time compression
The purpose of acquisition environment emulation is to compute a metric that is representative of the performance of the compression algorithm during real-time data acquisition (refer to section 4.2 for more information on the metric). The test bench controls two variables based on parameters supplied by the user: the CPU load and the occupied memory. Experimentation showed that exact replication of the processor load using a continuous spin-sleep cycle on all cores of the CPU did not yield satisfactory results unless specifically tailored to the OS running the test bench. Using a number of threads that continuously spin (and thus take up the full capacity of a logical core) proved to be more stable and portable. The disadvantage of this method is that we have only two options per logical core of a CPU: apply 100% load to the core or apply no load to the core. This means we can control the CPU load with a granularity of 100 / (number of logical cores) percent. For instance, on a machine with eight logical cores, we can introduce loads of 0%, 12.5%, 25%, 37.5%, 50%, 62.5%, 75%, 87.5% and 100%. In order to get a representative real-time compression metric, the compression time (Tc(e)) is calculated twice, using the two CPU loads closest to the required CPU load. For example, if the user supplied a CPU load of 67% and the test bench is running on a system with eight logical cores, the compression time will be measured once with a load of 62.5% and once with a load of 75%. Linear interpolation is then used to get the compression time at the CPU load configured by the user.
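The following Python sketch illustrates the idea of occupying whole logical cores with busy-spinning workers and interpolating between the two measured compression times. It is a simplified illustration under our own assumptions, not the test bench's actual implementation; for instance, it uses multiprocessing rather than threads so that each worker can occupy its own core.

import multiprocessing

def _spin():
    # Busy wait: keeps one logical core at (close to) 100% load until terminated.
    while True:
        pass

class CoreLoad:
    """Occupy a number of logical cores with busy-spinning worker processes."""

    def __init__(self, cores):
        self._workers = [multiprocessing.Process(target=_spin) for _ in range(cores)]

    def __enter__(self):
        for worker in self._workers:
            worker.start()
        return self

    def __exit__(self, *exc):
        for worker in self._workers:
            worker.terminate()
        for worker in self._workers:
            worker.join()

def interpolate(lower_load, time_at_lower, higher_load, time_at_higher, requested_load):
    # Linear interpolation of the compression time between the two measured CPU loads.
    fraction = (requested_load - lower_load) / (higher_load - lower_load)
    return time_at_lower + fraction * (time_at_higher - time_at_lower)

# Example: on an eight-core machine, a requested load of 67% would be bracketed by
# measurements taken inside `with CoreLoad(cores=5):` (62.5%) and `with CoreLoad(cores=6):`
# (75%). On Windows, code using multiprocessing must be guarded by `if __name__ == '__main__':`.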

Chapter 6

Benchmark

The test bench presented in this thesis can be used as a means to collect metrics for lossless water column compression methods. As discussed in the introduction, our goal is for these metrics to be used in a benchmark. In this chapter, we present our vision on how such a benchmark should be organized. The first section deals with the methods for the publication of results. The second section deals with the rules algorithm implementers should conform to in order for their algorithm to be included in the benchmark.

6.1 Publication

The results of a benchmark should be easily accessible to anyone interested. Publication on a webpage is a common means to accomplish this. There are two main methods that are used for the publication of benchmark results. The first approach is to have a centralized agent that periodically computes the metrics used in the benchmark for all programs included in the benchmark. The metric computations for all of the programs are run on the same hardware in the same state to obtain results that are directly comparable. This approach requires that programs that are to be included in the benchmark are made available to the maintainer of the benchmark in some form. An example of this approach is the Canterbury corpus benchmark published on http://corpus.canterbury.ac.nz as described by Powell [Pow01]. In the other approach, the calculation of metrics is not performed by the benchmark maintainer, but by individuals who wish to include a program in the benchmark. The acquired results are sent to the benchmark maintainer with the request to add these to the benchmark. Benchmarks using this approach usually supplement the results with a description of the hardware used to obtain them. An example of this approach is the Large Text Compression Benchmark hosted at http://mattmahoney.net/dc/text.html. Because the scientific community for water column compression is relatively immature, we believe that the best procedure for a benchmark in the domain is the one that is least restrictive both to contributors and maintainers. For that reason we propose to use the second approach for the benchmark. This approach does not require the maintainer to periodically compute results and it does not require contributors to make their algorithm available to the maintainer. The downside of the approach is that the results are not directly comparable, as they are likely to be obtained on different hardware. However, we expect that the diversity in hardware will be low. Surveying and processing software are typically run on standard personal computers. We do not expect results obtained on hardware configurations such as embedded systems or supercomputers to be submitted to the benchmark. A submission for inclusion in the benchmark should contain the following items:
• The metrics computed for each file in the Water Column Corpus.
• The settings used for the computation of these metrics (e.g. the test bench configuration file).
• A description of the hardware used to compute the metrics: CPU type, amount of memory, and operating system.

Benchmark results may be validated by the benchmark maintainer. The maintainer may decide (but is not required) to do so at the request for inclusion, or can be requested to do so by any party concerned about the validity of a result in the benchmark. The availability rule presented in the following section requires that any algorithm included in the benchmark is freely available for reproduction of results. If the benchmark maintainer is not convinced of the validity of the submission, (s)he may choose to decline the request for inclusion, or remove the result from the benchmark. The submitter of the results will be informed of this step and the reasons for it.

6.2 Rules

Algorithms that are to be included in the benchmark have to abide by a set of rules. It is specifically not our intention to formulate a set of rules to prevent 'fraudulent' behavior. We believe the inappropriateness of fraudulent behavior (such as solutions that store the data to be compressed in any location other than the compressed file, as described for instance by Arnold & Bell [AB97, p203]) to be self-evident. Our intention for this set of rules is to address situations that are less evident, and to make sure that all algorithms handle them in the same way. Specifically, these rules serve two primary purposes: to facilitate the reproduction of published results, and to make sure that metrics computed on different algorithms are comparable. Reproduction of published results is important as we have chosen a publication method that allows anyone to add results to the benchmark. The scientific community needs to be able to reproduce any noteworthy result to determine its validity. Being able to compare the metrics computed on different algorithms is at the core of the benchmark.

Lossless
Rule: Only compression algorithms that produce a file that is bit-wise equal to the input file after subsequent compress and decompress actions are allowed to be included in the benchmark.
Motivation: Many compression programs employ pre-processing steps prior to the actual compression. The purpose of these pre-processing steps is to manipulate the data in such a way that it is optimal for the subsequent compression step(s). Pre-processing steps may be reversible or irreversible. A reversible pre-processing step is, for example, often employed in the compression of continuous signals (such as audio) [Smi13, p. 487]. An example of irreversible pre-processing is the quantization step in JPEG(2000) [SCE01]. Sometimes the combination of irreversible pre-processing and lossless compression is still referred to as lossless compression, as the actual compression is lossless. We believe the compression algorithm proposed by Moszynski et al. [MCKL13] to be an example of such a combination. The compression algorithm (Huffman coding) is lossless, but the pre-processing step that converts sample values from linear scale to logarithmic scale is unlikely to be reversible when using discrete values. In the context of our test bench, we define losslessness as the bitwise equality between the input file and the result of a subsequent compression and decompression action on the input file, including any pre- or post-processing steps.
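As an illustration, bitwise equality between the original file and the result of a compress/decompress round trip can be checked with Python's standard library; the file names below are placeholders.

import filecmp

# A submission is lossless only if the round-tripped file is byte-for-byte identical
# to the original. 'original.gwf' and 'roundtrip.gwf' are placeholder names for the
# input file and the result of compressing and then decompressing it.
assert filecmp.cmp('original.gwf', 'roundtrip.gwf', shallow=False)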

Repetitive (de)compression
Rule: Algorithms may not use any methods that increase algorithm performance when working on the same file multiple times.
Motivation: A simple example of such behavior would be the caching of the algorithm's output. When the algorithm detects that it is ordered to execute the same action on the same file as was executed for the cached output, it can immediately return the cached output. This is undesirable as it would cause the test bench to report incorrect metrics. For instance, the test bench will compress each file at least twice: once 'normally' and once in an environment that simulates the acquisition environment (see section 4.2). Applying methods that rely on historic information to increase performance would lead to a non-representative real-time metric.

Execution confined to the system
Rule: The execution of the algorithm must be confined to the system of which the properties are included in the request for publication.
Motivation: This means that if the algorithm was run on a computing cluster, the specifics of that cluster should be supplied with the request for publication. Conversely, it means that if the request for publication included the specifics of one personal computer, the processing space for the algorithm is confined to that computer.

File handling
Rule: Algorithms should read records from the input files only once, and in the order in which they are stored in the file.
Motivation: This simulates the real-time scenario in which data is not read from a file but directly received from the MBES. If the algorithm requires history, it is the responsibility of the algorithm to buffer the data.

Tailoring
Rule: Algorithms should not be tailored to the input data.
Motivation: The metrics published in the benchmark should be representative of the algorithm's performance on real data, not specifically on the files included in the Water Column Corpus.

Resources
Rule: There is no restriction on the resources that the algorithms may use (including parallelization) other than what is mentioned in the rule 'Execution confined to the system'.
Motivation: The only situation in which restrictions to available resources apply is in the simulation of the acquisition situation (i.e. the real-time metric); in that case all resources except those that should be available to the algorithm are made unavailable by the test bench.

Publication content
Rule: Publication requests should include the metrics presented in chapter 4 computed over the full Water Column Corpus presented in chapter 3.
Motivation: Other metrics and/or metrics computed on other files will not be included in the benchmark. This allows for validation of published results.

Availability
Rule: All algorithms that are to be included in the benchmark must either be freely accessible or allow a free and unrestricted trial period of at least seven days.
Motivation: This allows for validation of published results.

Chapter 7

Using the test bench

7.1 Installing the test bench

In order to run the test bench, the user should have Python version ≥ 3.0 installed1. The test bench is hosted in a GitHub repository2. To be able to run the test bench, the directory named 'testbench' should be downloaded. The test bench itself is fully encompassed in the source files that reside in the 'testbench' directory and does not require any installation. However, the test bench does have dependencies on external Python modules. To install all the dependencies of the test bench, browse to the 'testbench' folder and run the following command:

python -m pip install .

After this command has executed, all the dependencies of the test bench should be properly installed. The configuration may need to be changed to your specific use case (as is discussed in section 7.3), but the default setup should be enough to verify that the test bench has been correctly installed. The test bench can be run by running either the run.bat or run.sh script (depending on the operating system). The Water Column Corpus is hosted on Google Drive3. The default configuration expects these files to be present in a sub-folder of the test bench named "Water Column Corpus".

7.2 Adding a new algorithm to the test bench

This section shows what steps need to be performed to add an algorithm to the test bench. We will use the replication of the JPEG2000 based method proposed by Beaudoin [Bea10] as an example. For the sake of brevity, we will refer to this implementation as ’the jp2k algorithm’ in the remainder of this document.

7.2.1 The algorithm to add
The jp2k algorithm has been implemented in C++ as a library and works on individual water column records. The library exposes the three functions shown in listing 7.1.

1https://www.python.org/downloads/ 2https://github.com/bgrevelt/Thesis 3https://drive.google.com/drive/folders/0B-cItQ2KwxYsS0lmWnFROGNqeGs?usp=sharing 4for details on the format, refer to appendix B

char* Compress(char* uncompressedData, uint32_t uncompressedDataSize, uint32_t* compressedSize);
char* Decompress(char* compressedData, uint32_t compressedDataSize, uint32_t* decompressedSize);
void Destroy(char* data);

Listing 7.1: JP2k interface

The compress function takes a pointer to memory containing a water column record in the gwf format4 and the size of that memory. The third parameter is a pointer to an integer in which the size of the compressed data will be stored upon completion of the function. The function returns a pointer to the compressed data. The decompress function is the inverse of the compress function and has similar arguments. The library takes on the responsibility of allocating the memory for the compressed data on a call to the compress function and for the decompressed data on a call to the decompress function. In order to deallocate the memory when the data is no longer required, the library exposes a destroy function, which should be called with the return value of a call to the compress or decompress function as its argument.

7.2.2 Building a shared library
There are multiple ways of making programs written in C++ (and other languages) available in Python, but we have found the use of shared libraries to be the easiest. The functions exposed through the library can easily be made available by using the ctypes module. The specifics of creating a shared library from the algorithm's source code are platform and language dependent and will not be discussed here in detail. It is, however, important to note that the platform selected for the shared library should be the same as that of the Python distribution you are planning to use. For example, a 64-bit version of Python requires a shared library that was built for a 64-bit platform. Once a shared library has been built from the source code, it may be necessary to determine the names of the exposed functions. This is the case for the jp2k algorithm, as C++ uses name mangling and thus the names exposed by the shared library do not match the function names in the source. There are different methods of finding the exposed names for different platforms. On Windows, the dumpbin executable that is part of Visual Studio may be used:

dumpbin /exports

On Linux, the nm command can be used:

nm -D

On OS X, the same nm command can be used, but the -D option can be omitted. Listing 7.2 shows that the exported function names of the jp2k algorithm on Windows are "?Compress@@YAPEADPEADIPEAI@Z", "?Decompress@@YAPEADPEADIPEAI@Z", and "?Destroy@@YAXPEAD@Z".

File Type: DLL

  Section contains the following exports for jpeg_2k_compression_lib.dll
  (...)

  ordinal  hint  RVA       name
        1     0  00002930  ?Compress@@YAPEADPEADIPEAI@Z = ?Compress@@YAPEADPEADIPEAI@Z (char * __cdecl Compress(char *,unsigned int,unsigned int *))
        2     1  00002B30  ?Decompress@@YAPEADPEADIPEAI@Z = ?Decompress@@YAPEADPEADIPEAI@Z (char * __cdecl Decompress(char *,unsigned int,unsigned int *))
        3     2  00002D80  ?Destroy@@YAXPEAD@Z = ?Destroy@@YAXPEAD@Z (void __cdecl Destroy(char *))

  Summary
  (...)

Listing 7.2: JP2k DLL dumpbin output

7.2.3 Exposing the algorithm in python
In order for a Python program (like the test bench) to get access to functions in a dynamic library, the functions need to be wrapped. This functionality is offered by the built-in ctypes library5. Listing 7.3 shows how the functions of the jp2k algorithm are wrapped.

1 jp2kdll = ctypes.cdll.LoadLibrary('jpeg_2k_compression_lib.dll')
2
3 _compress_function = getattr(jp2kdll, "?Compress@@YAPEADPEADIPEAI@Z")
4 _decompress_function = getattr(jp2kdll, "?Decompress@@YAPEADPEADIPEAI@Z")
5 _destroy_function = getattr(jp2kdll, "?Destroy@@YAXPEAD@Z")
6
7 _decompress_function.restype = ctypes.POINTER(ctypes.c_char)
8 _compress_function.restype = ctypes.POINTER(ctypes.c_char)

Listing 7.3: Exposing JP2k code using the ctypes module

The dynamic library is loaded in line 1. Lines 3 through 5 create references to the exposed functions using the function names we determined in the previous section. For the jp2k algorithm we need to use the getattr function because the function names (after name mangling) contain characters which are not valid in Python identifiers. If the function names do not contain such characters, the more readable <reference> = <library>.<function name> form can be used instead. The ctypes module assumes that all exposed functions have an integer return type. As this is not the case for our compress and decompress functions, we indicate the right return type (a char pointer) in lines 7 and 8.

7.2.4 implementing the interface
As discussed in section 5.3, each compression algorithm needs to conform to a specific interface in order for the test bench to be able to use it. Listing 7.4 shows the template code for this interface.

def init(parameters):
    ...

def compress(input_path, output_path):
    ...

def decompress(input_path, output_path, record_id = None):
    ...

Listing 7.4: Compression algorithm interface template

5https://docs.python.org/3/library/ctypes.html

If the algorithm to be wrapped supports the required interface, all that is needed is to call the right functions in the shared library. The jp2k algorithm, however, implements a different interface, working on memory containing a single water column record instead of files containing multiple records. Therefore we need to write glue code to support the required interface. For convenience, _compress and _decompress functions are created that encapsulate all ctypes-related code (listing 7.5). The final step for exposing the jp2k algorithm is to write the code to support the required interface. This code is shown in listing 7.6. A simple file format is used for the compressed data. Each record is stored in a block of data. Each block contains the block's size, the record number, and the compressed data. Because the algorithm does not support parameterization, the init function (line 1) has an empty function body. The compress function (line 4) reads the input file using the gwf module (which is distributed along with the test bench) and passes each individual record to the jp2k algorithm for compression. The result, along with the block header, is written to the output file. The decompress function (line 12) applies the same steps in reverse order. With the interface in place, the algorithm is now ready to be included in the test bench. This process is described in the following section.

def _compress(data):
    compressed_data_length = ctypes.c_uint()
    compressed = _compress_function(
        ctypes.c_char_p(data),
        ctypes.c_uint(len(data)),
        ctypes.byref(compressed_data_length))
    data = compressed[:compressed_data_length.value]

    _destroy_function(compressed)

    return data

def _decompress(data):
    decompressed_data_length = ctypes.c_uint()
    decompressed = _decompress_function(
        ctypes.c_char_p(data),
        ctypes.c_uint(len(data)),
        ctypes.byref(decompressed_data_length))
    data = decompressed[:decompressed_data_length.value]

    _destroy_function(decompressed)

    return data

Listing 7.5: Jpeg2k exposed function wrapping

7.2.5 Add the algorithm to the test bench configuration
The final step in adding the new algorithm to the test bench is to add it to the test bench configuration file. This file can be found under testbench/configuration/config.cfg. The algorithm is included by adding a JSON object to the "algorithms" section of the configuration file, consisting of the algorithm's name, the path to the module, and the algorithm parameters. Listing 7.7 shows the lines that are added to the configuration file to include the jp2k algorithm. The jp2k algorithm does not support any parameters; the parameter field is therefore an empty object.

7.3 Configuring the test bench

The test bench configuration file contains the parameters used by the test bench, encoded in the JSON format6. Appendix C contains a listing (listing C.8) of the configuration file as shipped with the test bench.

6http://www.json.org

1 def init(parameters):
2     pass
3
4 def compress(input_path, output_path):
5     gwf_file = gwf.File(input_path)
6     with open(output_path, 'wb') as out_file:
7         for record in gwf_file.read():
8             data = record.serialize()
9             compressed_data = _compress(data)
10             write_block(compressed_data, record.ping_number, out_file)
11
12 def decompress(input_path, output_path, record_id = None):
13     file_size = os.path.getsize(input_path)
14     with open(input_path, 'rb') as input_file, open(output_path, 'wb') as output_file:
15         while input_file.tell() < file_size:
16             block_size, current_record_id = read_block_header(input_file)
17             if record_id == None or record_id == current_record_id:
18                 compressed_data = read_compressed_data(input_file, block_size)
19                 decompressed_data = _decompress(compressed_data)
20                 write_decompressed_data(output_file, decompressed_data)
21             else:
22                 skip_block(input_file, block_size)
23
24 def read_compressed_data(file, block_size):
25     compressed_data_size = block_size - header_length
26     return file.read(compressed_data_size)
27
28 def write_decompressed_data(file, data):
29     file.write(data)
30
31 def write_block(data, record_id, file):
32     write_block_header(data, record_id, file)
33     file.write(data)
34
35 def read_block_header(file):
36     header_data = file.read(header_length)
37     return struct.unpack(_record_header_format, header_data)
38
39 def write_block_header(data, record_id, file):
40     block_length = len(data) + header_length
41     file.write(struct.pack(_record_header_format, block_length, record_id))
42
43 def skip_block(file, block_size):
44     compressed_data_size = block_size - header_length
45     file.seek(compressed_data_size, 1)

Listing 7.6: JP2k adapter code

1 "algorithms": 2 [ 3 (...) 4 { 5 "name": "JPEG 2000 based compression", 6 "path": "algorithms/jpeg2k.py", 7 "parameters": {} 8 } 9 ]

Listing 7.7: Addition of the JP2k algorithm to the test bench's configuration file

The configuration file consists of three sections: input files, algorithms, and metric parameters.

Input files
The test bench uses the Water Column Corpus presented in chapter 3 by default. The set of input files is defined by a simple list of paths, so the user can easily change this if the Water Column Corpus is placed at a different location or if (s)he would like to include other files.

Algorithms
The test bench ships with five reference implementations of compression algorithms: the jp2k algorithm that has been discussed in the previous sections, and four general purpose algorithms: ZIP in 'STORED' and 'DEFLATED' modes, and LZMA in 'fast' and 'maximum compression' modes. The process for adding an algorithm has been explained in section 7.2; a sketch of what such a wrapper can look like for the ZIP algorithms is shown below. If the user would like to exclude an algorithm, the complete object, starting at the left brace and ending at the right brace, should be deleted from the configuration file.
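As an illustration of how little code such a wrapper requires, the sketch below shows how a ZIP algorithm could be exposed through the interface of section 5.3 using Python's zipfile module. It follows the file handling described in section 8.1 (the whole input is compressed as a single archive member); it is an illustrative sketch and not necessarily identical to the shipped reference implementation.

import zipfile

# Compression mode for this wrapper; zipfile.ZIP_STORED would give the
# 'ZIP without compression' variant instead.
_mode = zipfile.ZIP_DEFLATED

def init(parameters):
    pass  # this wrapper takes no parameters

def compress(input_path, output_path):
    # Compress the input file in its entirety as a single archive member.
    with zipfile.ZipFile(output_path, 'w', compression=_mode) as archive:
        archive.write(input_path, arcname='data')
    return 0

def decompress(input_path, output_path, record_id=None):
    # The whole file is stored as one member, so a single-record request still
    # decompresses everything (the behavior described in section 8.1).
    with zipfile.ZipFile(input_path) as archive, open(output_path, 'wb') as out_file:
        out_file.write(archive.read('data'))
    return 0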

Metric parameters
This section of the configuration file contains the parameters that are used for the computation of a number of metrics. More details on the metrics and how they use these parameters can be found in chapter 4. These parameters are use case specific and should therefore always be updated to reflect your specific use case. Table 7.1 provides detailed information on each of the metric parameters and guidelines on how to determine the best value for your use case.

Acquisition CPU available — metric: Real-time (section 4.2)
Meaning: The percentage of CPU time available to the compression algorithm during acquisition.
Guideline: This can be determined by monitoring the CPU usage of the machine during acquisition. Different operating systems provide different tools to monitor the CPU usage: Windows has the task manager, MacOS has the activity monitor, and both Linux and MacOS have the 'top' command line utility. The CPU time available to the compression algorithm is usually 100 minus the measured CPU usage (see footnote 7).

Acquisition memory available — metric: Real-time (section 4.2)
Meaning: The amount of memory available to the compression algorithm during acquisition, in megabytes.
Guideline: This can be determined by monitoring the amount of memory that is in use on average during acquisition. Different operating systems provide different tools to monitor memory usage: Windows has the task manager, MacOS has the activity monitor, and both Linux and MacOS have the 'top' command line utility. The amount of memory available to the algorithm is the total amount of memory installed on the system minus the amount of memory in use.

Cost of processing time — metric: Cost (section 4.4)
Meaning: The monetary value of processing time per hour.
Guideline: This is usually known within the organization. The user is free to use any currency, as long as the same currency is used for all monetary parameters (cost of processing time, cost of ship time, cost of data ownership). The cost metric will be reported in the same currency.

Cost of ship time — metric: Cost (section 4.4)
Meaning: The monetary value of ship time per hour.
Guideline: Ships (including crew and machinery) are often hired to perform surveys. In this case, this parameter is equal to the ship's day rate divided by 24. If the survey is conducted with own materials, the cost of ship time is usually known within the organization. The user is free to use any currency, as long as the same currency is used for all monetary parameters (cost of processing time, cost of ship time, cost of data ownership). The cost metric will be reported in the same currency.

7Most operating systems report the CPU load on a scale from 0 to 100 percent. There are, however, operating systems that report the CPU load on a scale from 0 to 100 times the number of logical cores percent. In this case, scale the available CPU percentage back to a 0-100 range and use that value in the configuration.

Cost of data ownership — metric: Cost (section 4.4)
Meaning: The monetary cost of storing and maintaining data over its life span, per megabyte.
Guideline: Larger organizations may have determined the cost of ownership for data. If this has not been determined for your organization, a proper estimation could be made by calculating what the cost of ownership would be if the organization's data was hosted by a third party. The test bench's default value has been calculated based on the current cost of Amazon cloud hosting8, based on a total data size of < 500 TB and a life cycle of 50 years. This value has been doubled to account for non-storage related costs such as administration and the cost of data transfer. The user is free to use any currency, as long as the same currency is used for all monetary parameters (cost of processing time, cost of ship time, cost of data ownership). The cost metric will be reported in the same currency.

Number of processing decompressions — metric: Processing (section 4.3)
Meaning: The number of times a compressed file is expected to be decompressed.
Guideline: This depends on the way of working in the company. As a rule of thumb, use the number of systems that will be used to process the data.

Processing ratio — metric: Processing (section 4.3)
Meaning: The fraction of compressed records that will be decompressed during processing.
Guideline: This value will normally need to be set to one, as water column data is usually only collected on sites where it is known to be of value. Engineering trials are a noteworthy exception, where water column data is often collected during the entire trial with the sole purpose of having access to it if it is needed for troubleshooting. Continued progress in the field of water column compression may also enable users to collect water column data for the entire survey as the size becomes less restrictive. In both cases, internal analysis should provide the right number to use for this parameter.

Table 7.1: Metric parameter details

8https://aws.amazon.com/s3/pricing/

7.4 Running the test bench

Now that the algorithm has been added to the setup, it will be included in future test bench runs. For optimal accuracy of the metrics, the user has to make sure that the operating system running the test bench has as few other tasks to perform as possible. In addition, disable any features your CPU may have that influence the CPU clock speed(s). These features are known under many names and can usually be disabled in the BIOS/UEFI. The test bench can be run by executing the run.bat script on Windows or the run.sh script on OS X or Linux. This will start the calculation of the metrics on the configured files. Once the test bench run is complete, the results are stored in an sqlite3 database: 'results.sqlite'. There are many (open source) tools that can handle the sqlite3 format and/or convert it to more generic file formats such as CSV. The test bench supports two visualization options for the result data. If the test bench is run with the --show-table flag, the results are shown as a table on the command line, as shown in figure 7.1. The --show-graph switch will generate bar graphs of each computed metric. Figure 7.2 shows an example of such a bar graph for the compression ratio metric.

Figure 7.1: Test bench output in tabular format

Figure 7.2: Test bench output in bar graph format for the compression ratio metric

Chapter 8

Results and Analysis

In this chapter, we present and discuss the results of a run of our test bench. The test bench (in its default configuration) computes the metrics presented in chapter 4 for the files in the Water Column Corpus presented in chapter 3.

8.1 Algorithms

Five algorithms are included in the test bench in the default configuration. The included algorithms are presented in the following sections.

ZIP
Two configurations of the ZIP algorithm are included in the test bench: one that uses ZIP's DEFLATE algorithm for compression (hereafter referred to as 'ZIP with compression') and one that does not use any compression method (hereafter referred to as 'ZIP without compression'). The latter algorithm is included in the test bench to act as a baseline. Since no compression is applied, it is a worst-case scenario for the compression ratio metric. For the same reason, we expect the algorithm to have low compression and decompression times compared to algorithms that do use compression. As shown in section 7.2.4, file handling can be part of the code that has to be written to make an algorithm available to the test bench. For the ZIP algorithms we have selected a simple file handling mechanism: the input data is compressed in its entirety, and the compressed data is written to the output file. As a consequence, the ZIP algorithms need to decompress the entire compressed file when instructed to decompress a single record. As the other algorithms have a file handling method that is more suited to random access decompression (as will be detailed in the subsequent sections), we expect the test bench results to show that the ZIP algorithms have poor random access decompression performance compared to the other algorithms.

LZMA
The LZMA compression algorithm implementation used in the test bench1 offers three 'compression modes': fast, normal and maximum compression. The fast (hereafter referred to as 'LZMA fast') and the maximum compression (hereafter referred to as 'LZMA maximum compression') modes are included in the test bench. We expect the test bench results to show that LZMA fast has better performance than LZMA maximum compression in compression and decompression time. Conversely, we expect LZMA maximum compression to have better performance in compression ratio. Contrary to the ZIP algorithms, the LZMA algorithms compress the input data one record at a time. For each compressed record, the record number, the compressed record size, and the compressed record are written to the output file. This allows the algorithm to decompress only the requested

1https://pypi.python.org/pypi/pylzma

record when instructed to decompress a single record. We therefore expect the test bench results to show that the LZMA algorithms have better random access decompression performance than the ZIP algorithms.

JPEG 2000 based compression
Our reproduction of the jpeg2000-based compression method proposed by Beaudoin [Bea10] is hereafter referred to as jpg2k. This algorithm uses the Jasper [AK00] library to compress the amplitude data of the water column record. As this algorithm uses the same method of file handling as the LZMA algorithms, we similarly expect the test bench results to show that the jpg2k algorithm outperforms the ZIP algorithms in random access decompression.

8.2 Generic compression metrics

Figures 8.1, 8.2, 8.3, and 8.4 show the measurements for the generic compression metrics: compression ratio, compression time, decompression time and random access decompression time. All time measurements have been normalized to the size of the data that was (de)compressed to make the graphs easier to read. Most of the measurements are in line with our expectations: ZIP without compression always has a compression ratio of 1.0 (i.e. no compression) and is consistently the fastest algorithm in both compression and decompression. The LZMA algorithms consistently have a lower (better) compression ratio than the ZIP algorithms, but also have higher (worse) compression and decompression times. LZMA maximum compression consistently offers a lower (better) compression ratio than LZMA fast, while the latter is typically faster than the former in both compression and decompression. As expected, the random access decompression performance of the ZIP algorithms is poor compared to the other algorithms. Some measurements, however, are not according to our expectations or stand out otherwise. These are discussed in the subsequent sections.

8.2.1 Deviation in (de)compression time
Figures 8.2 and 8.3 show that there is a deviation in the average compression and decompression times per byte over the different input files. This is further detailed in table 8.1. It suggests that the contents of a file not only influence the compression ratio, but also the rate at which the data can be compressed or decompressed. Neither of these properties of the input files has been taken into account in the selection process for the Water Column Corpus. Table 8.1 shows that the compression time has a strong correlation with the compression ratio, the file characteristic used to select files for the Water Column Corpus. This makes it less likely that a file has been selected for the corpus that has a representative compression ratio but a non-representative compression time. The decompression time, however, has a very low correlation with the compression ratio, which means that it is possible that the files selected for the corpus, although representative in terms of compression ratio, are not representative of the decompression time. This observation is further discussed in sections 9.1 and 9.2.

8.2.2 LZMA compression time
The LZMA fast algorithm, contrary to our expectation, has a higher compression time for the 'Fish' file than the LZMA maximum compression algorithm, even though the latter has a lower compression ratio. Interestingly, this is not a fluke: we observed the same result in five test bench runs and in a separate measurement made without the test bench. It appears that there is something specific in the content of the 'Fish' file that causes this behavior.

File                               Average compression time   Average decompression time   Average compression ratio
                                   (µs per byte)              (µs per byte)
Fish                               0.21                       0.037                        0.84
Survey                             0.17                       0.050                        0.67
Target                             0.20                       0.030                        0.76
Vent                               0.18                       0.045                        0.62
Water seep                         0.12                       0.030                        0.43
Wreck                              0.22                       0.045                        0.63
Range (% of mean)                  0.10 (57)                  0.021 (51)
Correlation to compression ratio   0.80                       0.10

Table 8.1: Average (de)compression time statistics.
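The correlation values in the last row of table 8.1 can be reproduced from the tabulated values, for example with Python's statistics module; the small differences with respect to the reported 0.80 and 0.10 are due to rounding of the tabulated inputs.

from statistics import correlation  # available from Python 3.10

# Values taken from table 8.1.
compression_time   = [0.21, 0.17, 0.20, 0.18, 0.12, 0.22]       # µs per byte
decompression_time = [0.037, 0.050, 0.030, 0.045, 0.030, 0.045]  # µs per byte
compression_ratio  = [0.84, 0.67, 0.76, 0.62, 0.43, 0.63]

print(round(correlation(compression_time, compression_ratio), 2))    # ~0.79
print(round(correlation(decompression_time, compression_ratio), 2))  # ~0.09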

8.2.3 Jpg2k compression ratio
The jpg2k algorithm and the ZIP with compression algorithm alternately have the third lowest compression ratio. This inconsistency between files may be explained by the fact that the two files for which ZIP with compression outperforms the jpg2k algorithm are the only two files in the Water Column Corpus that contain phase data. The jpeg2000 based algorithm as proposed by Beaudoin [Bea10] only compresses the amplitude samples in the water column data. As the phase data constitutes approximately half of the file size for the two files that contain phase data, it is likely that the compression ratio for the jpg2k algorithm would improve if the phase data were compressed as well.

8.2.4 Random access decompression compared to full file decompression
Figure 8.4b shows the random access decompression times at a zoom level at which the two LZMA algorithms and the jpg2k algorithm can be compared. The relation between the algorithms is similar to the relation for the decompression time metric, but the differences between the algorithms are less pronounced for the random access decompression time metric. Further inspection showed that this is due to the ratio of time spent on file handling to time spent on actual decompression. Even though the actual I/O time is not included in the time measurement2, there is still some overhead incurred by searching the compressed file for the record to decompress. In the case of full file decompression, this overhead is negligible compared to the decompression time. However, when a single record is decompressed, searching the file for the right record can take longer than the actual decompression of the record and thus has a strong influence on the random access decompression time. As the file reading code (and thus the induced overhead of searching for the record to decompress) is equal for the jpg2k algorithm and the LZMA algorithms, the random access decompression times for the jpg2k algorithm and the LZMA algorithms are closer than their respective full file decompression times.

8.2.5 Jpg2k performance
The jpg2k algorithm, which is the only algorithm in the test bench that was specifically designed for water column data, does not appear to outperform any of the generic compression algorithms. Its compression ratio is often similar to that of the ZIP with compression algorithm, while the latter has much lower compression and decompression times. The algorithm performs especially poorly on decompression, where it is consistently the slowest algorithm by at least a factor of two.

8.3 The processing metric

The processing metric (section 4.3) is indicative of the time spent on decompression during processing. The metric is computed using either the full file decompression time or the time spent individually decompressing the required records, depending on which of the two is smaller. The number of required records is determined by the processing ratio, a user parameter. A processing ratio of 1.0 indicates that all records are to be decompressed during processing. The results of a test bench run with this setting are shown in figure 8.5a. As expected, the processing metric values correlate to the (full file) decompression time (figure 8.3). When the processing ratio is reduced, the processing metric starts to correlate with the random access decompression time instead of the full file decompression time for some algorithms, as the time required for partial decompression becomes less than the time required for full file decompression. Figures 8.5b and 8.5c show this behavior for processing ratios of 0.1 and 0.01, respectively. Due to the way we implemented file handling in the ZIP algorithms, random access decompression of a single record takes more time than full file decompression (as discussed in section 8.1). The values for these algorithms, therefore, do not change when the processing ratio is lowered. For most measurements, we see that the values correlate to those of the random access decompression when a processing ratio of 0.1 is used. A notable exception is the 'Survey' file, for which the values have not changed compared to the values obtained with a processing ratio of 1.0. The reason for this behavior probably lies in the fact that the random access decompression time for this file is relatively high, as we can see in figure 8.4b. When a processing ratio of 0.01 is used, we see that all values except those of the ZIP algorithms now correlate to the random access decompression time instead of the full file decompression time.

2refer to section 5.6.2 for more information on how time is measured in the test bench.

Figure 8.1: Compression ratio

Figure 8.2: Compression time

Figure 8.3: Decompression time

Figure 8.4: Random access decompression time: (a) random access decompression time, (b) random access decompression time zoomed in

8.4 The real-time metric

As discussed in section 4.2, the real-time metric is an indication of the ability of an algorithm to compress data at the rate at which it is generated. Figure 8.6 shows the computed values for the real-time metric using the default test bench configuration. In this configuration, the algorithm is allowed to use 50% of the CPU time and 2 GB of memory. Sub-figure 8.6a shows the computed values over the full range of the results. Sub-figure 8.6b shows the same data, zoomed in to the turning point for real-time compression.

The relative order of the algorithms’ real-time performance is equal to that of their performance for compression time: for each file, the ordering of the algorithms from best to worst is the same for the compression time metric and the real-time metric. This signifies that the CPU load and memory consumption induced during the measurement of the real-time compression time in this test run have similar effects on all compression algorithms.

The values show that not only do the algorithms differ in their ability to perform real-time compression, there is also a large difference between the files. The ’Vessel’, ’WaterSeep’ and ’Wreck’ files can be compressed in real time by any algorithm, whereas the ’Target’ file cannot be compressed in real time by any algorithm that actually applies compression. A closer look at the definition of the metric explains why. The real-time metric is defined as the ratio of the average compression time for a single record to the generation interval of records in the input file. The metric’s dependency on the record generation interval is explicit, but the metric depends on another property of the input file as well: the average record compression time depends not only on the algorithm’s compression performance, but also on the size of the records. We can combine these two dependencies in the data rate of the input file, which we define as the size of the file divided by the timespan of the records in the file. The data rates for the input files are shown in figure 8.7. When comparing the real-time metric measurements to the data rates, there appears to be an inverse correlation between the two. As the data rate of the input files has not been taken into account in the selection process for the Water Column Corpus, there is a chance that the files selected for the corpus are less representative with respect to this property. This observation is further discussed in sections 9.1 and 9.2.
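In our own notation (these symbols are not taken from chapter 4), the link between the real-time metric and the data rate can be made explicit. Assume an algorithm compresses at a roughly constant throughput of $v$ bytes per second, and consider a file of size $S$ containing $N$ records recorded over a timespan $T$. The average record size is then $S/N$ and the average generation interval is $T/N$, so

\[
M_{\mathrm{rt}} \;=\; \frac{\bar{t}_{\mathrm{compress}}}{\Delta t_{\mathrm{record}}}
\;\approx\; \frac{(S/N)/v}{T/N}
\;=\; \frac{S}{v\,T}
\;=\; \frac{D}{v},
\qquad\text{with } D = \frac{S}{T}.
\]

Under this simplification the real-time metric scales linearly with the data rate $D$ for a fixed algorithm throughput, which is consistent with the relation observed between the real-time measurements and the data rates of figure 8.7.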

Figure 8.5: Processing metric: (a) processing ratio = 1.0, (b) processing ratio = 0.1, (c) processing ratio = 0.01

8.5 The cost reduction metric

The cost reduction metric presented in section 4.4 encompasses the measurements of all other metrics and yields a single (monetary) value that represents the value of applying a compression method. Figure 8.8 shows the values computed by the test bench in the default configuration. To ease comparison between files, the values have been normalized by file size.

The figure clearly shows that the ZIP without compression algorithm offers no value: because it does not actually compress the data, there is no data reduction and therefore no cost reduction. Because the algorithm is fast, its induced costs are low. For all other algorithms, there is a clear correlation between the cost reduction and the real-time metric. Whenever the real-time metric is 1.0 or higher, the cost reduction value is positive. This means that the cost savings induced by the data reduction are large enough to compensate for the costs incurred by decompression during processing, but not large enough to additionally compensate for the cost that would be incurred by non-real-time compression. Based on the results obtained from this test bench run, the only algorithm that consistently offers cost reduction over all files is the ZIP with compression algorithm.

It is important to note that this metric relies heavily on the supplied user parameters, which may differ significantly between organizations. The cost of ship time, for instance, may vary greatly based on the size of the vessel. The cost of data storage may differ based on the company’s requirements for redundancy and administration. And the cost of processing may vary based on the country of operation. The default values in the configuration have been chosen with care, based on information from professionals in the domain and the current costs of cloud hosting, but may not be representative for all use cases.

Figure 8.6: Real time metric: (a) overview, (b) zoomed

Figure 8.7: Input file data rates

Figure 8.8: Cost reduction metric
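As an illustration of the trade-off described above, the sign of the cost reduction can be thought of as a balance of three terms. This is our own simplified sketch and not the cost model defined in section 4.4; the parameter names merely echo the configuration file in appendix C, and the units are assumptions.

    def cost_reduction(saved_storage_gb, decompression_hours, overrun_hours,
                       cost_of_data_ownership, cost_of_processing_time,
                       cost_of_ship_time, number_of_processing_decompressions=1):
        """Illustrative cost balance, NOT the exact metric of section 4.4.

        saved_storage_gb    -- data volume removed by compression (assumed GB)
        decompression_hours -- time spent decompressing during processing (assumed hours)
        overrun_hours       -- extra acquisition time when compression is not real-time
        cost parameters     -- unit costs, as in the test bench configuration (assumed units)
        """
        savings = saved_storage_gb * cost_of_data_ownership
        processing_cost = (decompression_hours * number_of_processing_decompressions
                           * cost_of_processing_time)
        ship_cost = overrun_hours * cost_of_ship_time
        return savings - processing_cost - ship_cost

The structure mirrors the discussion above: as long as compression keeps up with acquisition, the overrun term vanishes and the storage savings only need to outweigh the processing cost, whereas any acquisition overrun is quickly dominated by the much larger cost of ship time.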

Chapter 9

Conclusion & Future work

The scientific community has identified a number of promising applications for multibeam echosounder water column data. However, many hydrographic surveyors choose not to collect water column data due to its large volume compared to other types of hydrographic data. Compression of the water column data has been identified as a possible solution, and three different algorithms for water column compression have been proposed [Bea10][MCKL13][APR+16]. No direct comparison of these three algorithms has been performed, so it is unclear what the current state of the art in water column data compression is.

The goal of this work was to answer the research question posed in section 1.1: ’How to evaluate the performance of lossless water column data compression algorithms?’. Our hypothesis was that, as in other disciplines, benchmarking could be used to provide such an evaluation. We have methodologically selected a set of input files representative of real-world data and placed these files in the public domain so they can be used without restriction. Furthermore, we have defined metrics for the different components of water column data compression performance and created a test bench to compute these metrics. We have used the test bench to compute the proposed metrics over the proposed set of input files. The results of this test bench run, presented in chapter 8, show that our test bench is a valid method for comparing the various performance components of lossless water column data compression: the results clearly distinguish the different algorithms and conform to the expectations we formulated based on known properties of the algorithms.

We conclude that benchmarking can indeed be used to evaluate the performance of lossless water column data compression algorithms, and that the addition of a test bench simplifies the process of obtaining results for such a benchmark. We now invite the community to evaluate the benchmark so that it can grow into a community-driven benchmark.

9.1 Threats to validity

We have identified a number of threats to the validity of the methodology used for the selection of the Water Column Corpus. As a result, certain properties of the files selected for the corpus could be less representative of real-world data than we would like them to be. This means that the threats presented in the subsequent sections may limit the implications of the results presented in chapter 8 to a subset of the complete water column compression domain. They do not, however, threaten the validity of our conclusion that benchmarking can be used to evaluate the performance of lossless water column data compression algorithms. Moreover, it is precisely because our benchmark and test bench have exposed interesting performance behavior over the input files that we are able to identify these threats.

9.1.1 File properties considered in corpus selection

The analysis of the results appears to indicate that the data rate (defined as the file size divided by the timespan of the records in the file) correlates with the real-time performance of the algorithms. We also observed significant deviation of the decompression time per byte over different files. Because neither of these input file properties was included in the selection process for the Water Column Corpus, we cannot say whether the Water Column Corpus is representative of real-world data with respect to these properties.

9.1.2 Restriction of candidate files for corpus selection

Another threat to the validity of the Water Column Corpus can be found in the set of candidate files, which consisted of the water column data available to the host organization. Because the domain of the host organization does not span the complete domain of water column data, the input set is restricted. Of the applications of water column data mentioned by Colbo et al. [CRBW14] (fisheries, marine mammals, zooplankton, kelp ecosystems, aquaculture, gas venting, near-surface bubbles, suspended sediment, physical oceanography and wrecks/archeological oceanography), less than half were present in the set of candidate files. As a result, the obtained results may not be representative for the domains for which no data was included in the candidate files.

9.1.3 Hardware differences between acquisition and processing

The test bench computes all metrics on a single system. In reality, compression and decompression are likely to occur on different systems (the acquisition system and the processing system, respectively), and these systems may have different hardware configurations. If the difference in hardware configurations has a significant impact on the performance of the compression algorithm(s), the results of the test bench may not be representative of the real-life situation. This risk is especially prominent for the cost reduction metric, which integrates time spent on compression during acquisition and time spent on decompression during processing.

9.2 Future work

9.2.1 Water Column Corpus inclusion criteria

Section 9.1 indicated why more research regarding the creation of the next iteration of the Water Column Corpus, with better selection criteria and a larger set of candidate files, is needed.

9.2.2 Improve performance

The test bench in its default configuration has a run time of approximately 14 hours on a high-end machine. As the number of algorithms included in a test bench run is likely to grow, and the number of input files may grow as well, we think it is opportune to look into ways to reduce the run time of the test bench. One research direction is to investigate whether it is possible to include only part of the selected input files without compromising the representativeness of the data set. Another direction is to replace unnecessary measurements with modeling, for example modeling the compression performance in a high-load environment from the results obtained in a ’normal’ environment. Such a development could reduce the test bench run time to less than half the current run time.

9.2.3 Data generation

Generated data is a valuable tool to stress known weaknesses of algorithms and to prevent algorithms from being tailored to the input data. A water column data generator could thus be a worthwhile addition to the benchmark.

9.2.4 Lossy compression

In this work, we have focused solely on the lossless compression of water column data, mainly because it is currently unknown what amount of loss is acceptable for water column data. To determine this, processing of water column data should become more common; we expect that the development of high-performance lossless compression can be a means to attain this goal. Once water column processing is more common and the problems associated with it are better understood, we believe that the acceptable loss for each application should be determined, and the focus of the scientific community should shift towards lossy compression. The best compression ratio we have measured is close to 0.25, and the average compression ratio of the best performing algorithm (LZMA max compression) is only 0.5. Beaudoin [Bea10] has shown that with lossy compression, compression ratios of 0.1 can be attained ”yielding very little loss of resolution or introduction of artifacts”. All the metrics presented in this work are valid for lossy compression as well; a test bench and/or benchmark for lossy water column compression could therefore be obtained by extending our work.

9.2.5 Hardware differences

Research should determine whether the threat to validity addressed in section 9.1.3 is a threat in practice. If it is, the test bench could be adapted to support partial runs: the metrics relevant to acquisition could then be determined on the acquisition hardware and the metrics relevant to processing on the processing hardware.

9.3 Community involvement

The positive effect of a benchmark on the community depends on the involvement of the community in that benchmark. At this point, we would like to call on the community to contribute to the benchmark. Such contributions can come in the form of feedback on the proposed benchmark, sharing relevant water column data, and/or sharing (new) water column data compression algorithms for evaluation. We would like to specifically invite members of the community that have access to a large body of (proprietary) water column data to collaborate on the creation of the next iteration of the Water Column Corpus.

Bibliography

[AB97] Ross Arnold and Tim Bell. A corpus for the evaluation of lossless compression algorithms. In Data Compression Conference, 1997. DCC’97. Proceedings, pages 201–210. IEEE, 1997.

[AK00] Michael D Adams and Faouzi Kossentini. JasPer: A software-based JPEG-2000 codec implementation. In Image Processing, 2000. Proceedings. 2000 International Conference on, volume 2, pages 53–56. IEEE, 2000.

[APR+16] David Amblas, Jordi Portell, Xavier Rayo, Alberto G. Villafranca, Enrique García-Berro, and Miquel Canals. Real-time lossless compression of multibeam echosounder water column data. 2016.

[BCW90] Timothy C Bell, John G Cleary, and Ian H Witten. Text compression. Prentice-Hall, Inc., 1990.

[Bea10] Jonathan Beaudoin. Application of JPEG 2000 wavelet compression to multibeam echosounder mid-water acoustic reflectivity measurements. In Canadian Hydrographic Conference, Quebec City (Canada), 2010.

[Bon14] Peter Boncz. Choke-point based benchmark design, 2014. URL: http://ldbcouncil.org/blog/choke-point-based-benchmark-design.

[BWSP06] B Buelens, R Williams, AHJ Sale, and Tim Pauly. Computational challenges in processing and analysis of full-watercolumn multibeam sonar data. 2006.

[CHI+15] Mihai Capotă, Tim Hegeman, Alexandru Iosup, Arnau Prat-Pérez, Orri Erling, and Peter Boncz. Graphalytics: A big data benchmark for graph-processing platforms. In Proceedings of the GRADES’15, page 7. ACM, 2015.

[Cla06] JE Hughes Clarke. Applications of multibeam water column imaging for hydrographic survey. Hydrographic Journal, 120:3, 2006.

[CRBW14] Keir Colbo, Tetjana Ross, Craig Brown, and Tom Weber. A review of oceanographic applications of water column data from multibeam echosounders. Estuarine, Coastal and Shelf Science, 145:41–56, 2014.

[Deo03] Sebastian Deorowicz. Universal lossless data compression algorithms. Philosophy Dissertation Thesis, Gliwice, 2003.

[DHL15] Patrick Damme, Dirk Habich, and Wolfgang Lehner. A benchmark framework for data compression techniques. In Technology Conference on Performance Evaluation and Benchmarking, pages 77–93. Springer, 2015.

[GVI+14] Yong Guo, Ana Lucia Varbanescu, Alexandru Iosup, Claudio Martella, and Theodore L Willke. Benchmarking graph-processing platforms: a vision. In Proceedings of the 5th ACM/SPEC International Conference on Performance Engineering, pages 289–292. ACM, 2014.

[IR08] Md Rafiqul Islam and SA Ahsan Rajon. On the design of an effective corpus for evaluation of Bengali text compression schemes. In Computer and Information Technology, 2008. ICCIT 2008. 11th International Conference on, pages 236–241. IEEE, 2008.

[Jak10] Bc. Jakub Řezníček. Corpus for comparing compression methods and an extension of the ExCom library. 2010.

[MCKL13] Marek Moszynski, Andrzej Chybicki, Marcin Kulawiak, and Zbigniew Lubniewski. A novel method for archiving multibeam sonar data with emphasis on efficient record size reduction and storage. Polish Maritime Research, 20, 2013.

[Pow01] Matt Powell. Evaluating lossless compression methods. In New Zealand Research Students’ Conference, Canterbury, New Zealand, pages 35–41. Citeseer, 2001.

[SCE01] Athanassios Skodras, Charilaos Christopoulos, and Touradj Ebrahimi. The JPEG 2000 still image compression standard. IEEE Signal Processing Magazine, 18(5):36–58, 2001.

[SEH03] Susan Elliott Sim, Steve Easterbrook, and Richard C Holt. Using benchmarking to advance research: A challenge to software engineering. In Software Engineering, 2003. Proceedings. 25th International Conference on, pages 74–83. IEEE, 2003.

[Smi13] Steven Smith. Digital signal processing: a practical guide for engineers and scientists. Newnes, 2013.

[Swa08] Jakub Swacha. Assessing the efficiency of data compression and storage system. In Computer Information Systems and Industrial Management Applications, 2008. CISIM’08. 7th, pages 109–114. IEEE, 2008.

Appendix A

Water column data

The purpose of this chapter is to introduce some multibeam echosounder related concepts and terminology that are used in this work. It is not meant as a comprehensive introduction to multibeam echosounders, as this is outside the scope of this work. Readers interested in such a comprehensive introduction may find it in the document published by L3 Communications at https://www.ldeo.columbia.edu/res/pi/MB-System/sonarfunction/SeaBeamMultibeamTheoryOperation.pdf

A multibeam echosounder is a device that emits an acoustic pulse into the water that is wide in the port-starboard direction and narrow in the aft-bow direction. This pulse travels down to the sea floor, where it is reflected. The reflection travels back to the multibeam echosounder, where it is recorded by an array of hydrophones. Because the reflection is recorded by an array of hydrophones, the signal can be filtered to contain only the signal that arrived at the multibeam echosounder from a certain angle. This process is called beam forming, and the signal that is formed for a specific angle is called a beam. The data after beam forming consists of a collection of beams, where each beam consists of an angle and a representation of the signal. The signal representation always includes samples of the signal amplitude and, for some devices, samples of the phase of the signal as well.

Water column data is normally shown as a ’wedge plot’: for each beam, the amplitude samples are drawn on a straight line starting at a central node (representing the multibeam echosounder) and going downwards at the beam angle. Figure A.1a is an example of such a plot. In this representation it is hard to identify the individual beams. Figure A.1b shows the same data visualized in a different way: the amplitude samples for each beam are drawn on a line going straight down, and the beams are stacked horizontally. In this representation, the individual beams are more clearly visible.
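The two visualizations can be reproduced with a few lines of plotting code. The sketch below is our own illustration with synthetic data; it assumes numpy and matplotlib are available and assumes a constant range step per sample.

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical record: 128 beams spanning -60..60 degrees, 500 amplitude
    # samples per beam, with a constant (assumed) range step per sample.
    angles_deg = np.linspace(-60, 60, 128)
    amplitudes = np.random.rand(128, 500)   # stand-in for real amplitude samples
    range_step = 0.1                        # metres per sample (assumed)

    ranges = np.arange(amplitudes.shape[1]) * range_step
    theta = np.deg2rad(angles_deg)[:, None]

    # Wedge plot: each beam is drawn along a straight line from the transducer
    # downwards at the beam angle; each sample sits at its range along that line.
    x = ranges * np.sin(theta)    # across-track position
    z = -ranges * np.cos(theta)   # depth (negative = downwards)

    plt.figure()
    plt.scatter(x.ravel(), z.ravel(), c=amplitudes.ravel(), s=1)
    plt.xlabel("across-track distance")
    plt.ylabel("depth")
    plt.title("Wedge plot (synthetic data)")

    # Horizontally stacked: one column per beam, samples going straight down.
    plt.figure()
    plt.imshow(amplitudes.T, aspect="auto", origin="upper")
    plt.xlabel("beam index")
    plt.ylabel("sample index")
    plt.title("Horizontally stacked beams (synthetic data)")
    plt.show()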

Figure A.1: Water column data record visualized in two modes: (a) wedge plot, (b) horizontally stacked

Appendix B

Generic Water column Format

B.1 Purpose

The generic water column format has been developed as a format to be used in the evaluation of water column compression algorithms. It achieves genericity by abstracting away all properties of the data but the ones that are necessary for compression. The goal of the format is that any water column format can be converted to the generic water column format largely by restructuring the original data. The generic water column format adds some overhead to the original format, but this is kept to a minimum.

B.2 Structure

The generic water column format consists of a collection of water column records. Each record consists of a record header, as shown in table B.1, followed by a beam entry for each beam in the record. The format of the beam entry is shown in table B.2.

Bytes         Field                          Data type   Note
0..1          Number of beams (Nb)           uint16
2             Amplitude sample format (Fsa)  uint8       *1
3             Phase sample format (Fsp)      uint8       *1
4..7          Generic swath data size (Ssg)  uint32
8..8+Ssg-1    Generic swath data             uint8[]

Table B.1: Record header

The record header is followed by Nb beam entries:

Bytes         Field                          Data type   Note
0..3          Swath number                   uint32
4..7          Number of amplitude samples    uint32
8..11         Number of phase samples        uint32
12..13        Generic beam data size         uint16
variable      Amplitude samples              (Fsa)[]     *2
variable      Phase samples                  (Fsp)[]     *2
variable      Generic data                   uint8[]     *2

Table B.2: Beam entry

*1) Sample format according to table B.3

Value   Format                    Bytes per sample
0       8 bit unsigned integer    1
1       8 bit signed integer      1
2       16 bit unsigned integer   2
3       16 bit signed integer     2
4       32 bit unsigned integer   4
5       32 bit signed integer     4
6       64 bit unsigned integer   8
7       64 bit signed integer     8
8       32 bit IEEE float         4
9       64 bit IEEE float         8

Table B.3: Sample format enumeration

*2) The size and/or position of the amplitude, phase, and generic data are variable and depend on the data size of the sample formats used for the amplitude and phase data. Refer to table B.3 for specifics on the data size for each sample format.
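As a reading aid for tables B.1–B.3, the sketch below parses one GWF record from a byte buffer. It is our own illustration and not part of the test bench; it assumes little-endian byte order and assumes the amplitude samples, phase samples, and generic beam data follow the fixed beam-entry fields in that order.

    import struct

    # struct format codes for the sample formats of table B.3 (little-endian assumed)
    SAMPLE_FORMATS = {
        0: "B", 1: "b", 2: "H", 3: "h", 4: "I",
        5: "i", 6: "Q", 7: "q", 8: "f", 9: "d",
    }

    def read_record(buf, offset=0):
        """Parse one generic water column record starting at `offset` in `buf`."""
        # Record header (table B.1): Nb, Fsa, Fsp, Ssg, followed by generic swath data.
        nb, fsa, fsp, ssg = struct.unpack_from("<HBBI", buf, offset)
        offset += 8
        generic_swath = buf[offset:offset + ssg]
        offset += ssg

        beams = []
        for _ in range(nb):
            # Fixed part of the beam entry (table B.2).
            swath_no, n_amp, n_phase, generic_size = struct.unpack_from("<IIIH", buf, offset)
            offset += 14
            amp_fmt, phase_fmt = SAMPLE_FORMATS[fsa], SAMPLE_FORMATS[fsp]
            amplitudes = struct.unpack_from("<%d%s" % (n_amp, amp_fmt), buf, offset)
            offset += n_amp * struct.calcsize(amp_fmt)
            phases = struct.unpack_from("<%d%s" % (n_phase, phase_fmt), buf, offset)
            offset += n_phase * struct.calcsize(phase_fmt)
            generic_beam = buf[offset:offset + generic_size]
            offset += generic_size
            beams.append((swath_no, amplitudes, phases, generic_beam))

        return generic_swath, beams, offset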

B.3 Conversion

Figure B.1 shows how a water column record in the Kongsberg .all encoding is converted to a generic water column format record. All yellow fields are considered ’generic data’ in the GWF encoding: data that is not necessary for compression and is therefore treated simply as generic data. The orange fields are similar, but represent generic data that belongs to a specific beam. The white fields are fields that are present in both the original encoding and the GWF encoding; the dotted lines show the mapping from the Kongsberg .all field names to the GWF field names. The green fields are fields that are not present in the original data. In this specific case, these are the sample formats for the phase and amplitude data and the number of phase samples for each beam (as the Kongsberg .all format does not include phase data). The total amount of overhead induced by the GWF format is thus 1 + 1 + (Nb * 4) bytes; for a record with 256 beams, for example, this amounts to 1026 bytes.

Figure B.1: Mapping of the Kongsberg .all format to GWF

Appendix C

Test bench Configuration file

The following listing shows the default configuration file as distributed with the test bench.

{
    "input files":
    [
        "Water Column Corpus/Fish.gwf",
        "Water Column Corpus/Survey.gwf",
        "Water Column Corpus/Target.gwf",
        "Water Column Corpus/Vent.gwf",
        "Water Column Corpus/WaterSeep.gwf",
        "Water Column Corpus/Wreck.gwf"
    ],

    "algorithms":
    [
        {
            "name": "ZIP no compression",
            "path": "algorithms/zip.py",
            "parameters":
            {
                "compress" : false
            }
        },
        {
            "name": "ZIP with compression",
            "path": "algorithms/zip.py",
            "parameters":
            {
                "compress" : true
            }
        },
        {
            "name": "JPEG 2000 based compression",
            "path": "algorithms/jpeg2k.py",
            "parameters": {}
        },
        {
            "name": "LZMA fast",
            "path": "algorithms/wclzma.py",
            "parameters":
            {
                "compression mode" : 0
            }
        },
        {
            "name": "LZMA max compression",
            "path": "algorithms/wclzma.py",
            "parameters":
            {
                "compression mode" : 2
            }
        }
    ],

    "metric parameters":
    {
        "acquisition cpu available" : 50,
        "acquisition memory available" : 2048,
        "cost of processing time" : 200,
        "cost of ship time" : 900,
        "cost of data ownership" : 0.014,
        "number of processing decompressions" : 1,
        "Processing ratio" : 1
    }
}

Listing C.8: Example test bench configuration file
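For reference, a configuration file with this structure can be inspected with a few lines of Python. This is only an illustration of the file layout; the test bench's own loading code may differ, and the path below is hypothetical.

    import json

    with open("config.json") as f:   # path is illustrative
        config = json.load(f)

    print("Input files:", len(config["input files"]))
    for algorithm in config["algorithms"]:
        print(algorithm["name"], "->", algorithm["path"], algorithm["parameters"])
    print("Processing ratio:", config["metric parameters"]["Processing ratio"])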
