Data Compression “Standard” Approaches

GWDAW 2000
Benoit MOURS, Caltech & LAPP-Annecy
LIGO-G000346-00-E

Data Compression: Why should you care?

● Interferometers produce several MB/s (~100 TB/yr)
  » Data handling is complex.
  » Archiving cost is significant.
  » Disk space limits your analysis.
● Data compression could improve
  » Data access from the archive, network, NFS disks...
  » Data lifetime
  » Speed of your analysis program
● But data compression may change the data...

Data Compression: What could we do?

● Record the right data
  » Right sampling frequency?
  » Right electronic gain? (do not record too much electronic noise)
  » Right format (integer or float)?
● Compress the data
  » Without loss of information (like gzip).
  » With some loss of information.
  » By converting large vectors into a few statistical quantities.
● Produce different kinds of data sets (metadata)

Available Frame Compression (Format Spec. & I/O library)

● Only lossless compression methods
● Compression done at the vector level
  » No need to uncompress unused channels.
● Standard gzip
  » Integers are differentiated to improve the compression rate.
● Zero suppress
  » Differentiated data are stored with the minimal number of bits needed.
  » Available only for integers.

Data Compression Performances

● Numbers from the November E2 LIGO run

                                   Full Frames    ‘Reduced’ Frames
  Raw size                         3.2 MB/s       1.5 MB/s
  Size after gzip + zero suppress  1.7 MB/s       1.0 MB/s
  Fraction of float                31%            64%
  Compression ratio for short      2.16           1.98
  Compression ratio for float      1.33           1.33
  gzip speed (float)               2 MB/s         2 MB/s
  Zero suppress speed (short)      8 MB/s         8 MB/s

  Speed measured on a Sun Ultra 10.

● Remarks:
  » Poor speed and compression ratio for float
  » People want floating-point data
  » More floating-point data than originally foreseen
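The two lossless steps listed under "Available Frame Compression" can be illustrated with a minimal sketch. This is not the Frame library's code: the function names (`differentiate`, `integrate`, `bits_needed`, `gzip_size`) are hypothetical, a single bit width is computed for the whole vector rather than whatever layout the library actually uses, and samples and differences are assumed to fit in 16-bit integers.

```python
import math
import struct
import zlib


def differentiate(samples):
    """Keep the first sample, then store successive differences."""
    if not samples:
        return []
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]


def integrate(diffs):
    """Invert differentiate() with a running sum (lossless for integers)."""
    out, total = [], 0
    for d in diffs:
        total += d
        out.append(total)
    return out


def bits_needed(values):
    """Smallest two's-complement width that can hold every value."""
    return max((max(v, -v - 1).bit_length() + 1 for v in values), default=1)


def gzip_size(values):
    """Bytes after zlib compression of the values packed as 16-bit integers.

    Assumes every value fits in a signed 16-bit integer.
    """
    return len(zlib.compress(struct.pack(f"<{len(values)}h", *values)))


# Illustrative, slowly varying integer signal (made-up data, not detector data).
samples = [round(1000 * math.sin(i / 50)) for i in range(2000)]
diffs = differentiate(samples)

print("round trip is lossless:", integrate(diffs) == samples)
print("bits per difference (zero-suppress idea):", bits_needed(diffs[1:]))
print("zlib bytes, raw samples:", gzip_size(samples))
print("zlib bytes, differences:", gzip_size(diffs))
```

For a smooth signal the differences are much smaller than the samples, which is why both a general-purpose compressor and a reduced bit width can benefit from differentiating first.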

New Compression for float?

● Principle: convert float to integer
● Method:
  » Differentiate the data and digitize the differences: (s_{i+1} - s_i)/k
  » Round-off is done by checking that the rebuilt data do not diverge from the original data.
● Data saved
  » First value, the differences converted to integers, a scaling factor.
● One parameter: number of bits used to store the integers
● Compression rate: 32/(number of bits)
  » Example: if the result is stored on 16 bits, the compression factor is 2
● Speed: fast, 20 MB/s (if the result is stored on 16 bits)

Compression for float: Noise

[Figure: H2:LSC-AS_Q amplitude spectrum, <amplitude> versus frequency [Hz], logarithmic axes. Curves: original spectrum; noise for 8 bits; noise for 12 bits; noise for 16 bits; noise for 16 bits, no integration check.]

Noise for other channels

[Figure: six panels, <amplitude> versus frequency [Hz], linear frequency axis: H2:SUS-ETMX_SENSOR_LR, H2:LSC-MICH_CTRL, H2:ASC-ITMY_OPLEV_PITCH, H0:PEM-EX_V2, H0:PEM-BSC5_MIC, H2:IOO-MC_REFLPD. Curves: original spectrum; float to 8 bits; float to 12 bits; float to 16 bits.]
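The float-to-integer method from the "New Compression for float?" slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the names `compress_floats` and `decompress_floats` are hypothetical, k is assumed to be the stored scaling factor, and the rule used to choose k (largest step divided by the largest representable integer) is a guess, since the slides do not say how k is chosen. The "integration check" is modelled by quantizing each step relative to the rebuilt value rather than the previous original sample.

```python
import math


def compress_floats(samples, nbits=16):
    """Return (first value, integer differences, scaling factor k).

    Assumes a non-empty list of floats and nbits >= 2.
    """
    qmax = 2 ** (nbits - 1) - 1  # largest signed integer on nbits
    max_step = max((abs(b - a) for a, b in zip(samples, samples[1:])), default=0.0)
    k = max_step / qmax if max_step > 0 else 1.0  # assumed choice of k
    rebuilt = samples[0]
    codes = []
    for target in samples[1:]:
        # Quantize the step from the rebuilt value so that round-off
        # errors cannot accumulate from one sample to the next.
        q = round((target - rebuilt) / k)
        q = max(-qmax, min(qmax, q))  # keep the code inside nbits
        codes.append(q)
        rebuilt += q * k
    return samples[0], codes, k


def decompress_floats(first, codes, k):
    """Rebuild the samples by integrating the scaled integer differences."""
    out = [first]
    for q in codes:
        out.append(out[-1] + q * k)
    return out


# Illustrative signal (made-up data, not detector data).
samples = [math.sin(0.01 * i) + 0.001 * math.cos(3.7 * i) for i in range(1000)]
first, codes, k = compress_floats(samples, nbits=12)
rebuilt = decompress_floats(first, codes, k)
max_error = max(abs(a - b) for a, b in zip(samples, rebuilt))

print("scaling factor k:", k)
print("largest reconstruction error:", max_error)
print("compression factor for 32-bit floats:", 32 / 12)
```

Because each step is measured from the rebuilt value, the reconstruction error stays on the order of k at every sample instead of growing as a random walk, which is presumably what the "no integration check" curve in the noise figure is contrasted against.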

Noise for other channels

[Figure: same six panels, <amplitude> versus frequency [Hz], logarithmic frequency axis: H2:SUS-ETMX_SENSOR_LR, H2:LSC-MICH_CTRL, H2:ASC-ITMY_OPLEV_PITCH, H0:PEM-EX_V2, H0:PEM-BSC5_MIC, H2:IOO-MC_REFLPD. Curves: original spectrum; float to 8 bits; float to 12 bits; float to 16 bits.]

Summary

● The best ‘compression’ is to record only what you need:
  » right channel, right frequency, right type (integers are better than floats)
● Existing tools
  » Work OK on short integers.
  » Poor on floats.
● Simple lossy compression for float seems possible
  » Small white noise introduced.
  » Fast.
  » Data could be stored in 8 to 16 bits.
  » But not as good as if the data were stored as integers (8 bits enough).
● More aggressive compression? See S. Klimenko's talk.
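The summary's point about existing tools can be cross-checked against the E2 performance figures quoted in the talk. The sketch below assumes that the "short" ratio applies to everything that is not float and that the float fraction is a fraction of the raw data volume; the slides do not state either explicitly, and the function name `compressed_rate` is hypothetical.

```python
def compressed_rate(raw_rate, float_fraction, ratio_short, ratio_float):
    """Expected rate in MB/s after compressing each part by its own ratio."""
    short_part = raw_rate * (1.0 - float_fraction) / ratio_short
    float_part = raw_rate * float_fraction / ratio_float
    return short_part + float_part


# Figures quoted for the November E2 run.
full = compressed_rate(3.2, 0.31, 2.16, 1.33)
reduced = compressed_rate(1.5, 0.64, 1.98, 1.33)

print(f"full frames:    {full:.2f} MB/s")
print(f"reduced frames: {reduced:.2f} MB/s")
```

Under these assumptions the estimates come out at about 1.77 MB/s and 0.99 MB/s, close to the quoted 1.7 MB/s and 1.0 MB/s, and the float part accounts for the larger share of the compressed volume in the reduced frames.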
