Data Compression “Standard” Approaches

Data Compression “Standard” Approaches GWDAW 2000 Benoit MOURS, Caltech & LAPP-Annecy LIGO-G000346-00-E Data Compression Why should you care? ● Interferometers produce several Mbytes/s (~100Tb/y) » Data Handling is complex. » Archiving cost is important. » Disk space is a limit in your analysis. ● Data compression could improve » Data access from the archive, network, nfs disks... » Data lifetime » Speed of your analysis program ● But data compression may change the data... LIGO-G000346-00-E GWDAW-2000 B. Mours Data Compression 2 Data Compression What could we do? ● Record the right data » Right sampling frequency? » Right electronic gain? (do not record too much electronic noise) » Right format (integer or float)? ● Compress the data » Without loss of information (like gzip). » With some loss of information. » By converting large vector to a few statistical information. Produce different kind of data sets (metadata) LIGO-G000346-00-E GWDAW-2000 B. Mours Data Compression 3 Available Frame Compression (Format Spec. & I/O library) ● Only lossless compression method ● Compression done at the vector level » No need to uncompress unused channels. ● Standard gzip » Integer are differentiate to improve compression rate. ● Zero suppress » Differentiate date are store with the minimal number of bit needed. » Available only for integer. LIGO-G000346-00-E GWDAW-2000 B. Mours Data Compression 4 Data Compression Performances ● Numbers from the November E2 Ligo run Full Frames Reduced' Frames Raw Size 3.2 Mb/s 1.5 MB/s Size after gzip + Zero supp. 1.7Mb/s 1.0 Mb/s Fraction of float 31% 64% Compression ratio for short 2.16 1.98 Compression ratio for float 1.33 1.33 gzip speed (float) 2Mb/s 2Mb/s Zero supress speed (short) 8Mb/s 8Mb/s Speed measured on a Sun Ultra 10. ● Remarks: » Poor speed and compression ratio for float » People want floating points » More floating points than originally foreseen LIGO-G000346-00-E GWDAW-2000 B. Mours Data Compression 5 New Compression for float? ● Principle: convert float to integer ● Method: » Differentiate the data and digitize the differences: (si+1- si)/k » Round off is done by checking that the rebuild data do not diverge from the original date. ● Data saved » First value, the differences convert to an integer, a scaling factor. ● One parameter: number of bits to store the integer ● Compression rate: 32/number of bits » Example: If result stored on 16 bits: compression of 2 ● Speed: Fast: 20 Mbytes/s (if result stored on 16 bits) LIGO-G000346-00-E GWDAW-2000 B. Mours Data Compression 6 Compression for float: Noise H2:LSC-AS_Q Original spectrum noise for 8 bits 6 10 noise for 12 bits noise for 16 bits noise for 16 bits, no integration check 5 10 <amplitude> 4 10 3 10 2 10 10 1 -1 10 2 3 10 10 10frequency[Hz] LIGO-G000346-00-E GWDAW-2000 B. Mours Data Compression 7 Noise for other channels H2:SUS-ETMX_SENSOR_LR H2:LSC-MICH_CTRL H2:ASC-ITMY_OPLEV_PITCH 4 Original spectrum 4 10 10 float to 8 bits 5 3 10 10 float to 12 bits 3 4 2 <amplitude> <amplitude> 10<amplitude> 10 10 float to 16 bits 3 10 10 2 10 2 10 1 -1 10 10 10 -2 1 10 1 -1 -3 10 10 -4 -2 10 10 200 400 600 800 1000 1000 2000 3000 4000 5000 6000 7000 8000 200 400 600 800 1000 frequency[Hz] frequency[Hz] frequency[Hz] H0:PEM-EX_V2 H0:PEM-BSC5_MIC H2:IOO-MC_REFLPD 6 6 10 3 10 10 5 5 10 10 2 <amplitude> <amplitude> <amplitude> 10 4 4 10 10 10 3 10 3 10 1 2 10 2 10 -1 10 10 10 -2 1 10 -1 1 10 200 400 600 800 1000 200 400 600 800 1000 200 400 600 800 1000 frequency[Hz] frequency[Hz] frequency[Hz] LIGO-G000346-00-E GWDAW-2000 B. Mours Data Compression 8 Noise for other channels H2:SUS-ETMX_SENSOR_LR H2:LSC-MICH_CTRL H2:ASC-ITMY_OPLEV_PITCH Original spectrum 6 float to 8 bits 10 4 4 10 10 float to 12 bits 5 10 3 float to 16 bits 10 4 <amplitude> 3 <amplitude> <amplitude> 2 10 10 10 3 10 10 2 10 2 10 1 -1 10 10 10 -2 1 10 -1 -3 1 10 10 -2 -4 10 10 2 3 2 3 2 3 10 10 frequency[Hz]10 10 10 frequency[Hz]10 10 10 frequency[Hz]10 H0:PEM-EX_V2 H0:PEM-BSC5_MIC H2:IOO-MC_REFLPD 6 6 10 3 10 10 5 5 10 10 2 <amplitude> <amplitude> <amplitude> 10 4 10 4 10 10 3 10 3 10 1 2 10 2 10 -1 10 10 10 -2 1 10 -1 1 10 2 3 2 3 2 3 10 10 frequency[Hz]10 10 10 frequency[Hz]10 10 10 frequency[Hz]10 LIGO-G000346-00-E GWDAW-2000 B. Mours Data Compression 9 Summary ● The best ‘compression’ is to record only what you need: » right channel, right frequency, right type (integer are better than float) ● Existing tools » Work OK short integer. » Poor on float. ● Simple lossy compression for float seems possible » Small white noise introduced. » Fast. » Data could be stored in 8 to 16 bits. » But not as good as if the data were stored as integer (8 bits enough). ● More aggressive compression? see S. Klimenko talk LIGO-G000346-00-E GWDAW-2000 B. Mours Data Compression 10.

Load more