
“Standard” Approaches

GWDAW 2000 Benoit MOURS, Caltech & LAPP-Annecy

LIGO-G000346-00-E

Data Compression: Why should you care?

● Interferometers produce several Mbytes/s (~100 TB/year)
   » Data handling is complex.
   » Archiving cost is significant.
   » Disk space limits your analysis.

● Data compression could improve
   » Data access from the archive, network, NFS disks...
   » Data lifetime
   » Speed of your analysis program

● But data compression may change the data...
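The yearly volume quoted above can be checked with a line of arithmetic. The sketch below assumes a sustained rate of 3.2 Mbytes/s (the raw full-frame rate reported later in this talk for the E2 run), decimal units, and a 365.25-day year; the function name is illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 s

def yearly_volume_tb(rate_mbytes_per_s):
    """Yearly data volume in TB (decimal) for a sustained rate in Mbytes/s."""
    bytes_per_year = rate_mbytes_per_s * 1e6 * SECONDS_PER_YEAR
    return bytes_per_year / 1e12

# A few Mbytes/s sustained for a year lands at roughly 100 TB.
print(f"{yearly_volume_tb(3.2):.0f} TB/year")
```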

Data Compression: What could we do?

● Record the right data
   » Right sampling frequency?
   » Right electronic gain? (Do not record too much electronic noise.)
   » Right format (integer or float)?

● Compress the data
   » Without loss of information (like gzip).
   » With some loss of information.
   » By converting large vectors into a few statistical quantities.

● Produce different kinds of data sets (metadata)
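As an illustration of converting a large vector into a few statistical quantities, the sketch below reduces a block of samples to its count, minimum, maximum, mean and RMS. The choice of statistics and the function name are assumptions made for this example, not something specified in the talk.

```python
import math

def summarize(samples):
    """Reduce a vector of samples to a handful of summary statistics."""
    n = len(samples)
    mean = sum(samples) / n
    rms = math.sqrt(sum(x * x for x in samples) / n)
    return {"n": n, "min": min(samples), "max": max(samples),
            "mean": mean, "rms": rms}

# Four samples in, five numbers out; for a one-second vector sampled
# at several kHz the reduction is correspondingly larger.
print(summarize([1.0, -1.0, 3.0, -3.0]))
```

Such a summary is lossy by construction: the original samples cannot be rebuilt from it.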

Available Frame Compression (Format Spec. & I/O library)

● Only lossless methods

● Compression done at the vector level
   » No need to uncompress unused channels.

● Standard gzip
   » Integers are differenced to improve the compression ratio.

● Zero suppress
   » Differenced data are stored with the minimal number of bits needed.
   » Available only for integers.
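The two ideas above (differencing integers before gzip, and storing differences on the minimal number of bits) can be illustrated with a small sketch. It is not the Frame library code: it uses Python's `zlib` module (the same deflate algorithm as gzip), a synthetic random-walk signal, and a single bit width for the whole vector. All function names are hypothetical.

```python
import random
import struct
import zlib

def difference(samples):
    """Split a series into its first value and the successive differences."""
    diffs = [b - a for a, b in zip(samples, samples[1:])]
    return samples[0], diffs

def integrate(first, diffs):
    """Inverse of difference(): rebuild the original series exactly."""
    out = [first]
    for d in diffs:
        out.append(out[-1] + d)
    return out

def gzip_size(values):
    """Size in bytes after packing as 16-bit integers and deflating with zlib."""
    raw = struct.pack(f"<{len(values)}h", *values)
    return len(zlib.compress(raw))

def bits_needed(diffs):
    """Smallest signed width (in bits) able to hold every difference."""
    largest = max(abs(d) for d in diffs)
    return largest.bit_length() + 1  # one extra bit for the sign

# Synthetic slowly varying 16-bit signal: a random walk with steps in [-3, 3].
rng = random.Random(0)
samples = [0]
for _ in range(9999):
    samples.append(samples[-1] + rng.randint(-3, 3))

first, diffs = difference(samples)
assert integrate(first, diffs) == samples  # differencing is lossless

print("deflate on raw samples :", gzip_size(samples), "bytes")
print("deflate on differences :", gzip_size(diffs), "bytes")
print("bits per difference    :", bits_needed(diffs), "instead of 16")
```

For a slowly varying signal the differences span a much smaller range than the samples themselves, which is what both methods exploit.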

Data Compression Performances

● Numbers from the November E2 LIGO run

                                      Full Frames   ‘Reduced’ Frames
   Raw size                           3.2 MB/s      1.5 MB/s
   Size after gzip + zero suppress    1.7 MB/s      1.0 MB/s
   Fraction of float                  31%           64%
   Compression ratio for short        2.16          1.98
   Compression ratio for float        1.33          1.33
   gzip speed (float)                 2 MB/s        2 MB/s
   Zero suppress speed (short)        8 MB/s        8 MB/s

   Speeds measured on a Sun Ultra 10.

● Remarks:
   » Poor speed and compression ratio for float
   » People want floating points
   » More floating points than originally foreseen
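The compressed sizes above are consistent with the per-type compression ratios. The sketch below recomputes them under two assumptions that are not stated in the talk: the fraction of float is a fraction of the raw data volume, and everything that is not float is short.

```python
def compressed_rate(raw_rate, float_fraction, ratio_short, ratio_float):
    """Expected rate after compression, in the same unit as raw_rate."""
    short_part = raw_rate * (1.0 - float_fraction) / ratio_short
    float_part = raw_rate * float_fraction / ratio_float
    return short_part + float_part

full = compressed_rate(3.2, 0.31, 2.16, 1.33)
reduced = compressed_rate(1.5, 0.64, 1.98, 1.33)
print(f"full frames: {full:.2f} MB/s, reduced frames: {reduced:.2f} MB/s")
```

Both values agree with the quoted 1.7 and 1.0 to within rounding, and the float term dominates the reduced frames, which is why the poor float ratio matters.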

New Compression for float?

● Principle: convert float to integer

● Method:
   » Differentiate the data and digitize the differences: (s_{i+1} - s_i)/k
   » Round-off is done by checking that the rebuilt data do not diverge from the original data.

● Data saved
   » The first value, the differences converted to integers, and a scaling factor.

● One parameter: the number of bits used to store each integer

● Compression rate: 32/(number of bits)
   » Example: if the result is stored on 16 bits, the compression factor is 2.

● Speed: fast, 20 Mbytes/s (if the result is stored on 16 bits)
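The method above can be sketched as follows. This is one possible reading of the slide, not the actual implementation: the round-off check is interpreted as quantizing each difference against the rebuilt previous value, the scaling factor k is chosen from the largest step with one code of headroom, and Python's 64-bit floats stand in for 32-bit floats. All names are hypothetical.

```python
import math
import random

def compress(samples, nbits, integration_check=True):
    """Encode floats as (first value, scaling factor k, integer differences)."""
    if nbits < 3:
        raise ValueError("this sketch needs at least 3 bits")
    limit = 2 ** (nbits - 1) - 1  # largest signed code
    max_step = max(abs(b - a) for a, b in zip(samples, samples[1:]))
    # One code of headroom so the round-off correction cannot overflow.
    k = max_step / (limit - 1) if max_step > 0 else 1.0
    codes = []
    rebuilt = samples[0]
    previous = samples[0]
    for s in samples[1:]:
        # With the check, the difference is taken against the rebuilt value,
        # so round-off errors cannot accumulate.
        reference = rebuilt if integration_check else previous
        q = round((s - reference) / k)
        codes.append(q)
        rebuilt += q * k
        previous = s
    return samples[0], k, codes

def decompress(first, k, codes):
    """Rebuild the series by integrating the scaled differences."""
    out = [first]
    for q in codes:
        out.append(out[-1] + q * k)
    return out

def max_error(original, rebuilt):
    return max(abs(a - b) for a, b in zip(original, rebuilt))

def compression_ratio(nbits):
    """Ratio relative to 32-bit floats, ignoring the first value and k."""
    return 32 / nbits

# Synthetic test signal: a sine wave plus Gaussian noise.
rng = random.Random(1)
data = [math.sin(2 * math.pi * i / 200) + rng.gauss(0.0, 0.05)
        for i in range(2000)]

first, k, codes = compress(data, 16)
rebuilt = decompress(first, k, codes)
unchecked = decompress(*compress(data, 16, integration_check=False))

print("scaling factor k        :", k)
print("max error, with check   :", max_error(data, rebuilt))
print("max error, without check:", max_error(data, unchecked))
print("compression ratio       :", compression_ratio(16))
```

With the check the error on every rebuilt sample stays within about k/2; without it the round-off errors are summed during reconstruction and the rebuilt series can drift away from the original.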

Compression for float: Noise

[Figure: spectrum of H2:LSC-AS_Q versus frequency [Hz], logarithmic axes. Curves: original spectrum; noise for 8 bits; noise for 12 bits; noise for 16 bits; noise for 16 bits, no integration check.]
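A rough estimate of the noise added by the float-to-integer conversion can be written down under the usual assumption that each round-off error is uniformly distributed over one quantization step k and uncorrelated from sample to sample. Here k is the scaling factor of the method, and f_s (the sampling frequency) is a symbol introduced for this note. This is a textbook estimate, not a result taken from the talk.

```latex
\sigma_e^2 = \frac{k^2}{12},
\qquad
S_e(f) = \frac{\sigma_e^2}{f_s/2} = \frac{k^2}{6 f_s},
\quad 0 \le f \le \frac{f_s}{2}
```

For a fixed data range each extra bit halves k, which lowers the noise amplitude spectrum by a factor of 2. If the differences are rounded without the integration check, the errors are summed during reconstruction, so the accumulated error behaves like a random walk rather than white noise.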

Noise for other channels

[Figure: six panels, spectrum versus frequency [Hz] on a linear frequency axis, for the channels H2:SUS-ETMX_SENSOR_LR, H2:LSC-MICH_CTRL, H2:ASC-ITMY_OPLEV_PITCH, H0:PEM-EX_V2, H0:PEM-BSC5_MIC and H2:IOO-MC_REFLPD. Curves: original spectrum; float to 8 bits; float to 12 bits; float to 16 bits.]

Noise for other channels

[Figure: the same six channels (H2:SUS-ETMX_SENSOR_LR, H2:LSC-MICH_CTRL, H2:ASC-ITMY_OPLEV_PITCH, H0:PEM-EX_V2, H0:PEM-BSC5_MIC, H2:IOO-MC_REFLPD), spectrum versus frequency [Hz] on a logarithmic frequency axis. Curves: original spectrum; float to 8 bits; float to 12 bits; float to 16 bits.]

Summary

● The best ‘compression’ is to record only what you need:
   » Right channel, right frequency, right type (integers are better than floats)

● Existing tools
   » Work OK for short integers.
   » Poor on float.

● Simple compression for float seems possible
   » Small white noise introduced.
   » Fast.
   » Data could be stored in 8 to 16 bits.
   » But not as good as if the data were stored as integers (8 bits enough).

● More aggressive compression? See S. Klimenko's talk.
