Noname manuscript No. (will be inserted by the editor)

Polynomial data compression for large-scale physics experiments

Pierre Aubert · Thomas Vuillaume · Gilles Maurin · Jean Jacquemier · Giovanni Lamanna · Nahid Emad

Received: date / Accepted: date

Abstract The new generation research experiments will introduce huge data surge to a continuously increasing data production by current experiments. This data surge necessitates efficient compression techniques. These compression techniques must guarantee an optimum tradeoff between compression rate and the corresponding compression/decompression speed ratio without affecting the data integrity. This work presents a lossless compression algorithm to compress physics data generated by Astronomy, Astrophysics and Particle Physics experiments. The developed algorithms have been tuned and tested on a real use case: the next generation ground-based high-energy gamma ray observatory, Cherenkov Telescope Array (CTA), requiring important compression performance. Stand-alone, the proposed compression method is very fast and reasonably efficient. Alternatively, applied as a pre-compression algorithm, it can accelerate common methods like LZMA, keeping close performance.

Keywords Big data · HPC · lossless compression · white noise

P. Aubert · T. Vuillaume · G. Maurin · J. Jacquemier · G. Lamanna
Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LAPP, 74000 Annecy, France
E-mail: [email protected]

P. Aubert · N. Emad
Laboratoire d'informatique Parallélisme Réseaux Algorithmes Distribués, UFR des Sciences, 45 avenue des États-Unis, 78035 Versailles

P. Aubert · N. Emad
Maison de la Simulation, Université de Versailles Saint-Quentin-en-Yvelines, USR 3441 CEA Saclay, 91191 Gif-sur-Yvette cedex

1 Introduction

Several current and next generation experimental infrastructures are concerned by the increasing volume of data that they generate and manage. This is also the case in the Astrophysics and Astroparticle Physics research domains, where several projects are going to produce a data deluge of the order of several tens of Peta-Bytes (PB) per year [1] (as in the case of CTA) up to some Exa-Bytes (as for the next generation astronomical radio observatory SKA [2]). Such an increasing data-rate implies considerable technical issues at all levels of the data flow, such as data storage, processing, dissemination and preservation.

The most efficient compression algorithms generally used for pictures (JPEG), videos (H264) or music (MP3) files provide compression ratios greater than 10. These algorithms are lossy, therefore not applicable in a scientific context where the data content is critical and inexact approximations and/or partial data discarding are not acceptable. In the context of this work, we focus on compression methods that respond to data size reduction, storage, handling, and transmission issues while not compromising the data content.

The following types of lossless compression methods are applicable in the aforementioned situations. LZMA [3], LZ78 [4], BZIP2 [5], GZIP [6], Zstandard [7] or the Huffman algorithms are often employed because they provide the best compression ratio. The compression speeds of these methods however impose significant constraints considering the data volumes at hand. Character-oriented lossless compression methods, such as CTW (Context Tree Weighting) [8], LZ77 [9], LZW [10], the Burrows-Wheeler transform, or PPM [11], cannot be used efficiently on physics data as they do not have the same characteristics as text data, like the occurrence or repetition of characters. Other experiments have recently solved this data compression issue [12], [13] for smaller data rates.

With the increasing data rate, both the compression speed and ratio have to be improved. This paper primarily addresses the data compression challenges. In this paper, we propose a polynomial approach to compress integer data dominated by a white noise in a shorter time than the classical methods, with a reasonable compression ratio. This paper focuses on both the compression ratio and time because the decompression time is typically shorter.

The paper is organized as follows. Section 2 explains some motivations. Section 3 describes our three polynomial compression methods. Section 4 reports the improvement obtained from our best polynomial compression method on given distributions and CTA data [14]. Section 5 gives further details about compression quality. In section 6, some concluding remarks and future plans will be given.

2 Motivation

As the data volumes generated by current and upcoming experiments rapidly increase, the transfer and storage of data becomes an economical and technical issue. As an example, CTA, the next generation ground-based gamma-ray observatory, will generate hundreds of PB of data by 2030. The CTA facility is based on two observing sites, one per hemisphere, and will be composed of more than one hundred telescopes in total. Each of them is equipped with photo-sensors equipping the telescopes' cameras and generating about two hundred PB/year of uncompressed raw data, which are then reduced on site, after data selection conditions, to the order of the PB/year off-site data yield. The CTA pipeline thus implies a need for both lossy and lossless compression, and the amount of lossy compression should be minimized while also ensuring good data reading and writing speed. The writing speed needs to be close to real-time, since there is limited capacity on site to buffer such large data volumes. Furthermore, decompression speed is also an issue; the whole cumulated data are expected to be reprocessed yearly, which means that the amount of data needed to be read from disk (i.e. decompressed) and processed will grow each year (e.g. 4 PB, 8 PB, 12 PB, ...).

In CTA, as in many other experiments, the data acquired by digitization can be described by two components: a Poissonian distribution representing the signal, dominated by a Gaussian-like distribution representing the noise, which is most commonly white noise.

Fig. 1 Example of analog signal digitization in most physics experiments. In many cases the white noise (a Gaussian distribution) dominates the signal (generally a Poissonian distribution). So, the biggest part of the data we want to compress follows a Gaussian distribution.

Fig. 2 Illustration of the reduction principle. The upper line represents the data (different colours for different values). In the second line, the orange blocks represent the changes between the different values to compress. The last line shows the compressed data (as they are stored): first, the minimum value of the data, next, the base b = max − min + 1, which defines the data variations set, Z/bZ, finally the data variations. Several data can be stored in the same unsigned int and only the changes between the data are stored. The common parameters, like the range of the data (minimum and maximum or compression base), are stored only once.

As shown in figure 1, the noise generally significantly dominates the searched signal.

In this paper, we propose a compression algorithm optimised for experimental situations with such characteristics, a Gaussian distribution added to a Poissonian one. Furthermore, in order to respond to the time requirement and allow for almost real-time execution, the proposed solution can also be combined with the most powerful known compression algorithms, such as LZMA, to increase tremendously their speed.

3 The polynomial compression

An unsigned int range, ⟦0, 2^{32}⟦, defines a mathematical set Z/dZ, called a ring, where d = 2^{32}. The digitized data also define a ring; in this case, the minimum is v_min and the maximum is v_max, so the corresponding ring is defined as Z/bZ with b = v_max − v_min + 1. In many cases b < d, so it is possible to store several pieces of data in the same unsigned int (see figure 2). This compression can be made by using a polynomial approach. The power of a base is given by the values range. This allows to add different values in the same integer and compute them back.

3.1 Basic compression method

Considering an n-element data vector, v ∈ N^n, its minimum, v_min, and its maximum, v_max, define its associated ring. If the data ring is smaller than the unsigned int ring, it is possible to store several values in one unsigned int. The smaller the base, the higher the compression ratio. As the data are in ⟦v_min, v_max⟧, the range between 0 and v_min is useless. Therefore, the data can be compressed by subtracting the minimum value, forming a smaller base. The minimum can be stored once before the compressed data. The compression base B is defined by B = v_max − v_min + 1. With this base we are able to store (v_max − v_min) different values. The compression ratio, p, is given by the number of bases B that can be stored in one unsigned int (in ⟦0, 2^{32}⟦):

p = \left\lfloor \frac{\ln(2^{32} - 1)}{\ln B} \right\rfloor    (1)

The compressed elements, s_j, are given by:

s_j = \sum_{i=1}^{p} v_{i + p \times (j-1)} \times B^{i-1} \quad \text{for } 1 \le j < \frac{n}{p}    (2)

A polynomial division can be used to uncompress the data.
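As an illustration of this basic scheme, the following C++ sketch packs and unpacks a vector using equations (1) and (2). It is a minimal example written for this article, not the PLIBS_8 implementation, and it assumes v_max > v_min so that B ≥ 2.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Basic polynomial packing: p values in base B are accumulated into one
// 32-bit word, s_j = sum_i (v_i - vMin) * B^(i-1), following eqs. (1) and (2).
std::vector<uint32_t> packBasic(const std::vector<uint32_t>& v,
                                uint32_t vMin, uint32_t vMax) {
    const uint64_t B = uint64_t(vMax) - vMin + 1;   // compression base (B >= 2 assumed)
    const size_t p = size_t(std::log(4294967295.0) / std::log(double(B)));  // eq. (1)
    std::vector<uint32_t> packed;
    for (size_t j = 0; j < v.size(); j += p) {
        uint64_t s = 0, power = 1;
        for (size_t i = j; i < j + p && i < v.size(); ++i) {
            s += (v[i] - vMin) * power;             // eq. (2): value times B^(i-1)
            power *= B;
        }
        packed.push_back(uint32_t(s));
    }
    return packed;
}

// Decompression by successive Euclidean (polynomial) divisions by B.
std::vector<uint32_t> unpackBasic(const std::vector<uint32_t>& packed,
                                  uint32_t vMin, uint32_t vMax, size_t n) {
    const uint64_t B = uint64_t(vMax) - vMin + 1;
    const size_t p = size_t(std::log(4294967295.0) / std::log(double(B)));
    std::vector<uint32_t> v;
    for (uint64_t s : packed)
        for (size_t i = 0; i < p && v.size() < n; ++i) {
            v.push_back(uint32_t(s % B) + vMin);
            s /= B;
        }
    return v;
}
```

For instance, with a data range giving B = 1000, equation (1) yields p = 3, so three values are stored in each 32-bit word.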

3.2 Advanced compression method

The inconvenience of the basic compression method is the unused space at the end of each packed unsigned int (see figure 2). The ideal case is the one that has no unused space when storing the compressed data, as illustrated in figure 3. The advanced polynomial compression tends towards this ideal case, while keeping the read and write time minimal in order to provide a fast compression and decompression speed.

Fig. 3 Illustration of the advanced reduction. The upper line represents the data (different colours for different values). In the second line, the orange blocks represent the changes between the different values to compress. The last line shows the compressed data (as they are stored): first, the minimum value of the data, next, the base b = max − min + 1, which defines the data variations set, Z/bZ, and finally the data variations. The storage space is optimized by avoiding useless gaps between data. With this method there is no useless space to store the compressed data.

Fig. 4 This figure shows how the data of the vector v are stored in the packed vector. The first line gives the base used to store the values, the second line shows the variables used to store the values with respect to their base. To increase the compression ratio we need to split the last base B into bases R and R' in order to use the storage capacity of an unsigned int as much as we can. The values on the left of an unsigned int are stored with a low power of the base B. The values on the right of an unsigned int are stored with a high power of the base B.

The compression ratio can be improved by splitting the last base (see figures 2 and 3), used to store less data in the same unsigned int, into two other bases, R and R', in order to have B ≤ R × R'. In this case, the base R is stored in the current packed unsigned int and the base R' is stored in the next one (see figure 4). This configuration ensures a more efficient data order for CPU data pre-fetching at decompression time, in order to ensure a decompression faster than the compression.

The data are accumulated from the highest exponent of the base B to the lowest. This ensures that the decompression will produce uncompressed contiguous data.

This splitting stores a value to be compressed, v, in two bases, R and R', with two variables r and r'. The variables r and r' are stored in two consecutive packed elements (unsigned int).

The calculation of the bases R and R' is possible when the number of bases B that can be stored in an unsigned int is known. The number of bases B that can be stored in the first packed unsigned int, p_1, is given by the following equation:

p_1 = \left\lfloor \frac{\ln(2^{32} - 1)}{\ln B} \right\rfloor    (3)

The split base R_1 is given by:

R_1 = \left\lfloor \frac{2^{32} - 1}{B^{p_1}} \right\rfloor    (4)

The R_1 base must be completed, to store an element e ∈ ⟦0, B⟦, by:

R'_1 = \left\lfloor \frac{B}{R_1} \right\rfloor + (1 \text{ if } B \bmod R_1 \ne 0)    (5)

So R_1 × R'_1 ≥ B. Each base R_i and R'_i is associated to a stored value r_i and r'_i respectively. The first packed element s_1 can be written as follows:

s_1 = r_1 + R_1 \times \left( \sum_{k=1, k \ne p_1}^{p_1} v_k B^{p_1 - k} \right)    (6)

Where:

r_1 = \left\lfloor \frac{v_{p_1}}{R'_1} \right\rfloor    (7)

r'_1 = v_{p_1} - r_1 \times R'_1    (8)

The value r_1 is associated to the base R and the value r'_1 is associated to the base R'. The number of bases B that can be stored in the second packed element, p_2, is given by:

p_2 = \left\lfloor \frac{\ln(2^{32} - 1) - \ln R'_1}{\ln B} \right\rfloor    (9)

The next split base, R_2, can be written as:

R_2 = \left\lfloor \frac{2^{32} - 1}{R'_1 \, B^{p_2}} \right\rfloor    (10)

With the base R_2, the second packed element can be calculated as:

s_2 = r_2 + R_2 \times \left( \sum_{k=1, k \ne p_1}^{p_1} v_{p_1 + k} \, B^{p_1 - k} + r'_1 \, B^{p_1} \right)    (11)

Equation (5) can be used to calculate R'_2.
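To make the split concrete, the short sketch below evaluates equations (3) to (8) for an example base. It is a numerical illustration written for this article, with names chosen by us, not code from the paper.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>

int main() {
    const uint64_t B = 1000;                      // example compression base
    const double maxWord = 4294967295.0;          // 2^32 - 1

    // Eq. (3): number of full bases B fitting in the first packed word.
    const uint32_t p1 = uint32_t(std::log(maxWord) / std::log(double(B)));

    // Eq. (4): split base stored in the current word.
    uint64_t Bp1 = 1;
    for (uint32_t i = 0; i < p1; ++i) Bp1 *= B;
    const uint64_t R1 = 4294967295ULL / Bp1;

    // Eq. (5): complementary base stored in the next word (ceiling of B / R1).
    const uint64_t R1p = B / R1 + (B % R1 != 0 ? 1 : 0);

    // Eqs. (7)-(8): split the value v_{p1} into r1 (base R1) and r1' (base R1').
    const uint64_t v = 987;                       // example value in [0, B)
    const uint64_t r1  = v / R1p;                 // goes into the current word
    const uint64_t r1p = v - r1 * R1p;            // goes into the next word

    std::printf("p1=%u R1=%llu R1'=%llu  v=%llu -> r1=%llu r1'=%llu\n",
                p1, (unsigned long long)R1, (unsigned long long)R1p,
                (unsigned long long)v, (unsigned long long)r1,
                (unsigned long long)r1p);
    return 0;
}
```

With B = 1000 this gives p_1 = 3, R_1 = 4 and R'_1 = 250, so R_1 × R'_1 = 1000 ≥ B as required, and the value 987 is split into r_1 = 3 and r'_1 = 237.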

The compression of a whole vector can be done by using a mathematical series to calculate the split base for each packed element. Assuming the base R'_0 = 1 for the first step, the mathematical series used to compress an entire vector of unsigned int can be written as follows (for 0 < i ≤ n_p, where n_p is the number of packed elements):

p_i = \left\lfloor \frac{\ln(2^{32} - 1) - \ln R'_{i-1}}{\ln B} \right\rfloor

R_i = \left\lfloor \frac{2^{32} - 1}{R'_{i-1} \, B^{p_i}} \right\rfloor

R'_i = \left\lceil \frac{B}{R_i} \right\rceil

q_i = i - 1 + \sum_{k=1}^{i-1} p_k \quad (\text{with } q_0 = 0)

r_i = \left\lfloor \frac{v_{q_i + p_i}}{R'_i} \right\rfloor

r'_i = v_{q_i + p_i} - r_i \times R'_i

s_i = r_i + R_i \times \left( \sum_{k=1, k \ne p_i}^{p_i} v_{q_i + k + 1} \, B^{p_i - k} + r'_{i-1} \, B^{p_i} \right)

Where p_i is the number of bases B that can be stored in the i-th packed element, R_i and R'_i are the split bases, r_i and r'_i their corresponding values, q_i is used to know how many elements have been packed until the i-th packed element, and finally s_i is the value of the i-th packed element.

3.3 Blocked compression method

We observed that the smaller the signal range, the more efficient is the compression. The advanced compression method presented above is particularly efficient to compress white noise with a small spread. Conversely, if the gaussian noise or the poissonian signal is spread out, the efficiency decreases. However, the efficiency can be improved by dividing the vector into blocks, to diminish the impact of the higher values on the global compression ratio.

The block efficiency and the determination of the block size will be discussed in section 4.2.

4 Experiments and analysis

In order to test and evaluate the performance of the previously described polynomial compression method, in the following Monte Carlo simulated distributions will be used. These distributions are in agreement with the measured data from Cherenkov cameras (see figure 5).

Fig. 5 This figure illustrates the typical signal distribution obtained in several of the cameras used in CTA [14].

4.1 Simulation of the distribution

The data can be described by a random gaussian distribution with a given standard deviation (the white noise in the cameras' signals) and by adding a uniform distribution in the given camera's signal range (the physics signal). Consider the set of the camera pixels distribution values, A:

A(\mu, \sigma, x, y, a, N) = \mathcal{N}^{N-a}(\mu, \sigma) \cup \mathcal{U}^{a}(x, y)    (12)

Where:
– µ : gaussian noise mean
– σ : gaussian noise standard deviation
– (x, y) : range of the uniform signal values
– a : number of values in the uniform distribution (signal)
– N : total number of values in the vector

(N denotes the normal distribution, the simulated noise, and U describes a uniform distribution, the simulated signal.)
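A vector following the set A of equation (12) can be generated with the C++ standard library, as in the sketch below. The helper name and the rounding of the normal draws to non-negative integers are our own assumptions about the digitization, not part of the published tools.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

// Draws N values: N - a from a Gaussian N(mu, sigma) (the white noise)
// and a from a uniform U(x, y) (the physics signal), as in eq. (12).
// Assumes a <= N.
std::vector<uint32_t> simulateA(double mu, double sigma,
                                uint32_t x, uint32_t y,
                                size_t a, size_t N, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> noise(mu, sigma);
    std::uniform_int_distribution<uint32_t> signal(x, y);
    std::vector<uint32_t> v(N);
    for (size_t i = 0; i < N - a; ++i) {
        double d = std::max(0.0, std::round(noise(gen)));  // digitized, non-negative
        v[i] = static_cast<uint32_t>(d);
    }
    for (size_t i = N - a; i < N; ++i) v[i] = signal(gen);
    return v;
}

// Example matching the parameters of figure 6:
// auto pixels = simulateA(3000.0, 500.0, 2000, 45000, 9, 1855);
```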

An example of simulation is presented in figure 6.

Fig. 6 Typical simulated distribution used to improve the data reduction, in the set A(3 000, 500, 2 000, 45 000, 9, 1855). In this case, the gaussian distribution represents 99% of the pixels' values and the uniform distribution represents 1% of this distribution.

In this paper we have tested the distribution A with µ = 3 000, σ ∈ [100, 10 000], x = 2 000, y ∈ ⟦20 000, 100 000⟧, a ∈ ⟦1, 1 000⟧ and N ∈ ⟦1855, 10 000⟧.

4.2 Polynomial reduction on given distributions

Fig. 7 Top panel: illustration of the compression ratio versus the number of elements in the blocks used to compress a vector of data. The red points (+) give the compression ratio for a distribution with σ = 500 and range = 45 000, in A(3 000, 500, 2 000, 45 000, 4, 10 000). The blue points (∗) give the compression ratio for a distribution with σ = 500 and range = 100 000, in A(3 000, 500, 2 000, 100 000, 4, 10 000). The green points (×) give the compression ratio for a distribution with σ = 1 000 and range = 45 000, in A(3 000, 1 000, 2 000, 45 000, 4, 10 000). The tails of the plots give the compression ratio of the advanced polynomial reduction method. Bottom panel: the same plot zoomed.

The implementation of the blocked polynomial reduction has been tested on given distributions. This test determines the influence of the distribution parameters on the compression ratio. As the polynomial compression uses statistical properties to compress data, the test can only be done with a set of distributions. Figure 7 shows the compression ratio for 1 000 vectors with A(3000, 500, 2000, 45 000, 4, 1855) used to compute the variations (red curve). The gaussian σ variation has a high influence on the final compression ratio, of the order of 25% from σ = 1000 to σ = 500 in the best block size case. The signal range influence is lighter, 5% or 10% depending on the block size, and 5% for the best block size. The block size choice is important too. The compression ratio is weaker if the blocks are too long: 25% lower compression for σ = 500 and 30% for σ = 1000. In this case, using blocks of 154 elements allows a compression ratio of 2.47054, which is 17% larger than the basic compression (2.10361).
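This kind of block-size scan can be estimated with a few lines, assuming the per-word count of equation (1) for each block and neglecting the small per-block header (minimum and base); both simplifications are ours, so the numbers will only approximate the measured curves.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Estimate the blocked compression ratio: each block of `blockSize` values is
// packed with its own base B = max - min + 1, p values per 32-bit word (eq. 1).
double blockedRatio(const std::vector<uint32_t>& v, size_t blockSize) {
    if (v.empty() || blockSize == 0) return 1.0;
    size_t packedWords = 0;
    for (size_t start = 0; start < v.size(); start += blockSize) {
        size_t end = std::min(start + blockSize, v.size());
        auto mm = std::minmax_element(v.begin() + start, v.begin() + end);
        double B = double(*mm.second) - double(*mm.first) + 1.0;
        size_t p = (B < 2.0) ? 64                 // constant block: cap the packing
                 : size_t(std::log(4294967295.0) / std::log(B));
        packedWords += (end - start + p - 1) / p; // words needed for this block
    }
    return double(v.size()) / double(packedWords);
}

// Example scan: for (size_t bs = 16; bs <= 2048; bs *= 2) { blockedRatio(v, bs); }
```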

Fig. 8 Illustration of the signal/noise ratio influence on the final compression ratio for a vector of 10 000 values. The red points (+) give the compression ratio for a distribution with σ = 500 and range = 45 000. The blue points (∗) give the compression ratio for a distribution with σ = 500 and range = 100 000. The green points (×) give the compression ratio for a distribution with σ = 1 000 and range = 45 000. The compression ratio is constant from 10%.

Figure 8 shows that the signal/noise ratio has a high influence on the compression ratio if the spread is not too high. On the contrary, if the noise spread is important (green curve), the signal/noise ratio has less influence on the final compression ratio.

4.3 Polynomial reduction on CTA data

In the previous section 4.2 we have described the compression ratios obtained with the blocked polynomial reduction applied on modelled/simulated data distributions. Such an improvement cannot reflect properly the compression ratio with physics data, since these result typically from a superposition of several distributions coming from several photo-sensors (e.g. pixels) read simultaneously.

We have therefore tested our compression method on Monte Carlo simulated CTA-like data (i.e. Cherenkov light emitted by atmospheric electromagnetic showers and captured by cameras on telescopes). Only shower pictures registered in stereoscopy by more than one telescope are recorded. Each telescope's camera produces individually a file containing its own picture (also called an "event") resulting from the different signals registered by all pixels/photo-sensors of the camera itself. An event data-file is then composed of a header, used to describe properties like its timestamp, plus the recorded camera data, e.g. either the integrated signal from all pixels and/or the dynamical evolution of the signal in time (waveform). The event data file can have a size of typically several tens of thousands of bytes. A selection on the pixels to be saved will likely be applied in the acquisition pipeline in order to reduce the final data rate.

Among the various specifications that have to be fulfilled by the CTA data format, each data file has to be readable by part, to enable the access to its header without requiring a full decompression step. Therefore, a blocked compression is allowed.

Our test was performed on CTA Monte-Carlo Prod 3 [15] files, which simulate telescopes observing the Cherenkov light emitted by particles' showers in the atmosphere and are used to characterize the scientific output of CTA in experimental conditions; thus they are reasonably realistic. The CTA Monte Carlo data are converted into a specific high performance data format, which stores pixels' values in 16 bits to enable fast computation. The test files contain 624 telescopes, 22 000 images, 7.6 GB of data in waveform mode and 474 MB of integrated signal. Each image is composed of the lighted pixels concerned by both the noise as well as by the genuine signal and having an almost elliptical shape (see figure 9).

In the following we present the way to adapt the polynomial compression method to the CTA prerequisites. All tests are executed on an Intel core i5 clocked at 2.67 GHz with SSE4 instructions, without an SSD disk.

4.3.1 Test on waveform CTA data

The waveform data of CTA record the electromagnetic showers' expansion. Thus, each pixel has values in time. The number of values depends on the camera type. Each value is digitized in 12 or 16 bits and is stored in 16 bits for computing reasons.

Table 1 The polynomial compression ratio, time and compressed file size compared to LZMA (the best existing compression). The tested file is the full waveform simulation of the PROD_3 (run 3998) of the CTA experiment. The used CPU was an Intel core i7 M 560 with 19 GB of RAM installed with a Fedora distribution.

| Method | Compression ratio | Compression elapsed time | File size (GB) | Decompression elapsed time | Compression elapsed time (RAM) | Decompression elapsed time (RAM) |
| No compression | 1 | 0 | 7.6 | 0 | 0 | 0 |
| Advanced Polynomial Reduction | 2.71 | 5 min 36.025 s | 2.8 | 2 min 35 s | 2 min 56 s | 2 min 30 s |
| BZIP2 | 2.62 | 19 min 18.247 s | 2.9 | 2 min 23.676 s | - | - |
| LZMA (-mx=9 -mfb=64 -md=200m) | 6.52 | 2 h 14 min 44 s | 1.166 | 1 min 49.689 s | 2 h 03 min 44 s | 2 min 03 s |
| LZMA (-mx=1 -mfb=64 -md=32m) | 5.88 | 14 min 00 s | 1.293 | 2 min 10 s | 15 min 42 s | 2 min 09 s |
| Poly + LZMA (9 16 32) | 5.02 | 10 min 49 s | 1.513 | 2 min 13 s | 4 min 57 s | 2 min 15 s |

Fig. 9 Illustration of the ellipse shape of a particles' shower recorded by a camera in the CTA Monte-Carlo. The colour scale represents the number of photons detected in a pixel.

To improve waveform compression and enable High Performance Computing (CPU data pre-fetching and vectorization) we choose to store them into matrices. The matrix element M_{i,j} corresponds to the i-th time of the j-th pixel of the current camera. The row alignment enables a better compression because it increases the number of sequences of similar values. This configuration enables the optimisation of the waveforms' pictures computing.
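The following sketch shows one possible row-aligned layout for such a waveform matrix. The class name and the exact row/column orientation are our own illustration of the idea, not the data format used by the authors.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-aligned storage of waveforms: element at (i, j) is the i-th time sample of
// the j-th pixel, kept contiguous row by row so that one row holds all pixels
// at the same instant (mostly pedestal noise, hence long runs of close values).
struct WaveformMatrix {
    size_t nSamples = 0;                 // number of time samples (rows)
    size_t nPixels  = 0;                 // number of pixels (columns)
    std::vector<uint16_t> data;          // row-major, 16-bit storage as in the text

    WaveformMatrix(size_t samples, size_t pixels)
        : nSamples(samples), nPixels(pixels), data(samples * pixels, 0) {}

    uint16_t& at(size_t i, size_t j) { return data[i * nPixels + j]; }

    // Fill one pixel trace; the scattered writes here trade off against the
    // contiguous, vectorization-friendly reads at compression time.
    void setTrace(size_t pixel, const std::vector<uint16_t>& trace) {
        for (size_t i = 0; i < trace.size() && i < nSamples; ++i)
            at(i, pixel) = trace[i];
    }
};
```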

Our polynomial compression reduces the test file of 7.6 GB to a file of 2.8 GB (compression ratio of 2.71) in 5 min 36 s. Table 1 compares our results with classical methods. We test the BZIP2 and LZMA algorithms over the whole file. Our method is 3.5 times faster than the BZIP2 algorithm and offers a better compression. For the LZMA algorithm, the program is used in two tests.

Fig. 10 Comparison of the compression ratio for different compression block sizes on the CTA PROD_3 Monte-Carlo.

First, we investigate how to reach the best compression ratio with the command line 7z a -t7z -m0=lzma -mx=9 -mfb=64 -md=200m -ms=on. Thus, the test file is compressed with a compression ratio of 6.52 in 2 h 14 min 44 s. We also investigate the fast compression mode of LZMA (7z a -t7z -m0=lzma -mx=1 -mfb=64 -md=32m -ms=on) and we obtain a compression ratio of 5.88 in 14 min 00 s. Finally, we combine the polynomial compression with LZMA and obtain a compression ratio of 5.02 in only 10 min 49 s. The decompression times of the different algorithms are similar.
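The combination is a simple two-stage pipeline: the polynomially packed words are handed to a general-purpose byte compressor. The sketch below uses zlib's one-call API as a stand-in back end because its interface is compact; the measurements in this article were made with LZMA through 7z, so the figures obtained with this sketch will differ.

```cpp
#include <cstdint>
#include <vector>
#include <zlib.h>   // DEFLATE, used here only as a stand-in general-purpose back end

// Two-stage compression: the packed 32-bit words produced by the polynomial
// reduction (any of the packers sketched above) are fed to a byte compressor.
std::vector<unsigned char> packThenDeflate(const std::vector<uint32_t>& packed,
                                           int level = 6) {
    if (packed.empty()) return {};
    const unsigned char* bytes =
        reinterpret_cast<const unsigned char*>(packed.data());
    const uLong srcLen = static_cast<uLong>(packed.size() * sizeof(uint32_t));
    std::vector<unsigned char> out(compressBound(srcLen));
    uLongf dstLen = static_cast<uLongf>(out.size());
    if (compress2(out.data(), &dstLen, bytes, srcLen, level) != Z_OK)
        out.clear();                     // propagate failure as an empty buffer
    else
        out.resize(dstLen);
    return out;
}
```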

However, the compression/decompression over a whole file is not in agreement with the data requirement (events or blocks have to be compressed separately). A solution is to compress several events packed in blocks, with a higher granularity level. With LZMA (native), this method achieves a compression ratio of 6 by compressing 300 events per block, which represents less than 0.02 second of signal for the LST-CAM. In this case, each block contains approximately 100 MB of data (depending on the camera).

4.3.2 Test on integrated CTA data

The integrated data are obtained by the reduction of the waveform signal. This reduction can be performed on all the pixels' waveforms or on several pixels' waveforms. In our case, we reduced the matrices of section 4.3.1 into vectors to enable High Performance Computing. The produced vector can be described by 24 or 32 bit data depending on the cameras.

The polynomial compression achieves a compression ratio of 3.74 in 3.7 s on a 474 MB test file.

Figure 10 shows the compression ratios obtained with different compression block sizes. The plot variations denote the different compression ratios from the different cameras in the file. Statistically, the camera data do not have the same compression ratio. This is why there are fluctuations. The difference with the simulated distribution comes from the 7 types of cameras. Each camera has a typical ellipse size. If one block contains the full ellipse signal, it is less compressed compared to others. The result is a better global compression.

Table 2 compares the polynomial compression with the LZMA and BZIP2 algorithms.

Table 2 The polynomial compression ratio, time and compressed file size compared to LZMA (the best existing compression). The tested file is the simulation of the PROD_3 (run 497) of the CTA experiment. The combination of our advanced polynomial compression and LZMA allows a compression as good as a pure LZMA compression but 19 times faster. The used CPU was an Intel core i5 M 560 with 8 GB of RAM installed with an 16.4 system.

| Method | Compression ratio | Elapsed time | File size (MB) | Compression elapsed time (RAM) | Decompression elapsed time (RAM) |
| No compression | 1 | 0 | 474 | 0 | 0 |
| Advanced Polynomial Reduction | 3.74 | 3.7 s | 127 | 0.9 s | 0.9 s |
| BZIP2 | 4.69 | 1 min 48 s | 101 | 1 min 0 s | 6.23 s |
| LZMA (7z) | 4.84 | 7 min 48.636 s | 98 | 1 min 18 s | 9.28 s |
| Advanced Polynomial Reduction + LZMA | 4.84 | 24.646 s | 98 | 1 min 20 s | 11 s |

The BZIP2 algorithm provides a better compression ratio (4.69) but in 1 min 48 s (29 times slower than our compression). The best compression ratio is obtained with the LZMA algorithm (4.84) but in 7 min 48 s (126 times slower than our compression).

By combining our advanced polynomial reduction with the LZMA compression we obtain the same compression ratio as pure LZMA, but 19 times faster. Moreover, the use of the polynomial reduction helps LZMA because it packs small values into larger ones: the average value increases and the byte profile becomes flatter. The polynomial reduction also allows a better compression ratio than a classical bit-shifted compression because the space left in an unsigned int is used.

Extrapolating to the CTA yearly data rate of 4 PB [16], the usage of the LZMA algorithm in this case would require more than 1750 core-years only for the compression.
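The advantage over bit-shifted packing mentioned above can be checked by comparing how many values fit in a 32-bit word under both schemes; this is an illustrative calculation written for this article, not a measurement from the paper.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>

// Values per 32-bit word for a classical bit-shifted packing (each value padded
// to a whole number of bits) versus the polynomial packing of equation (1).
int main() {
    for (uint32_t B : {5u, 17u, 100u, 1000u, 40001u}) {
        uint32_t bits = uint32_t(std::ceil(std::log2(double(B))));
        uint32_t perWordShift = 32 / bits;
        uint32_t perWordPoly  = uint32_t(std::log(4294967295.0) / std::log(double(B)));
        std::printf("B=%6u  bit-shifted: %2u values/word  polynomial: %2u values/word\n",
                    B, perWordShift, perWordPoly);
    }
    return 0;
}
```

For B = 17 the polynomial packing stores 7 values per word instead of 6, and for B = 5 it stores 13 instead of 10; for bases close to a power of two the two schemes coincide.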

5 Bytes occurrences

For further test purposes, one can compare the distributions of the byte values in the initial file and in the compressed one, with the purpose of verifying that the compression algorithm has not altered the physical distributions. At this point the file cannot be further compressed with a lossless compression [3].

Figure 11 shows the different byte-value distributions in the initial file and in the file compressed with a polynomial reduction, LZMA and BZIP2, for the integrated test files.

Fig. 11 Comparison of the different values of bytes in a binary file. In red, the profile of the uncompressed file (CTA PROD_3 Monte-Carlo). In green, the profile of the only polynomially reduced file. In blue, the profile of the corresponding compressed file with the polynomial reduction and LZMA compression. In purple, the profile of the corresponding compressed file with the polynomial reduction and GZIP compression. In cyan, the profile of the corresponding compressed file with the advanced polynomial reduction and LZMA compression.

This figure shows that the polynomial compression smoothens the profile. The best compression is obtained with the LZMA compression on a file compressed with a polynomial reduction, because its profile is flat. The combination of the polynomial compression and the BZ2 algorithm does not provide a better compression ratio or a faster compression speed.
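A byte-occurrence profile like the one in figure 11 can be computed with a short helper; the function below is our own illustration and not part of the published tools.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>

// Byte-value profile of a file: the flatter the histogram, the closer the file
// is to incompressible white noise for a byte-oriented lossless compressor.
std::array<uint64_t, 256> byteOccurrences(const char* path) {
    std::array<uint64_t, 256> counts{};
    if (std::FILE* f = std::fopen(path, "rb")) {
        unsigned char buffer[1 << 16];
        size_t n;
        while ((n = std::fread(buffer, 1, sizeof(buffer), f)) > 0)
            for (size_t i = 0; i < n; ++i) ++counts[buffer[i]];
        std::fclose(f);
    }
    return counts;
}
```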

6 Conclusion

In this article, we introduced a new lossless compression algorithm to compress integers coming from digitized signals and dominated by a white noise.

This method is very fast, helps CPU data pre-fetching and eases vectorization. It can be integrated in each data format that deals with tables or matrices of integers. This method compresses preferentially integers, but an adaptation of this algorithm can enable floating-point data compression with a fixed precision that returns integers. It compresses matrices and tables separately in order to keep a similar data structure between compressed and uncompressed data. The decompression is roughly twice as fast as the compression. This method can also be vectorized to improve its speed.

Tests on CTA Monte-Carlo data show that the polynomial compression is less efficient than LZMA but more efficient than BZIP2 on waveform data. The integrated data compression is very efficient and fast. Used as a pre-compression for LZMA, we obtain the same compression ratio as pure LZMA but in a compression duration 19 times shorter.

The method's simplicity offers easy development in many languages and the possibility to be used on simple embedded systems, or to reduce the data volume produced by highly sensitive sensors on FPGA. It can also be used as a pre-compression for stronger methods (like LZMA) and accelerate them.

Acknowledgements This work is realised under the Astronomy ESFRI and Research Infrastructure Cluster (ASTERICS project) supported by the European Commission Framework Programme Horizon 2020 Research and Innovation action under grant agreement n. 653477.

References

1. T. Berghofer et al. Towards a Model for Computing in European Astroparticle Physics. 2015.
2. P. J. Hall (ed). An SKA engineering overview. SKA Memorandum 91, 2007.
3. Abraham Lempel and Jacob Ziv. Lempel–Ziv–Markov chain algorithm. 1996.
4. Jacob Ziv. A constrained-dictionary version of LZ78 asymptotically achieves the finite-state compressibility for any individual sequence. CoRR, abs/1409.1323, 2014.
5. Julian Seward. Burrows–Wheeler algorithm with Huffman compression. 1996.
6. Jean-loup Gailly and Mark Adler. GNU gzip. 1992.
7. Zstandard. 2015.
8. F.M.J. Willems, Y.M. Shtarkov and T.J. Tjalkens. The context-tree weighting method: basic properties. IEEE Transactions on Information Theory, 41, 2002.
9. Abraham Lempel and Jacob Ziv. Lempel–Ziv lossless data compression algorithms. 1977.
10. Jan Platos and Jiri Dvorský. Word-based text compression. CoRR, abs/0804.3680, 2008.
11. J. Cleary and I. Witten. Data compression using adaptive coding and partial string matching. IEEE Transactions on Communications, COM-32, 1984.
12. M. L. Ahnen et al. Data compression for the first G-APD Cherenkov telescope. 2015.
13. William Pence, Rob Seaman, Richard L. White. A tiled-table convention for compressing FITS binary tables. 2010.
14. CTA Consortium. Introducing the CTA concept. Astroparticle Physics, 43:3–18, 2013.
15. T. Hassan, L. Arrabito, K. Bernlöhr, J. Bregeon, J. Hinton, T. Jogler, G. Maier, A. Moralejo, F. Di Pierro, M. Wood, for the CTA Consortium. Second large-scale Monte Carlo study for the Cherenkov Telescope Array. ArXiv e-prints, August 2015.
16. CTA Consortium. CTA data management technical design report version 2.0. 2016.

The source code of the polynomial compression method discussed in this work is available at https://gitlab.in2p3.fr/CTA-LAPP/PLIBS_8.