Time Series Compression: A Survey

Giacomo Chiarot, Claudio Silvestri

Abstract—Smart objects are increasingly widespread, and their ecosystem, also known as the Internet of Things, is relevant in many different application scenarios. The huge amount of temporally annotated data produced by these smart devices demands efficient techniques for the transfer and storage of time series data. Compression techniques play an important role toward this goal and, although standard compression methods can be used with some benefit, several methods specifically address the case of time series, exploiting their peculiarities to achieve more effective compression and more accurate decompression in the case of lossy techniques. This paper provides a state-of-the-art survey of the principal time series compression techniques, proposing a taxonomy to classify them according to their overall approach and their characteristics. Furthermore, we analyze the performance of the selected algorithms by discussing and comparing the experimental results that were provided in the original articles. The goal of this paper is to provide a comprehensive and homogeneous reconstruction of the state of the art, which is currently fragmented across many papers that use different notations and where the proposed methods are not organized according to a classification.

Index Terms—Time series, Data Compaction and Compression, Internet of Things, State-of-the-art Survey, Sensor Data, Approximation, and Data Abstraction.

1 INTRODUCTION

Time series are relevant in several different contexts, and the Internet of Things (IoT) ecosystem is among the most pervasive ones. IoT devices, indeed, can be found in many applications, ranging from health care (smart wearables) to industrial ones (smart grids) [2], producing very large amounts of data. For instance, a single Boeing 787 flight can produce about half a terabyte of sensor data [26]. In those scenarios, characterized by high data rates and volumes, time series compression techniques are a sensible choice to increase the efficiency of collection, storage, and analysis of sensor data. In particular, the need to include in the analysis both information related to the recent and to the past history of the data stream leads to consider data compression as a solution to optimize space without losing the most important information. A direct application of time series compression, for example, can be seen in Time Series Management Systems (or Time Series Databases), in which compression is one of the most significant steps [15].

There exists an extensive literature on data compression algorithms, both on generic purpose ones for finite size data and on domain specific ones, for example for images and for video and audio data streams. This survey aims at providing an overview of the state of the art in time series compression research, specifically focusing on general purpose data compression techniques that are either developed for time series or that work well with time series. The algorithms we chose to summarize are able to deal with the continuous growth of time series over time and are suitable for generic domains (as in the different applications in the IoT). Further, these algorithms take advantage of the peculiarities of time series produced by sensors, such as:

• Redundancy: some segments of a time series can frequently appear inside the same or other related time series;
• Approximability: sensors in some cases produce time series that can be approximated by functions;
• Predictability: some time series can be predicted, for example using deep neural network techniques.

The main contribution of this survey is to present a reasoned summary of the state of the art in time series compression algorithms, which is currently fragmented among several sub-domains ranging from databases to IoT sensor management. Moreover, we propose a taxonomy of time series compression techniques based on their approach (dictionary-based, functional approximation, autoencoders, sequential, others) and their properties (adaptiveness, lossless reconstruction, symmetry, tuneability), anticipated in visual form in Figure 1 and discussed in Section 3, that will guide the description of the selected approaches. Finally, we recapitulate the results of the performance measurements indicated in the described studies.

Fig. 1: Visual classification of time series compression algorithms

The article is organized as follows. Section 2 provides some definitions regarding time series, compression, and quality indices. Section 3 describes compression algorithms and is structured according to the proposed taxonomy. Section 4 summarizes the experimental results found in the studies that originally presented the approaches we describe. A summary and conclusions are presented in Section 5.

• Both authors are with the Department of Environmental Sciences, Informatics and Statistics of Ca' Foscari University of Venice. E-mail: [email protected], [email protected].
• Claudio Silvestri is also with the European Center For Living Technology, hosted by Ca' Foscari University of Venice.

2 BACKGROUND

This section gives a formal definition of time series, compression, and quality indices.

2.1 Time series

Time series are defined as a collection of data, sorted in ascending order according to the timestamp t_i associated to each element. They are divided into:

• Univariate Time Series (UTS): elements inside the collection are real values;
• Multivariate Time Series (MTS): elements inside the collection are arrays of real values, in which each position in the array is associated to a time series feature.

For instance, the temporal evolution of the average daily price of a commodity, as the one represented in the plot in Figure 2, can be modeled as a univariate time series, whereas the summary of daily exchanges for a stock (including opening price, closing price, volume of trades and other information) can be modeled as a multivariate time series.

Using a formal notation, a time series can be written as:

TS = [(t_1, x_1), ..., (t_n, x_n)],   x_i ∈ R^m        (1)

where n is the number of elements inside the time series and m is the vector dimension of a multivariate time series; for univariate time series, m = 1. Given k ∈ [1, n], we write TS[k] to indicate the k-th element (t_k, x_k) of the time series TS.

A time series can be divided into segments, defined as portions of the time series without any missing elements and with ordering preserved:

TS_[i,j] = [(t_i, x_i), ..., (t_j, x_j)]        (2)

where ∀k ∈ [i, j], TS[k] = TS_[i,j][k − i + 1].

2.2 Compression

Data compression, also known as source coding, is defined in [27] as "the process of converting an input data stream (the source stream or the original raw data) into another data stream (the output, the bitstream, or the compressed stream) that has a smaller size". This process can take advantage of the Simplicity Power (SP) theory, formulated in [39], in which the goal of compression is to remove redundancy while retaining a high descriptive power.

The decompression process, complementary to the compression one, is also indicated as source decoding, and tries to reconstruct the original data stream from its compressed representation.

Compression algorithms can be described by the combination of different classes, shown in the following list:

• Non-adaptive - adaptive: a non-adaptive algorithm does not need a training phase to work efficiently with a particular dataset or domain, since its operations and parameters are fixed, while an adaptive one does;
• Lossy - lossless: an algorithm is lossy if the decoder does not return a result that is identical to the original data, or lossless if the decoder result is identical to the original data;
• Symmetric - non-symmetric: a symmetric algorithm uses the same algorithm as encoder and decoder, working in different directions, whereas a non-symmetric one uses two different algorithms.

In the particular case of time series compression, a compression algorithm (encoder) takes in input one time series TS of size s and returns its compressed representation TS' of size s', where s' < s and the size is defined as the number of bits needed to store the time series: E(TS) = TS'. From the compressed representation TS', using a decoder, it is possible to reconstruct the original time series: D(TS') = TS_r. If TS_r = TS then the algorithm is lossless, otherwise it is lossy. Section 3 presents the most relevant categories of compression techniques and their implementations.

Fig. 2: Example of a UTS representing stock fluctuations

2.3 Quality indices

To measure the performance of a compression encoder for time series, three characteristics are considered: compression ratio, speed, and accuracy.

Compression ratio. This metric measures the effectiveness of a compression technique and is defined as:

ρ = s' / s        (3)

where s' is the size of the compressed representation and s is the size of the original time series. Its inverse, 1/ρ, is named compression factor. An index used for the same purpose is the compression gain, defined as:

c_g = 100 log_e (1/ρ)        (4)

Speed. The unit of measure for speed is cycles per byte (CPB), defined as the average number of machine cycles needed to compress one byte.
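As an illustration of how these indices relate, the following minimal Python sketch (function and variable names are ours, chosen for illustration) computes ρ, the compression factor, and the compression gain from the sizes, in bits, of the original and compressed representations:

    import math

    def compression_indices(original_bits: int, compressed_bits: int):
        """Quality indices of Equations (3) and (4)."""
        rho = compressed_bits / original_bits        # compression ratio (3)
        factor = 1.0 / rho                           # compression factor
        gain = 100.0 * math.log(factor)              # compression gain (4), natural log
        return rho, factor, gain

    # Example: a 10 kB series compressed to 2 kB gives rho = 0.2 and factor = 5.
    print(compression_indices(10_000 * 8, 2_000 * 8))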

Accuracy, also called distortion criterion, measures the fidelity of the reconstructed time series with respect to the original. It is possible to use different metrics to determine fidelity [28]:

• Mean Squared Error: MSE = (1/n) Σ_{i=1}^{n} (x_i − x̃_i)²
• Root Mean Squared Error: RMSE = √MSE
• Signal to Noise Ratio: SNR = ((1/n) Σ_{i=1}^{n} x_i²) / MSE
• Peak Signal to Noise Ratio: PSNR = x_peak² / MSE, where x_peak is the maximum value in the original time series.

3 COMPRESSION ALGORITHMS

In this section we present the most relevant time series compression algorithms by describing their principles and peculiarities in short summaries. We also provide a pseudo-code for each approach, focusing more on style homogeneity among the pseudo-codes of the different methods than on the faithful reproduction of the original pseudo-code proposed by the authors. For a more detailed description we refer the reader to the complete details available in the original articles. Below is the full list of algorithms described in this section, divided by approach:

1) Dictionary-Based (DB):
1.1. TRISTAN;
1.2. CORAD;
1.3. A-LZSS;
1.4. D-LZW.
2) Functional Approximation (FA):
2.1. Piecewise Polynomial Approximation (PPA);
2.2. Chebyshev Polynomial Transform (CPT);
2.3. Discrete Wavelet Transform (DWT).
3) Autoencoders:
3.1. Recurrent Neural Network Autoencoder (RNNA).
4) Sequential Algorithms (SA):
4.1. Delta encoding, Run-length and Huffman (DRH);
4.2. Sprintz;
4.3. Run-Length Binary Encoding (RLBE);
4.4. RAKE.
5) Others:
5.1. Major Extrema Extractor (MEE);
5.2. Segment Merging (SM);
5.3. Continuous Hidden Markov Chain (CHMC).

3.1 Dictionary-Based (DB)

This approach is based on the principle that time series share some common segments. These segments can be extracted into atoms, such that a time series segment can be represented as a sequence of these atoms. Atoms are then collected into a dictionary that associates each atom with a univocal key, used both in the representation of the time series and to search their content efficiently. The choice of the atom length should guarantee a low decompression error and, at the same time, maximize the compression factor. Listing 1 shows how the training phase works at a high level: createDictionary computes a dictionary of segments given a dataset composed of time series and a threshold value th.

Listing 1: Training phase

createDictionary(Stream S, Threshold th, int segmentLength) {
    Dictionary d;
    Segment s;
    while (S.isNotEmpty()) {
        s.append(S.read());
        if (s.length == segmentLength) {
            if (d.find(s, th)) {
                d.merge(s);
            } else {
                d.add(s);
            }
            s.empty();
        }
    }
    return d;
}

The find function searches the dictionary for a segment that is similar to the segment s, with a distance lower than the threshold th; a possible distance index is the MSE. If a match is found, the algorithm merges the segment s with the matched one in order to achieve generalization.

After the segment dictionary is created, the compression phase takes in input one time series, and each segment is replaced with its key in the dictionary. If some segments are not present in the dictionary, they are left uncompressed, or a new entry is added to the dictionary. Listing 2 shows how the compression phase works: the compress function takes in input a time series to compress, the dictionary created during the training phase, and a threshold value, and returns the compressed time series as a list of indices and segments.

Listing 2: Compression phase

compress(Stream S, Dictionary d, Threshold th, int segmentLength) {
    Segment s;
    while (S.isNotEmpty()) {
        s.append(S.read());
        if (s.length == segmentLength) {
            if (d.find(s, th)) {
                send(d.getIndex(s));
            } else {
                send(s);
            }
            s.empty();
        }
    }
}

The compression achieved by this technique can be either lossy or lossless, depending on the implementation. The main challenges for this architecture are to:

• maximize the searching speed to find time series segments in the dictionary;
• make the time series segments stored in the dictionary as general as possible, in order to minimize the distance in the compression phase.
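To make the generic scheme of Listings 1 and 2 concrete, the following minimal Python sketch (all names are ours; it is a didactic simplification, not the implementation of any of the surveyed papers) builds a dictionary of fixed-length segments using the MSE as distance and then encodes a series as a mix of dictionary keys and raw segments:

    import numpy as np

    def mse(a, b):
        return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

    def create_dictionary(series, seg_len, th):
        """Collect representative atoms; similar segments are merged by averaging."""
        atoms = []
        for i in range(0, len(series) - seg_len + 1, seg_len):
            s = np.array(series[i:i + seg_len], dtype=float)
            match = next((j for j, a in enumerate(atoms) if mse(a, s) < th), None)
            if match is None:
                atoms.append(s)                              # new atom
            else:
                atoms[match] = (atoms[match] + s) / 2.0      # crude generalization step
        return atoms

    def compress(series, atoms, seg_len, th):
        """Replace each segment by the index of a close atom, or keep it raw."""
        out = []
        for i in range(0, len(series) - seg_len + 1, seg_len):
            s = np.array(series[i:i + seg_len], dtype=float)
            match = next((j for j, a in enumerate(atoms) if mse(a, s) < th), None)
            out.append(("idx", match) if match is not None else ("raw", s))
        return out

Any tail shorter than seg_len is left out of this sketch; a real implementation would emit it uncompressed.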

3.1.1 TRISTAN

One implementation of the Dictionary-Based architecture is TRISTAN [22], an algorithm divided into two phases: the learning phase and the compression phase.

Learning phase. The dictionary used in this implementation can be created by domain experts that add typical patterns, or by learning from a training set. To learn a dictionary D from a training set T = [t_1, ..., t_n] of n segments, the following minimization problem has to be solved:

D = argmin_D Σ_{i=1}^{n} ||w_i D − t_i||_2,   under the constraint ||w_i||_0 ≤ sp        (5)

where D is the obtained dictionary, sp is a fixed parameter representing sparsity, w_i is the compressed representation of segment i, and w_i D is the reconstructed segment. The meaning of this formulation is that the solution to be found is a dictionary which minimizes the distance between the original and the reconstructed segments.

The problem shown in Equation 5 is NP-hard [22], thus an approximate result is computed. A technique for approximating this result is shown in [20].

Compression phase. Once the dictionary is built, the compression phase consists in finding w such that:

s = w · D        (6)

where D is the dictionary, s is a segment, w ∈ {0, 1}^k, and k is the length of the compressed representation. The element of w in position i is 1 if the atom a_i ∈ D is used to reconstruct the original segment, 0 otherwise.

Finding a solution for Equation 6 is an NP-hard problem and, for this reason, the matching pursuit method [21] is used to approximate the original problem:

w = argmin_w ||wD − s||_2,   under the constraint ||w||_0 ≤ sp        (7)

where sp is a fixed parameter representing sparsity.

Reconstruction phase. Having a dictionary D and a compressed representation w of a segment s, it is possible to compute the reconstructed segment as:

s = wD        (8)

3.1.2 CORAD

This implementation extends the idea presented in TRISTAN [16]. The main difference is that it adds autocorrelation information to obtain better performance in terms of compression ratio and accuracy.

Correlation between two time series TS_A and TS_B is measured with the Pearson correlation coefficient:

r = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / ( √(Σ_{i=1}^{n} (x_i − x̄)²) · √(Σ_{i=1}^{n} (y_i − ȳ)²) )        (9)

where x_i is an element of TS_A, y_i is an element of TS_B, and x̄, ȳ are the mean values of the corresponding time series. This coefficient can be applied also to segments that have different ranges of values, and r ∈ [−1, 1], where 1 expresses the maximum linear correlation, −1 the maximum negative linear correlation, and 0 no linear correlation.

Time series are divided into segments and time windows are set. For each window, the correlation is computed between each pair of segments belonging to it, and the results are stored in a correlation matrix M ∈ R^{n×n}, where n is the number of segments.

Compression phase. During this phase, segments are sorted from the least correlated to the most correlated. To do this, the correlation matrix is used. The metric used to measure how much one segment is correlated with all the others is the absolute sum of the correlations, computed as the sum of each row of M. Knowing the correlation information, the dictionary is used only to represent segments that are not correlated with others, as in the TRISTAN implementation, while the other segments are represented solely using correlation information.

Reconstruction phase. The reconstruction phase starts with the segments represented with dictionary atoms, while the others are reconstructed looking at the segments to which they are correlated. This process is very similar to the one proposed in TRISTAN, with the sole difference that segments represented using the dictionary are managed differently from those that are not.

3.1.3 Accelerometer LZSS (A-LZSS)

Accelerometer LZSS is an algorithm built on top of the LZSS algorithm [35] for searching matches [24]. The A-LZSS algorithm uses Huffman codes, generated offline using frequency distributions. In particular, this technique considers blocks of size s = 1 to compute frequencies and build the code: an element of the time series will be replaced by a variable number of bits. Larger blocks can also be considered and, in general, having larger blocks gives better compression performance at the cost of larger Huffman code tables. The implementation of this technique is shown below:

A-LZSS algorithm

compress(Stream S, int minM, Dictionary d, int Ln, int Dn) {
    foreach (s in S) {
        I, L = d.longestMatch(s, Ln, Dn);
        if (L < minM) {
            send(getHuffCode(s));
        } else {
            send((I, L));
            s.skip(L);
        }
    }
}

The algorithm takes in input a stream, a Huffman code dictionary, and three other integer values:

• minM: the minimum match length, which is asserted to be minM > 0;
• Ln: determines the lookahead distance, as 2^Ln;
• Dn: determines the dictionary atoms length, as 2^Dn.

The longestMatch function returns the index I of the found match and the length L of the match. If the length of the match is too small, then the Huffman code representation of s is emitted, otherwise the index and the length of the match are emitted and the next L elements in input are skipped.

This implementation uses a brute-force approach, with complexity O(2^Dn · 2^Ln), but it is possible to improve over it by using hashing techniques.

3.1.4 Differential LZW (D-LZW)

The core of this technique is the creation of a very large dictionary that grows over time: once the dictionary is created, if a buffer block is found inside the dictionary it is replaced by the corresponding index, otherwise the new block is inserted in the dictionary as a new entry [18].

Adding new blocks guarantees lossless compression, but has the drawback of producing dictionaries that are too large. This makes the technique suitable only for particular scenarios (i.e., input streams composed by words/characters or cases in which the values inside a block are quantized).

Another drawback of this technique is the way in which the dictionary is constructed: elements are simply appended to the dictionary to preserve the indexing of previous blocks. For a simple implementation of the dictionary, the complexity of each search is O(n), where n is the size of the dictionary. This complexity can be improved by using more efficient data structures.

3.2 Function Approximation (FA)

The main idea behind function approximation is that a time series can be represented as a function of time. Since finding a single function that approximates the whole time series is infeasible, due to the presence of new values that cannot be handled, the time series is divided into segments and for each of them an approximating function is found.

Exploring all the possible functions f : T → X is not feasible, thus implementations consider only one family of functions and try to find the parameters that better approximate each segment. This makes the compression lossy.

A point of strength is that this approach does not depend on the data domain, so no training phase is required, since the regression algorithm considers only single segments in isolation.

3.2.1 Piecewise Polynomial Approximation (PPA)

This technique divides a time series into several segments of fixed or variable length and tries to find the best polynomials that approximate the segments.

Despite the compression being lossy, a maximum deviation from the original data can be fixed a priori to enforce a given reconstruction accuracy.

The implementation of this algorithm is described in [6], where the authors apply a greedy approach and three different online regression algorithms for approximating constant functions, straight lines, and polynomials. These online algorithms are:

• the PMR-Midrange algorithm, which approximates using constant functions [17];
• the optimal approximation algorithm, described in [5], which uses linear regression;
• the randomized algorithm presented in [29], which approximates using polynomials.

The algorithm used in [6] for approximating a time series segment is explained in the listing below. It takes in input a time series S, a maximum polynomial degree ρ, and an error threshold ε.

PPA algorithm

compress(int ρ, Stream S, float ε) {
    Segment s;
    while (S.isNotEmpty()) {
        Polynomial p;
        int l = 0;
        foreach (k in [0 : ρ]) {
            float currErr;
            Polynomial currP;
            bool continueSearch = true;
            int i = 0;
            int currL;
            while (continueSearch && i < len(s)) {
                currErr, currP = approx(k, s.getPrefix(i));
                if (currErr < ε) {
                    i += 1;
                    currL = i;
                } else {
                    continueSearch = false;
                }
            }
            while (continueSearch && S.isNotEmpty()) {
                s.append(S.read());
                currErr, currP = approx(k, s);
                if (currErr < ε) {
                    currL = len(s);
                } else {
                    currL = len(s) - 1;
                    continueSearch = false;
                }
            }
            if (currErr < ε && currL > l) {
                p = currP;
                l = currL;
            }
        }
        if (len(s) > 0) {
            send(p, l);
            s.removePrefix(l);
        } else {
            throw "Error: exceeded error threshold value";
        }
    }
}

This algorithm repeatedly finds the polynomial of degree between 0 and a fixed maximum that can approximate the longest segment within the threshold error, yielding the maximum local compression factor. After a prefix of the stream has been selected and compressed into a polynomial, the algorithm analyzes the following stream segment.
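A minimal Python sketch of the greedy piecewise fitting idea (our own simplification built on numpy.polyfit, not the pseudocode of [6]): for each segment it keeps extending the prefix while some polynomial of degree at most max_degree stays within the error threshold.

    import numpy as np

    def piecewise_poly_compress(x, max_degree, eps):
        """Greedily cover x with (coefficients, length) pairs."""
        out, start, n = [], 0, len(x)
        while start < n:
            best_len, best_coeffs = 1, np.array([x[start]])   # a single point always fits
            length = 2
            while start + length <= n:
                seg = np.asarray(x[start:start + length], dtype=float)
                t = np.arange(length)
                fitted = False
                for deg in range(0, min(max_degree, length - 1) + 1):
                    coeffs = np.polyfit(t, seg, deg)
                    if np.max(np.abs(np.polyval(coeffs, t) - seg)) < eps:
                        best_len, best_coeffs, fitted = length, coeffs, True
                        break
                if not fitted:
                    break
                length += 1
            out.append((best_coeffs, best_len))
            start += best_len
        return out

The sketch re-fits every prefix from scratch and is therefore quadratic in the segment length; the online regression algorithms cited above exist precisely to avoid this cost.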

3.2.2 Chebyshev Polynomial Transform (CPT)

Another implementation of polynomial compression can be found in [11]. In this article the authors show how a time series can be compressed into a sequence of finite Chebyshev polynomials. The principle is very similar to the one shown in Subsection 3.2.1, but it is based on the use of a different type of polynomial. Chebyshev polynomials are of two types, T_n(x) and U_n(x), defined as [19]:

T_n(x) = (n/2) Σ_{k=0}^{⌊n/2⌋} (−1)^k ((n − k − 1)! / (k! (n − 2k)!)) (2x)^{n−2k},   |x| < 1        (10)

U_n(x) = Σ_{k=0}^{⌊n/2⌋} (−1)^k ((n − k)! / (k! (n − 2k)!)) (2x)^{n−2k},   |x| < 1        (11)

where n ≥ 0 is the polynomial degree.

3.2.3 Discrete Wavelet Transform (DWT)

The discrete wavelet transform uses wavelet functions to transform time series. Wavelets are functions that, similarly to a wave, start from zero and end with zero after some oscillations. An application of this technique can be found in [1]. The transformation can be written as:

Ψ_{m,n}(t) = (1 / √(a^m)) Ψ((t − nb) / a^m)        (12)

where a > 1, b > 0, and m, n ∈ Z. To recover the original signal from the transformed one, the following formula can be applied:

x(t) = Σ_{m∈Z} Σ_{n∈Z} ⟨x, Ψ_{m,n}⟩ · Ψ_{m,n}(t)        (13)
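In practice, wavelet-based compression is often realized by keeping only the largest transform coefficients. A small sketch using the PyWavelets package (assumed available as pywt; this is a generic illustration, not the scheme of [1]):

    import numpy as np
    import pywt

    def dwt_compress(x, wavelet="db4", level=3, keep=0.1):
        """Keep only the largest `keep` fraction of DWT coefficients (lossy)."""
        coeffs = pywt.wavedec(x, wavelet, level=level)
        flat = np.concatenate(coeffs)
        thr = np.quantile(np.abs(flat), 1.0 - keep)     # magnitude threshold
        return [pywt.threshold(c, thr, mode="hard") for c in coeffs]

    def dwt_decompress(coeffs, wavelet="db4"):
        return pywt.waverec(coeffs, wavelet)

    x = np.sin(np.linspace(0, 8 * np.pi, 256)) + 0.05 * np.random.randn(256)
    x_rec = dwt_decompress(dwt_compress(x), "db4")

The zeroed coefficients can then be stored sparsely or entropy coded, which is where the actual space saving comes from.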

3.3 Autoencoders

An autoencoder is a particular neural network that is trained to give as output the same values passed as input. Its architecture is composed of two symmetric parts: encoder and decoder. Given an input of dimension n, the encoder gives as output a vector with dimensionality m < n, called code, while the decoder reconstructs the original input from the code, as shown in Figure 3 [9].

Fig. 3: Simple autoencoder architecture

3.3.1 Recurrent Neural Network Autoencoder (RNNA)

RNNA compression algorithms exploit recurrent neural networks [30] to achieve a compressed representation of a time series. Figure 4 shows the general unrolled structure of a recurrent neural network encoder and decoder. The encoder takes as input the time series elements, which are combined with the hidden states. Each hidden state is then computed starting from the new input and the previous state. The last hidden state of the encoder is passed as the first hidden state of the decoder, which applies the same mechanism, with the only difference that each hidden state also provides an output. The output provided by each state is the reconstruction of the corresponding time series element and is passed to the next state.

Fig. 4: General structure of a RNN encoder and decoder

The new hidden state h_t is obtained by applying:

h_t = φ(W x_t + U h_{t−1})        (14)

where φ is a logistic sigmoid function or the hyperbolic tangent.

One application of this technique is shown in [13], in which also Long Short-Term Memory [12] is considered. This implementation compresses time series segments of different lengths using autoencoders. The compression achieved is lossy, and a maximal loss threshold ε can be enforced. The training set is preprocessed considering the temporal variation of the data, applying:

Δ(L) = Σ_{t∈L} |x_t − x_{t−1}|        (15)

where L is the local time window. The value obtained by Equation 15 is then used to partition the time series, such that each segment has a total variation close to a predetermined value τ.

Listing 3 shows an implementation of the RNN autoencoder approach, with error threshold ε.

Listing 3: RNN compression algorithm

compress(Stream S, float ε, RAE a) {
    Segment s;
    while (S.isNotEmpty()) {
        Segment aux = s;
        Element e = S.read();
        aux.append(e);
        if (getError(aux, a.decode(a.encode(aux))) < ε) {
            s = aux;
        } else {
            send(a.encode(s));
            s.empty();
            s.append(e);
        }
    }
}

where:

• RAE a is the recurrent autoencoder trained on a training set, composed of an encoder and a decoder;
• getError computes the reconstruction error between the original and the reconstructed segment;
• ε is the error threshold value.
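As a purely didactic illustration of Equation (14), the following numpy sketch (random, untrained weights; not the architecture of [13]) folds a segment into a single hidden state, which plays the role of the code produced by the encoder:

    import numpy as np

    def rnn_encode(segment, W, U, h0=None):
        """Apply h_t = tanh(W x_t + U h_{t-1}); the last hidden state is the code."""
        h = np.zeros(U.shape[0]) if h0 is None else h0
        for x_t in segment:
            h = np.tanh(W @ np.array([x_t]) + U @ h)
        return h

    rng = np.random.default_rng(0)
    code_size = 8
    W = rng.normal(size=(code_size, 1))            # input-to-hidden weights
    U = rng.normal(size=(code_size, code_size))    # hidden-to-hidden weights
    code = rnn_encode(np.sin(np.linspace(0, 3, 50)), W, U)

In a trained autoencoder the decoder unrolls the symmetric recurrence from this code to reproduce the segment.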

3.4 Sequential algorithms (SA)

This approach is characterized by combining sequentially several simple compression techniques. Some of the most used techniques are:

• Huffman coding;
• Delta encoding;
• Run-length encoding;
• Fibonacci binary encoding.

These techniques, summarized below, are the building blocks of the methods presented in the following subsections.

Huffman coding. Huffman coding is the basis of many compression techniques, since it is one of the necessary steps, as for the algorithm shown in Subsection 3.4.4. The encoder creates a dictionary that associates each symbol with a binary representation and replaces each symbol of the original data with the corresponding representation. The compression algorithm is shown in Listing 4 [28], where the createPriorityList function creates a list of elements ordered from the least frequent to the most frequent, the addTree function adds to the tree a father node with its children, and createDictionary assigns 0 and 1 respectively to the left and right arcs and creates the dictionary, assigning to each character the sequence of 0s and 1s found on the route from the root to the leaf that represents that character. Since the priority list is sorted according to frequency, more frequent characters are inserted closer to the root and are represented using a shorter code.

Listing 4: Huffman code compression

createDictionary(StreamPrefix S) {
    Tree T = new Tree();
    PriorityList L = createPriorityList(S);
    foreach (i in [0 : L.length - 1]) {
        Node n = new Node();
        Element el1 = L.extractMin();
        Element el2 = L.extractMin();
        n.frequency = el1.frequency + el2.frequency;
        T.addTree(n, (el1, el2));
        L.add(n);
    }
    return L.toDictionary();
}

compress(Stream S, int prefixLen) {
    StreamPrefix s = S.prefix(prefixLen);
    Dictionary D = createDictionary(s);
    while (S.isNotEmpty()) {
        Element e = S.read();
        send(D[e]);
    }
}

The decoder algorithm is very simple, since the decoding process is the inverse of the encoding one: encoded bits are searched in the dictionary and replaced with the corresponding original symbols.

Delta encoding. This technique encodes a target file with respect to one or more reference files [36]. In the particular case of time series, each element at time t is encoded as Δ(x_t, x_{t−1}).

Run-length encoding. In this technique, each run (a sequence in which the same value is repeated consecutively) is substituted with the pair (v_t, o), where v_t is the value at time t and o is the number of consecutive occurrences [10].

Fibonacci binary encoding. This encoding technique is based on the Fibonacci sequence, defined as:

F(N) = F(N − 1) + F(N − 2)  if N ≥ 1;   F(N) = N  if N = −1 or N = 0        (16)

Having F(N) = a_1 ... a_p, where a_j is the bit at position j of the binary representation of F(N), the Fibonacci binary coding is defined as:

FE(N) = Σ_{i=0}^{j} a_i · F(i)        (17)

where a_i ∈ {0, 1} is the i-th bit of the binary representation of F(N) [37].
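To make the first two building blocks concrete, here is a small Python sketch (ours, for illustration only) of delta encoding followed by run-length encoding, the combination that DRH, described in the next subsection, completes with a Huffman stage:

    def delta_encode(values):
        """Replace each element with its difference from the previous one."""
        prev, deltas = 0, []
        for v in values:
            deltas.append(v - prev)
            prev = v
        return deltas

    def run_length_encode(values):
        """Collapse runs of equal values into (value, count) pairs."""
        runs = []
        for v in values:
            if runs and runs[-1][0] == v:
                runs[-1][1] += 1
            else:
                runs.append([v, 1])
        return [tuple(r) for r in runs]

    # A slowly changing series produces long runs of identical deltas.
    series = [20, 20, 21, 22, 23, 23, 23, 24]
    print(run_length_encode(delta_encode(series)))
    # [(20, 1), (0, 1), (1, 3), (0, 2), (1, 1)]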

3.4.1 Delta encoding, Run-length and Huffman (DRH)

This technique combines together three well-known compression techniques [23]: delta encoding, run-length encoding, and Huffman coding. Since these techniques are all lossless, the compression provided by DRH is lossless too, in case no quantization is applied. The following pseudocode describes the node-level compression algorithm, where Q is the quantization level:

DRH node-level compression algorithm

compress(Stream S, float Q) {
    Element lastValue = null;
    Element lastDelta = null;
    int counter = 0;
    foreach (s in S) {
        if (lastValue != null && lastDelta != null) {
            float delta = (int)((lastValue - s) / Q);
            lastValue = s;
            if (delta == lastDelta) {
                counter += 1;
            } else {
                encoded = huffmanEncode(lastDelta);
                send((encoded, counter));
                lastDelta = delta;
                counter = 0;
            }
        } else {
            lastValue = s;
            lastDelta = 0;
        }
    }
}

The decompression algorithm is the inverse of the compression one: once data is received, it is decoded using the Huffman code and reconstructed using the repetition counter. Since this kind of algorithm is not computationally expensive, the compression phase can be performed also by computational units with low resources, such as sensor nodes.
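The Huffman stage used by DRH (and also by Sprintz and A-LZSS) can be sketched in a few lines of Python; this is the generic textbook construction, not the specific implementation of [23]:

    import heapq
    from collections import Counter

    def huffman_code(symbols):
        """Build a prefix code (symbol -> bit string) from symbol frequencies."""
        freq = Counter(symbols)
        if len(freq) == 1:                                   # degenerate single-symbol case
            return {next(iter(freq)): "0"}
        heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)                  # two least frequent subtrees
            f2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + code for s, code in c1.items()}
            merged.update({s: "1" + code for s, code in c2.items()})
            heapq.heappush(heap, (f1 + f2, counter, merged))
            counter += 1
        return heap[0][2]

    def huffman_encode(symbols, code):
        return "".join(code[s] for s in symbols)

    data = "aaaabbbcc d"
    code = huffman_code(data)          # frequent symbols get shorter codewords
    bits = huffman_encode(data, code)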

3.4.2 Sprintz

The Sprintz algorithm [3] is designed for the IoT scenario, in which energy consumption and speed are important factors. In particular, the goal is to satisfy the following requirements:

• handling of small block sizes;
• high decompression speed;
• lossless data reconstruction.

The proposed algorithm is a coder that exploits prediction to achieve better results. In particular, it is based on the following components:

• Forecasting: used to predict the difference between new samples and the previous ones, through delta encoding or the FIRE algorithm [3];
• Bit packing: packages are composed of a payload that contains the prediction errors and a header that contains information used during reconstruction (a small illustrative sketch of how residuals are prepared for packing follows the list);
• Run-length encoding: if a sequence of correct forecasts occurs, the algorithm does not send anything until some error is detected, and the length of the skipped zero-error packages is added as information;
• Entropy coding: package headers and payloads are coded using Huffman coding, presented in Subsection 3.4.
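The forecasting and bit-packing steps rely on prediction residuals being small integers. The sketch below (a plain delta predictor with zigzag mapping; it is not Sprintz's FIRE forecaster nor its actual packing format) shows how residuals become small unsigned integers whose maximum bit width determines the size of a packed block:

    def zigzag(v: int) -> int:
        """Map signed residuals to small unsigned ints: 0,-1,1,-2,2 -> 0,1,2,3,4."""
        return (v << 1) if v >= 0 else ((-v) << 1) - 1

    def delta_zigzag_block(samples):
        """Delta-encode a block and report the bit width needed to pack it."""
        prev, residuals = samples[0], []
        for v in samples[1:]:
            residuals.append(zigzag(v - prev))
            prev = v
        width = max(residuals).bit_length() if residuals else 0
        return samples[0], width, residuals       # header: first value + bit width

    print(delta_zigzag_block([100, 101, 101, 99, 100]))   # (100, 2, [2, 0, 3, 2])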

3.4.3 Run-Length Binary Encoding (RLBE)

This lossless technique is composed of five steps, combining delta encoding, run-length encoding, and Fibonacci coding, as shown in Figure 5 [33]. The technique is developed specifically for devices characterized by low memory and computational resources, such as IoT infrastructures.

Fig. 5: RLBE compression steps
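The Fibonacci stage can be illustrated with the standard Zeckendorf-based Fibonacci code (a textbook variant given here for illustration; Equations (16)-(17) and [37] describe the formulation used in the surveyed papers): every positive integer is written as a sum of non-consecutive Fibonacci numbers, and a final 1 bit terminates each codeword.

    def fibonacci_encode(n: int) -> str:
        """Fibonacci code: Zeckendorf digits (least significant first) plus a final '1'."""
        assert n >= 1
        fibs = [1]
        a, b = 1, 2
        while b <= n:
            fibs.append(b)
            a, b = b, a + b
        digits = [0] * len(fibs)
        remainder = n
        for i in range(len(fibs) - 1, -1, -1):    # greedy, largest Fibonacci first
            if fibs[i] <= remainder:
                digits[i] = 1
                remainder -= fibs[i]
        return "".join(map(str, digits)) + "1"

    print([fibonacci_encode(n) for n in range(1, 7)])
    # ['11', '011', '0011', '1011', '00011', '10011']

Small run lengths therefore get short, self-delimiting codewords, and no code table has to be stored or transmitted.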

3.4.4 RAKE

The RAKE algorithm, presented in [4], exploits sparsity to achieve compression. It is a lossless compression algorithm with two phases: preprocessing and compression.

Preprocessing. In this phase, a dictionary is used to transform the original data. For this purpose, many algorithms can be used, such as the Huffman coding presented in Subsection 3.4, but since the aim is that of obtaining sparsity, the RAKE dictionary uses a code similar to unary coding, thus every codeword has at most one bit set to 1. This dictionary does not depend on symbol probabilities, so no learning phase is needed. Table 1 shows a simple RAKE dictionary.

TABLE 1: RAKE dictionary

symbol   code        length
-1       1           1
+1       01          2
-2       001         3
+2       0001        4
...      ...         ...
-R       0...1       2R - 1
+R       00...1      2R
0        all zeros   2R

Compression. This phase works, as suggested by the algorithm name, as a rake of n teeth. Figure 6 shows an execution of the compression phase, given a preprocessed input and n = 4. The rake starts at position 0: in case there is no bit set to 1 in the rake interval, a 0 is added to the output code, otherwise a 1 is added, followed by the binary representation of the relative index of the first bit set to 1 in the rake (2 bits in the example of Figure 6). After that, the rake is shifted right by n positions in the first case, or moved to just after the first found bit in the second case. In the figure, the rake is initially at position 0, and the first bit set to 1 is at relative position 1 (output: 1 followed by 01); then the rake advances by 2 positions (after the first 1 in the rake); all bits are set to zero (output: 0), thus the rake is moved forward by 4 places; the first bit set to 1 in the rake has relative index 2 (output: 1 followed by 10), thus the rake advances by 3 places, and the process continues for the two last rake positions (outputs: 101 and 0, respectively).

Fig. 6: RAKE algorithm execution

Decompression. The decompression phase processes the compressed bit stream, replacing each 0 with n occurrences of 0, and each 1 followed by an offset with a number of 0s equal to the offset, followed by a bit set to 1. The resulting stream is decoded on the fly using the dictionary.
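A small Python sketch of the rake scan described above (our reading of the description in [4], with a rake of n teeth and log2(n)-bit offsets; the dictionary stage is omitted and the input is assumed to be zero-padded to a multiple of n):

    import math

    def rake_compress(bits, n=4):
        """Scan a sparse 0/1 sequence with a rake of n teeth (n a power of two)."""
        idx_bits = int(math.log2(n))
        out, pos = "", 0
        while pos < len(bits):
            window = bits[pos:pos + n]
            if 1 not in window:
                out += "0"                              # empty rake: emit 0, advance by n
                pos += n
            else:
                i = window.index(1)                     # first bit set to 1 in the rake
                out += "1" + format(i, "0{}b".format(idx_bits))
                pos += i + 1                            # restart just after that bit
        return out

    def rake_decompress(code, n=4):
        idx_bits = int(math.log2(n))
        bits, k = [], 0
        while k < len(code):
            if code[k] == "0":
                bits += [0] * n
                k += 1
            else:
                i = int(code[k + 1:k + 1 + idx_bits], 2)
                bits += [0] * i + [1]
                k += 1 + idx_bits
        return bits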

3.5 Others

In this subsection we present other time series compression algorithms that cannot be grouped in the previously described categories.

Fig. 7: Example of the segment merging technique, with increasing time interval

3.5.1 Major Extrema Extractor (MEE)

This algorithm is introduced in [7] and exploits time series features (maxima and minima) to achieve compression. For this purpose, strict, left, right, and flat extrema are defined. Considering a time series TS = [(t_0, x_0), ..., (t_n, x_n)], x_i is a minimum if it follows these rules:

• strict: if x_i < x_{i−1} ∧ x_i < x_{i+1};
• left: if x_i < x_{i−1} ∧ ∃k > i : ∀j ∈ [i, k], x_j = x_i ∨ x_i < x_{k+1};
• right: if x_i < x_{i+1} ∧ ∃k < i : ∀j ∈ [k, i], x_j = x_i ∨ x_i < x_{k−1};
• flat: if (∃k > i : ∀j ∈ [i, k], x_j = x_i ∨ x_i < x_{k+1}) ∧ (∃k < i : ∀j ∈ [k, i], x_j = x_i ∨ x_i < x_{k−1}).

Maximum points are defined similarly.

After defining minimum and maximum extrema, the authors introduce the concept of importance, based on a distance function dist and a compression ratio ρ: x_i is an important minimum if:

∃ i_l < i < i_r : x_i is a minimum in {x_l, ..., x_r} ∧ dist(x_l, x_i) < ρ ∧ dist(x_i, x_r) < ρ

Important maximum points are defined similarly.

Once the important extrema are found, they are used as the compressed representation of the segment. This is a lossy compression technique, as the Segment Merging one described in the next section. Despite the fact that the compressed data can be used to obtain properties of the original data that are useful for visual representations (minima and maxima), it is impossible to reconstruct the original data exactly.

3.5.2 Segment Merging (SM)

This technique, presented in [8], considers time series with regular timestamps and repeatedly replaces sequences of consecutive elements (segments) with a summary consisting of a single value and a representation error, as shown in Figure 7, where the error is omitted.

After compression, segments are represented by tuples (t, y, δ), where t is the starting time of the segment, y is the constant value associated with the segment, and δ is the segment error. The merging operation can be applied either to a set of elements or to a set of segments, to further compress a previously compressed time series. The case of elements is an instance of the case of segments, and it is immediate to generalize from two to more elements/segments. We limit our description to the merge of two consecutive segments represented by the tuples (t_i, y_i, δ_i) and (t_j, y_j, δ_j), where i < j, into a new segment represented by the tuple (t, y, δ), computed as:

t = t_i

y = (Δt_i · y_i + Δt_j · y_j) / (Δt_i + Δt_j)

δ = √( (Δt_i · (y_i² + δ_i²) + Δt_j · (y_j² + δ_j²)) / (Δt_i + Δt_j) − y² )

where Δt_x is the duration of segment x, thus Δt_i = t_j − t_i, Δt_j = t_k − t_j, and t_k is the timestamp of the segment after the one starting at t_j.

The sets of consecutive segments to be merged are chosen with the goal of minimizing the segment error, with constraints on the maximal acceptable error and the maximal segment duration. This compression technique is lossy, and the result of the compression phase can be considered both as the compressed representation and as the reconstruction of the original time series, without any additional computation executed by a decoder.
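The merge of two consecutive segments can be computed directly from the formulas above; the sketch below (names are ours) represents each segment as a (start time, value, error, duration) tuple and returns the merged summary:

    import math

    def merge_segments(seg_a, seg_b):
        """Merge two consecutive segments (t, y, delta, dt) into one summary tuple."""
        t_a, y_a, d_a, dt_a = seg_a
        t_b, y_b, d_b, dt_b = seg_b
        dt = dt_a + dt_b
        y = (dt_a * y_a + dt_b * y_b) / dt                       # duration-weighted mean
        second_moment = (dt_a * (y_a**2 + d_a**2) + dt_b * (y_b**2 + d_b**2)) / dt
        delta = math.sqrt(max(second_moment - y**2, 0.0))        # pooled representation error
        return (t_a, y, delta, dt)

    # Two one-hour segments with values 10 and 14 merge into value 12 with error 2.
    print(merge_segments((0, 10.0, 0.0, 3600), (3600, 14.0, 0.0, 3600)))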
3.5.3 Continuous Hidden Markov Chain (CHMC)

The idea behind this algorithm is that the data generation process follows a probabilistic model and can be described with a Markov chain [14]. This means that a system can be represented with a set of finite states S and a set of arcs A for the transition probabilities between states.

Once the hidden Markov chain is found using known techniques, such as the one presented in [34], a lossy reconstruction of the original data can be obtained by following the chain probabilities.

3.6 Algorithms summary

When we apply the taxonomy described in Section 2.2 and Section 3 to the above techniques, we obtain the classification reported in Table 2, which summarizes the properties of the different implementations. To help the reader in visually grasping the membership of the techniques to the different parts of the taxonomy, we report the same classification graphically in Figure 1 using Venn diagrams.

TABLE 2: Compression algorithm classification

                Non adaptive   Lossless   Symmetric   min ρ   max ε
Dictionary-based techniques
  TRISTAN       -              -          X           X       X
  CORAD         -              -          X           -       X
  A-LZSS        -              X          X           -       NA
  D-LZW         -              X          X           X       NA
Function Approximation
  PPA           X              -          -           -       X
  CPT           X              -          -           -       X
  DWT           X              -          -           -       X
Autoencoders
  RNNA          -              -          X           -       X
Sequential algorithms
  DRH           -              X          X           -       NA
  SPRINTZ       X              X          X           -       NA
  RLBE          X              X          X           -       NA
  RAKE          X              X          X           -       NA
Others
  MEE           X              -          -           X       -
  SM            X              -          -           X       -
  CHMC          -              -          X           -       -

4 EXPERIMENTAL RESULTS

In this section we discuss the different performances of the techniques described in Section 3, as reported by their authors. To ensure a homogeneous presentation of the different techniques, and for copyright reasons, we do not include figures from the works we are describing. Instead, we redraw the figures using the same graphical style for all of them, maintaining a faithful reproduction with respect to the represented values.

The experiments presented in those studies were based on the following datasets:

• ICDM challenge [38] – Traffic flows;
• RTE France (https://data.rte-france.com) – Electrical power consumptions;
• ACSF1 (http://www.timeseriesclassification.com) – Power consumptions;
• BAFU (https://www.bafu.admin.ch) – Hydrogeological data;
• GSATM (https://archive.ics.uci.edu) – Dynamic mixtures of carbon monoxide (CO) and humid synthetic air in a gas chamber;
• PigAirwayPressure (https://www.cs.ucr.edu) – Vital signs;
• SonyAIBORobotSurface2 (https://www.cs.ucr.edu) – X-axis of robot movements;
• SMN [32] – Seismic measures in Nevada;
• HAS [31] – Human activity from smartphone sensors;
• PAMAP [25] – Wearable devices motion and heart frequency measurements;
• MSRC-12 (https://www.microsoft.com) – Microsoft Kinect gestures;
• UCI gas dataset (https://archive.ics.uci.edu) – Measuring gas concentration during chemical experiments;
• AMPDs (http://ampds.org) – Measuring maximum consumption of water, electricity, and natural gas in houses.

4.1 Algorithms

To foster the comparison of the different algorithms, we report synoptically the accuracy and compression ratio obtained in the experiments described by their authors.

4.1.1 TRISTAN

Accuracy. The creation of the dictionary depends on the dictionary sparsity parameter, which is directly related to its size, and the accuracy can be influenced by this value, as shown in Equation 5. Experimental results are shown in Figure 8, where sparsity is related with RMSE.

Fig. 8: RMSE results depending on sparsity for TRISTAN [22]

Compression ratio. The sparsity parameter also affects the compression ratio. To assess this dependency, the authors execute experiments on two different datasets, both containing one measurement per minute with daily segments: RTE and ICDM, using dictionaries consisting respectively of 16 and 131 atoms. The resulting compression ratios are ρ = 0.5 for RTE and ρ = 0.05 for ICDM, thus higher sparsity values (larger dictionaries) allow for better compression ratios.

4.1.2 CORAD

Accuracy. The authors of CORAD measure accuracy using MSE, and the best reported results for their algorithm are 0.03 for the BAFU dataset, 0.11 for the GSATM dataset, and 0.04 for the ACSF1 dataset. As for TRISTAN, accuracy depends on time series segment redundancy and correlation.

Compression ratio. The best reported results for the CORAD compression ratio are ρ = 0.04 for the ACSF1 dataset, ρ = 0.07 for the BAFU dataset, and ρ = 0.06 for the GSATM dataset. The compression ratio is affected by the parameters used during training and encoding: error threshold, sparsity, and segment length. Their effects are depicted in the three plots of Figure 9, which represent the compression factor (1/ρ) obtained for different values of the parameters.

4.1.3 Piecewise Polynomial Approximation (PPA)

Accuracy and compression ratio of PPA are affected by the maximum polynomial degree and the maximum allowed deviation, the parameters of the method described in Subsection 3.2.

Accuracy. Errors are bound by the maximum deviation parameter, and the maximum polynomial degree can also affect accuracy: the deviation is always under the threshold, and using higher degree polynomials yields more precise reconstructions. Figure 10 shows the relation between maximum deviation and MSE.

Fig. 9: Compression factor (1/ρ) with varying parameters for CORAD [16]: (a) varying error threshold, (b) varying dictionary atoms, (c) varying segment length

Fig. 10: Relation between MSE and maximum deviation for PPA [13]

Compression ratio. Both the maximum deviation and the polynomial degree parameters affect the compression ratio, as we can observe in Figure 11, where the compression factor (1/ρ) is reported for different values of the parameters. The best compression ratios, ρ = 0.06 for REDD and ρ = 0.02 for Office, are achieved for the highest degree polynomials. As expected, also for increasing maximum deviation the compression factor is monotonically increasing.

4.1.4 Chebyshev Polynomial Transform (CPT)

Despite the absence of experimental results in the original paper, some considerations can be made. The algorithm has three parameters:

• block size;
• maximum deviation;
• number of quantization bits.

Similarly to PPA, in CPT it is possible to obtain higher compression ratios at the cost of higher MSE by increasing the maximum deviation parameter. The same holds for the block size: larger blocks correspond to better compression ratios and higher decompression errors [11]. Similarly, we can suppose that using fewer quantization bits would entail a less accurate decompression and a better compression ratio.

4.1.5 Discrete Wavelet Transform (DWT)

As for CPT, experimental results are not provided, but DWT performance also follows the same rules as for PPA, since compression ratio and threshold error are strictly correlated. The threshold error can be fixed a priori in order to have a higher compression ratio or a higher accuracy.

4.1.6 Recurrent Neural Network Autoencoder (RNNA)

Accuracy. One of the parameters of the RNNA-based method is a threshold for the maximal allowed deviation, whose value directly affects accuracy. For the experiments on univariate time series presented in [13], the authors report, considering the optimal value for the deviation parameter, an RMSE value in the range [0.02, 0.07] for different datasets. In the same experiments, the RMSE is in the range [0.02, 0.05] for multivariate time series datasets. Thus, there is no significant difference between univariate and multivariate time series.

Compression ratio. Similarly, the maximal allowed deviation affects the compression ratio. According to the results reported by the authors, also in this case there is not a significant difference between univariate and multivariate time series: ρ is in the range [0.01, 0.08] for the univariate time series datasets and in the range [0.31, 0.05] for the multivariate ones.

Fig. 11: PPA [6] compression factor (1/ρ) for varying parameters: (a) varying maximum polynomial degree, (b) varying maximum deviation

4.1.7 Delta encoding, Run-Length and Huffman (DRH)

DRH, as most of the following algorithms, is a lossless algorithm, thus the only relevant performance metric is the compression ratio.

Compression ratio. This parameter highly depends on the dataset: the run-length algorithm combined with delta encoding achieves good results if the time series is composed of long segments that are always increasing or decreasing by the same value, or constant. Experimental results are obtained on temperature, pressure, and light measurement datasets. For this type of data, DRH appears to be appropriate, since values are not highly variable and have relatively long intervals with constant values. The resulting compression factors are reported in Table 3 [23].

TABLE 3: DRH compression factors

                        1/ρ
Indoor temperature      92.2
Outdoor temperature     91.17
Pressure                82.55
Light                   83.53

4.1.8 Sprintz

Compression ratio. This algorithm is tested over the datasets of the UCR time series archive, which contains data coming from different domains. In the boxplots in Figure 12, compression factors are shown for 16 and 8-bit data and compared with other lossless compression algorithms. From this graph it is possible to see that all algorithms achieve better (higher) compression factors on the 8-bit dataset. This could be due to the fact that 16-bit data are more likely to have higher differences between consecutive values. Another interesting result is that the FIRE forecasting technique improves the compression, especially if combined with Huffman coding, when compared with delta coding [3].

4.1.9 Run-Length Binary Encoding (RLBE)

Compression ratio. The authors of this algorithm report in [33] the compression ratio and processing time of RLBE with respect to those of other algorithms, as shown in Figure 13, where ellipses represent the inter-quartile ranges of compression ratios and processing times. RLBE achieves good compression ratios with a 10% variability and with low processing time.

4.1.10 RAKE

Compression ratio. The authors of RAKE in [4] run experiments on sparse time series with different sparsity ratios p and different algorithms. The corresponding compression factors are reported in Table 4. We can notice that the compression factor is highly dependent on sparsity, and lower sparsity values correspond to higher compression factors (better compression). Thus, the RAKE algorithm is more suitable for sparse time series.

TABLE 4: Compression factor results for RAKE versus other lossless compression algorithms

Algorithm   p=0.002   p=0.005   p=0.01   p=0.05   p=0.1   p=0.15   p=0.2   p=0.25
OPT         48.0      22.0      12.4     3.5      2.1     1.6      1.4     1.2
RAKE        47.4      21.5      12.0     3.5      2.1     1.6      1.4     1.2
RAR         22.1      12.1      7.4      2.6      1.7     1.4      1.2     1.2
GZIP        21.4      11.8      7.4      2.6      1.8     1.4      1.3     1.2
BZIP2       26.0      14.2      8.7      2.6      1.7     1.3      1.2     1.1
PE          29.5      11.9      5.9      1.2      0.6     0.4      0.3     0.2
RLE         35.7      15.7      8.4      2.1      1.2     0.9      0.8     0.6

4.1.11 Major Extrema Extractor (MEE)

The compression ratio is one of the parameters of the MEE algorithm, thus the only relevant performance metric is the accuracy.

Accuracy. Despite the fact that MEE is a lossy compression algorithm, the authors have not provided any information about accuracy. Since the accuracy depends on the compression ratio, the authors recommend to choose a value that is not smaller than the percentage of non-extremal points, in order to obtain an accurate reconstruction of the compressed data [7].

Fig. 12: Sprintz compression factors versus other lossless compression algorithms [3]

Fig. 13: RLBE compression ratio versus other lossless compression algorithms [33]

4.1.12 Segment Merging (SM)

Since this algorithm is used for visualization purposes, a first evaluation can be a qualitative observation. In Figure 7 we can see how a light sensor time series is compressed considering different time series segment lengths. For this algorithm, compression ratio and error are strictly correlated, as shown in Figure 14, where FWC and CB-m correspond to different versions of the algorithm and different parameters [8]. We can observe that compression ratio values close to 1 correspond to error values close to zero. The dependence between compression ratio and error changes for different datasets. Knowing the function that correlates compression ratio and error for a given dataset, it is possible to set the compression ratio that corresponds to the desired maximum compression error.

Fig. 14: Correlation between compression ratio and error on a temperature dataset [8]

4.1.13 Continuous Hidden Markov Chain (CHMC)

For this algorithm, the authors have not provided any quantitative evaluation of compression performance: their assessment is qualitative and mainly focused on the compression and reconstruction of humanoid robot motions after a learning phase [14].

5 CONCLUSIONS

The amount of time series data produced in several contexts requires specialized time series compression algorithms to store it efficiently. In this paper we provide an analysis of the most relevant ones, we propose a taxonomy to classify them, and we report synoptically the accuracy and compression ratio published by their authors for different datasets. Even if this is a meta-analysis, it is informative for the reader about the suitability of a technique for specific contexts, for example when a fixed compression ratio or accuracy is required, or for datasets having particular characteristics.

In particular, Dictionary-Based methods are more effective with datasets having high redundancy and sparsity, and can be divided into lossy and lossless. The latter ensure a faithful reconstruction of the time series, whereas the former obtain better compression ratios and allow for choosing a tradeoff between accuracy and compression. Functional Approximation methods are preferable for smooth time series, such as those containing measurements of environmental conditions, for which they can achieve low compression ratios with high accuracy. Autoencoder accuracy and compression ratio are strongly related to the capacity of the RNN to find patterns in the training set. One limitation of this technique is that, in case the compressor receives in input time series that are not similar to the ones in the training set, the compression ratio can be worse than expected. The Sequential algorithms described in this survey are all lossless and computationally efficient, so they work well in the IoT context, in which devices with low computational capabilities are often used. Finally, the last algorithms we considered (SM and MEE) are mostly used for visualization purposes.

REFERENCES

[1] Abo-Zahhad, M.: ECG Signal Compression Using Discrete Wavelet Transform. In: J.T. Olkkonen (ed.) Discrete Wavelet Transforms - Theory and Applications. InTech (2011). DOI 10.5772/16019
[2] Asghari, P., Rahmani, A.M., Javadi, H.H.S.: Internet of Things applications: A systematic review. Computer Networks 148, 241–261 (2019). DOI 10.1016/j.comnet.2018.12.008
[3] Blalock, D., Madden, S., Guttag, J.: Sprintz: Time Series Compression for the Internet of Things. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2(3), 1–23 (2018). DOI 10.1145/3264903
[4] Campobello, G., Segreto, A., Zanafi, S., Serrano, S.: RAKE: A simple and efficient lossless compression algorithm for the Internet of Things. In: 2017 25th European Signal Processing Conference (EUSIPCO), pp. 2581–2585. IEEE, Kos, Greece (2017). DOI 10.23919/EUSIPCO.2017.8081677
[5] Dalai, M., Leonardi, R.: Approximations of one-dimensional digital signals under the l∞ norm. IEEE Transactions on Signal Processing 54(8), 3111–3124 (2006). DOI 10.1109/TSP.2006.875394
[6] Eichinger, F., Efros, P., Karnouskos, S., Böhm, K.: A time-series compression technique and its application to the smart grid. The VLDB Journal 24(2), 193–218 (2015). DOI 10.1007/s00778-014-0368-8
[7] Fink, E., Gandhi, H.S.: Compression of time series by extracting major extrema. Journal of Experimental & Theoretical Artificial Intelligence 23(2), 255–270 (2011). DOI 10.1080/0952813X.2010.505800
[8] Goldstein, R., Glueck, M., Khan, A.: Real-time compression of time series building performance data. In: Proceedings of IBPSA-AIRAH Building Simulation Conference (2011)
[9] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
[10] Hardi, S.M., Angga, B., Lydia, M.S., Jaya, I., Tarigan, J.T.: Comparative Analysis of Run-Length Encoding Algorithm and Fibonacci Code Algorithm on Image Compression. Journal of Physics: Conference Series 1235, 012107 (2019). DOI 10.1088/1742-6596/1235/1/012107
[11] Hawkins, S.E.I., Darlington, E.H.: Algorithm for Compressing Time-Series Data. NASA Tech Briefs (2012). URL https://ntrs.nasa.gov/search.jsp?R=20120010460
[12] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9, 1735–1780 (1997). DOI 10.1162/neco.1997.9.8.1735
[13] Hsu, D.: Time Series Compression Based on Adaptive Piecewise Recurrent Autoencoder. arXiv:1707.07961
[14] Inamura, T., Tanie, H., Nakamura, Y.: Keyframe compression and decompression for time series data based on the continuous hidden Markov model. In: Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003), vol. 2, pp. 1487–1492. IEEE, Las Vegas, NV, USA (2003). DOI 10.1109/IROS.2003.1248854
[15] Jensen, S.K., Pedersen, T.B., Thomsen, C.: Time Series Management Systems: A Survey. IEEE Transactions on Knowledge and Data Engineering 29(11), 2581–2600 (2017). DOI 10.1109/TKDE.2017.2740932
[16] Khelifati, A., Khayati, M., Cudre-Mauroux, P.: CORAD: Correlation-aware compression of massive time series using sparse dictionary coding. In: 2019 IEEE International Conference on Big Data (Big Data), pp. 2289–2298. IEEE Computer Society, Los Alamitos, CA, USA (2019). DOI 10.1109/BigData47090.2019.9005580
[17] Lazaridis, I., Mehrotra, S.: Capturing sensor-generated time series with quality guarantees. In: Proceedings 19th International Conference on Data Engineering, pp. 429–440 (2003). DOI 10.1109/ICDE.2003.1260811
[18] Le, T.L., Vo, M.: Lossless data compression algorithm to save energy in wireless sensor network. In: 2018 4th International Conference on Green Technology and Sustainable Development (GTSD), pp. 597–600 (2018). DOI 10.1109/GTSD.2018.8595614
[19] Lv, X., Shen, S.: On Chebyshev polynomials and their applications. Advances in Difference Equations 2017(1), 343 (2017). DOI 10.1186/s13662-017-1387-8
[20] Mairal, J., Bach, F., Ponce, J., Sapiro, G.: Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research 11, 19–60 (2009). DOI 10.1145/1756006.1756008
[21] Mallat, S., Zhang, Z.: Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing 41(12), 3397–3415 (1993). DOI 10.1109/78.258082
[22] Marascu, A., Pompey, P., Bouillet, E., Wurst, M., Verscheure, O., Grund, M., Cudre-Mauroux, P.: TRISTAN: Real-time analytics on massive time series using sparse dictionary compression. In: 2014 IEEE International Conference on Big Data (Big Data), pp. 291–300. IEEE, Washington, DC, USA (2014). DOI 10.1109/BigData.2014.7004244
[23] Mogahed, H.S., Yakunin, A.G.: Development of a Lossless Data Compression Algorithm for Multichannel Environmental Monitoring Systems. In: 2018 XIV International Scientific-Technical Conference on Actual Problems of Electronics Instrument Engineering (APEIE), pp. 483–486. IEEE, Novosibirsk (2018). DOI 10.1109/APEIE.2018.8546121
[24] Pope, J., Vafeas, A., Elsts, A., Oikonomou, G., Piechocki, R., Craddock, I.: An accelerometer lossless compression algorithm and energy analysis for IoT devices. In: 2018 IEEE Wireless Communications and Networking Conference Workshops (WCNCW), pp. 396–401. IEEE, Barcelona (2018). DOI 10.1109/WCNCW.2018.8368985
[25] Reiss, A., Stricker, D.: Towards global aerobic activity monitoring. In: Proceedings of the 4th International Conference on PErvasive Technologies Related to Assistive Environments, PETRA '11. Association for Computing Machinery, New York, NY, USA (2011). DOI 10.1145/2141622.2141637
[26] Ronkainen, J., Iivari, A.: Designing a Data Management Pipeline for Pervasive Sensor Communication Systems. Procedia Computer Science 56, 183–188 (2015). DOI 10.1016/j.procs.2015.07.193
[27] Salomon, D.: Data Compression: The Complete Reference, 4th edn. Springer, London (2007)
[28] Sayood, K.: Introduction to Data Compression, 3rd edn. Morgan Kaufmann Series in Multimedia Information and Systems. Elsevier, Amsterdam; Boston (2006)
[29] Seidel, R.: Small-dimensional linear programming and convex hulls made easy. Discrete & Computational Geometry 6(3), 423–434 (1991). DOI 10.1007/BF02574699
[30] Sherstinsky, A.: Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network. Physica D: Nonlinear Phenomena 404, 132306 (2020). DOI 10.1016/j.physd.2019.132306

[31] Shoaib, M., Bosch, S., Incel, O., Scholten, H., Havinga, P.: Fusion of smartphone motion sensors for physical activity recognition. Sensors (Basel, Switzerland) 14, 10146–10176 (2014). DOI 10.3390/s140610146
[32] Smith, K., Depolo, D., Torrisi, J., Edwards, N., Biasi, G., Slater, D.: The 2008 Mw 6.0 Wells, Nevada earthquake sequence. AGU Fall Meeting Abstracts (2008)
[33] Spiegel, J., Wira, P., Hermann, G.: A Comparative Experimental Study of Lossless Compression Algorithms for Enhancing Energy Efficiency in Smart Meters. In: 2018 IEEE 16th International Conference on Industrial Informatics (INDIN), pp. 447–452. IEEE, Porto (2018). DOI 10.1109/INDIN.2018.8471921
[34] Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Liu, X.A., Moore, G., Odell, J., Ollason, D., Valtchev, V., Woodland, P.: The HTK Book. Microsoft (2006)
[35] Storer, J.A., Szymanski, T.G.: Data compression via textual substitution. Journal of the ACM 29(4), 928–951 (1982). DOI 10.1145/322344.322346
[36] Suel, T.: Delta Compression Techniques. In: S. Sakr, A. Zomaya (eds.) Encyclopedia of Big Data Technologies, pp. 1–8. Springer International Publishing, Cham (2018). DOI 10.1007/978-3-319-63962-8_63-1
[37] Walder, J., Krátký, M., Platoš, J.: Fast Fibonacci encoding algorithm. In: J. Pokorný, V. Snášel, K. Richta (eds.) Proceedings of the Dateso 2010 Annual International Workshop on DAtabases, TExts, Specifications and Objects, CEUR Workshop Proceedings, vol. 567, pp. 72–83. CEUR-WS.org (2010). URL http://ceur-ws.org/Vol-567/paper14.pdf
[38] Wojnarski, M., Gora, P., Szczuka, M., Nguyen, H.S., Swietlicka, J., Zeinalipour, D.: IEEE ICDM 2010 contest: TomTom traffic prediction for intelligent GPS navigation. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 1372–1376 (2010). DOI 10.1109/ICDMW.2010.51
[39] Wolff, J.: Simplicity and Power – Some Unifying Ideas in Computing. CoRR cs.AI/0307013 (2003)

Giacomo Chiarot received his M.S. degree in Computer Science from Ca' Foscari University of Venice in 2019. Currently, he is a full-time Ph.D. candidate in Computer Science at Ca' Foscari University of Venice. His current research interests include time series, compression algorithms, and machine learning.

Claudio Silvestri received his M.S. degree in Computer Science in 2002 and the Ph.D. degree in Computer Science in 2006. He has been working as a Research Fellow in the Department of Computer Science at Università degli Studi di Milano and in the Institute for Information Science and Technologies of the Italian National Research Council in Pisa. Currently, he is an Assistant Professor in the Department of Environmental Sciences, Informatics and Statistics at Ca' Foscari University of Venice. His research interests are in the areas of databases, distributed and high performance computing, spatio-temporal data processing, and privacy in mobile computing. He has published over 50 papers in peer-reviewed international journals and conferences.