
Article

An Efficient Lossless Compression Method for Periodic Signals Based on Adaptive Dictionary Predictive Coding

Shaofei Dai 1,2,*, Wenbo Liu 1,2, Zhengyi Wang 1,2, Kaiyu Li 1,2, Pengfei Zhu 1,2 and Ping Wang 1,2

1 School of Automation, Nanjing University of Aeronautics and Astronautics, Nanjing 211006, China; [email protected] (W.L.); [email protected] (Z.W.); [email protected] (K.L.); [email protected] (P.Z.); [email protected] (P.W.)
2 Key Laboratory of Ministry of Industry and Technology of Non-Destructive Testing and Monitoring Technology of High-Speed Carrying Facilities, Nanjing 211006, China
* Correspondence: [email protected]

Received: 20 June 2020; Accepted: 14 July 2020; Published: 17 July 2020

Abstract: This paper reports on an efficient lossless compression method for periodic signals based on adaptive dictionary predictive coding. Previous methods for periodic signal compression, such as differential pulse code modulation (DPCM), the discrete cosine transform (DCT), the lifting wavelet transform (LWT) and the Karhunen-Loève transform (KLT), lack a transformation suited to making these data less redundant and better compressed. A new predictive coding approach, based on the adaptive dictionary, is proposed to improve the compression ratio of periodic signals. The main criterion of lossless compression is the compression ratio (CR). In order to verify the effectiveness of adaptive dictionary predictive coding for periodic signal compression, different technologies, including DPCM, 2-D DCT, and 2-D LWT, are compared. The results obtained prove that adaptive dictionary predictive coding can effectively improve data compression efficiency compared with traditional transform coding technology.

Keywords: lossless compression; periodic signals; adaptive dictionary; predictive coding; transform coding

1. Introduction

With the popularization of digitalization, computers and data processing equipment have penetrated into all walks of life, and analog communication has almost been replaced by digital communication. People are faced with the rapid growth of mass information, and the increasing amount of data being transmitted, processed and stored puts great pressure on transmission bandwidth, storage capacity and processing speed. For example, when Fuji Electric's PowerSataliteII measurement terminal recorded a power quality fault 2 s in length, the generated recording file was 948 kB. It can be seen from this that huge amounts of data take up much of the limited storage space and network resources. Therefore, researching efficient data compression methods that reduce the redundancy existing in massive data has become an urgent need in many modern industries. This includes power quality signal compression in the power industry and ECG signal compression in the medical industry, both of which deal with periodic or quasi-periodic signals.

Data compression seeks the smallest digital representation of the signal sent by the source, reducing the signal space that contains a given message set or data sampling set. The general steps of data compression and decompression are shown in Figure 1. The compression process mainly includes three steps: transform coding [1-3], quantization, and entropy coding [4-7] or dictionary compression [8,9]. The decompression process mainly includes three steps: entropy decoding or dictionary decompression, inverse quantization, and inverse transform.


Figure 1. Block diagram of data compression and decompression.

An, Q proposed a data compression method for power quality monitoring based on two-dimensional discrete cosine transform (DCT), which converts one-dimensional data into two-dimensional data for data compression [10]. Zhang, R proposed a new three-phase power quality data compression method based on wavelet transform, which, combined with Lempel-Ziv-Welch (LZW) coding, achieves a good compression effect on power quality signals [11]. Tsai, T proposed a multi-channel lossless ECG compression algorithm that uses exponentially weighted multi-channel linear prediction and adaptive Golomb-Rice coding [12]. Alam, S proposed a DPCM-based threshold data compression technology for real-time remote monitoring applications, which uses simple calculations and achieves a high compression ratio [13]. Liu, B demonstrated a unified GAN-based signal compression framework for image and speech signals [14]. Wang, I proposed a framework combining machine-learning-based classification and pre-existing coding algorithms, along with a novel revamp of using DPCM for DWT coefficients [15]. In order to improve the accuracy of prediction, Huang, F proposed a novel ECG signal prediction method based on the autoregressive integrated moving average (ARIMA) model and the discrete wavelet transform (DWT) [16]. Suhartono proposed a hybrid spatio-temporal model combining the generalized space-time autoregressive model with exogenous variables and a recurrent neural network (GSTARX-RNN) for space-time data forecasting with calendar variation effects [17].

The pattern matching predictor has been studied for a long time, with many results. Jacquet, P proposed a universal predictor based on pattern matching, called sampled pattern matching (SPM), which is a modification of the Ehrenfeucht-Mycielski pseudorandom generator algorithm [18]. Although SPM has a good prediction ability for periodic signals, it involves pattern matching: the complexity of the algorithm is increased by the matching of the maximal suffix, which worsens its real-time performance. Feder, M proposed the FS predictor, which can attain the minimum fraction of prediction errors [19]. The FS predictor mainly predicts binary sequences; for multi-bit-wide digital sequences, its complexity is high.

Data signals generally have strong time redundancy, but there are many characters in the signal and the distribution probability of the characters is relatively uniform. If entropy coding or dictionary compression is directly used to compress data signals, not only is the compression ratio low, but the compression time is also long, which cannot meet the compression requirements. If we can group similar contexts together, the symbols that follow them are likely to be the same, and a very simple and efficient compression strategy can be exploited. Among context-based algorithms, the most famous is the prediction by partial matching (PPM) algorithm, which was first proposed by Cleary and Witten [20] in 1984. For time-series signals, the ARIMA model and RNN models can predict accurately, but their algorithmic complexity is high, so they are mostly used for time series with low real-time requirements, such as air temperature and wind speed. Unfortunately, in the process of data compression, massive data prediction is required, and the ARIMA and RNN models take a lot of time. Therefore, they do not meet the real-time requirements of data compression [18,19].

To solve the problem of periodic signal compression, we propose an efficient lossless compression method for periodic signals based on the adaptive dictionary. According to past data values, the dictionary model is used to predict the current data value. The predictive coding does not directly code the signal, but codes the prediction error. Because adaptive dictionary predictive coding concentrates most of the signal's information into a small amount of data, it effectively reduces the data volume and obtains a high compression ratio.

2. Algorithm Introduction

2.1. Coding Algorithm Introduction

As shown in Figure 2, it is assumed that a sequence {s(k)} (k = 1, 2, ..., n) is given. Each symbol s(k) belongs to a finite digital alphabet, and e(k) is the output of the coding process. In a periodic signal, there is a great correlation between the different periods of the signal; therefore, a multivariate digital group of the periodic signal, such as the ternary group (s(k−2), s(k−1), s(k)), will have great repeatability. When a periodic signal is predicted, the history information of the signal is important, so we built a dictionary to store it. Because the history information of the signal is rich, and in line with the characteristics of the periodic signal, the adaptive dictionary is only used to store the information of the above ternary groups. In this way, not only can the periodic signal be accurately predicted, but the size of the dictionary is not very large, and it is easy to implement in practice. In order to reduce the complexity of querying the dictionary, for a ternary digit group (s(k−2), s(k−1), s(k)), the first two digits s(k−2) and s(k−1) are used as the two-dimensional address of s(k). Although this is not a unique mapping and will cause memory conflicts, the complexity of the dictionary lookup is low, just O(1). When a memory conflict is encountered, the current ternary group directly overwrites the previous one. In this way, the dictionary we create is dynamic, which makes it adaptive. Furthermore, the larger the dimension of the adaptive dictionary, the lower the probability of memory conflicts. However, as the dictionary dimension increases, the storage size of the dictionary increases dramatically, and the storage size of a high-dimensional dictionary would be unacceptable. Therefore, the two-dimensional dictionary is our best choice. Before introducing the algorithm, we introduce a definition:

(1) We define s(k−2), s(k−1) as the first two adjacent digitals of s(k).

Figure 2. Principle block diagram of the coding algorithm.

First, the two-dimensional dictionary D is created, whose size is n × n. The coded digitals are then stored in the adaptive dictionary. The position of a digital in the dictionary is the two-dimensional coordinate formed by its first two adjacent digitals s(k−2) and s(k−1). Then the current digital s(k), whose first two adjacent digitals make up the address (s(k−2), s(k−1)), is coded, and the predicted value s′(k) of the current digital to be coded is read from the dictionary at D(s(k−2), s(k−1)). The current coded digital s(k) is subtracted from the predicted value s′(k) to obtain an error signal e(k) = s(k) − s′(k). The error signal e(k) is the coding output signal. Although a dictionary is generated during the coding process, the dictionary does not need to be transmitted with the output signal after the coding process is completed; the adaptive dictionary is discarded as soon as the coding process ends. For the coding process, there are two special cases, as follows:

(1) The first two digitals s(1), s(2) in the input array {s(k)} to be coded do not have the first two adjacent digitals. Therefore, the first two digitals cannot be coded, and they are directly output as they are.
(2) In the early stage of the coding process, because the dictionary D is not complete, there may not be a predicted value in the dictionary. According to the first two adjacent digitals, Equation (1) can be used as the predicted value of the current digital to be coded.

s′(k) = a · s(k−2) + b · s(k−1),  (1)

where k is the position of the character to be coded in the input array {s(k)}, s′(k) is the predicted value of the current digital to be coded, s(k−1) and s(k−2) are the first two adjacent digitals of the current digital s(k) to be coded, and a and b are prediction constants. In the process of predictive coding, Equation (1) is used when the dictionary is incomplete. Therefore, the parameters a and b are fixed and can be used as a general linear prediction. In the experiments in this paper, a = −1, b = 2.

Algorithm 1 lists the steps of the coding algorithm, which works as follows:

Algorithm 1 Steps of the coding algorithm
Input: coded input {s(k)}
Output: coded output {e(k)}
Initialize dictionary D
Initialize address index k ← 0
while k < n                     % n is the length of the input array {s(k)}
do k ← k + 1
    if k < 3
        then e(k) ← s(k)
    else
        if D(s(k−2), s(k−1)) = NULL
            then s′(k) ← a · s(k−2) + b · s(k−1)
            else s′(k) ← D(s(k−2), s(k−1))
        e(k) ← s(k) − s′(k)
        D(s(k−2), s(k−1)) ← s(k)
end
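For concreteness, the following is a minimal executable sketch of Algorithm 1 in Python (our illustration, not the authors' MATLAB implementation; the n × n dictionary D is held as a hash map keyed by the address pair, which preserves the O(1) lookup described above):

def adc_encode(s, a=-1, b=2):
    # Adaptive dictionary predictive coding (Algorithm 1, a sketch).
    # s    : input digital sequence {s(k)}
    # a, b : linear prediction constants used while the dictionary is
    #        incomplete (a = -1, b = 2 in this paper's experiments)
    D = {}                                      # dictionary: (s(k-2), s(k-1)) -> s(k)
    e = []                                      # coded output {e(k)}
    for k, sk in enumerate(s):
        if k < 2:
            e.append(sk)                        # special case (1): pass through
            continue
        addr = (s[k - 2], s[k - 1])
        if addr in D:
            pred = D[addr]                      # predicted value from the dictionary
        else:
            pred = a * s[k - 2] + b * s[k - 1]  # special case (2): Equation (1)
        e.append(sk - pred)                     # e(k) = s(k) - s'(k)
        D[addr] = sk                            # update; conflicts simply overwrite
    return e

On the worked example below, adc_encode([2, 3, 1, 2, 3, 1, 2, 3, 1, 2]) returns [2, 3, -3, 3, 0, 0, 0, 0, 0, 0].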

An example of Algorithm 1 is shown in Figure 3. Assume that the input periodic data stream for the encoder is ‘2 3 1 2 3 1 2 3 1 2’. First, the first two digitals are read by the encoder. However, their first two adjacent digitals do not exist. Thus, they are directly output as they are (Figure 3a). When the dictionary is not complete, the dictionary D(3,1) is empty. The coded data is predicted as −1 by Equation (1), in which a = −1 and b = 2. Thus, the output value of the encoder is 3 (Figure 3b). When the dictionary is complete, the dictionary D(2,3) is not empty. The coded data is predicted as 1 by the dictionary. Thus, the output value of the encoder is 0 (Figure 3c). When the dictionary is complete, the dictionary is still updated after each prediction (Figure 3d). Finally, the data stream output by the encoder is ‘2 3 −3 3 0 0 0 0 0 0’.

Figure 3. Example of the predictive coding algorithm based on the adaptive dictionary: (a) when k < 3, the first two digitals are directly output as they are; (b) when the dictionary is not complete, the coded data is predicted by Equation (1); (c) when the dictionary is complete, the predicted value of the coded data is read directly from the dictionary; (d) when the dictionary is complete, the dictionary is still updated after each prediction.

2.2. Decoding Algorithm Introduction

As shown in Figure 4, the decoding process is very similar to the coding process. It is assumed that a sequence {e(k)} (k = 1, 2, ..., n) is given. Each symbol e(k) belongs to a finite digital alphabet, and the sequence {u(k)} is the output of the decoding process.

First, the two-dimensional dictionary H is created, whose size is the same as the dictionary D of the coding process. The decoding output u(k) of e(k) is stored in the dictionary. The position of the digital u(k) in the dictionary is the two-dimensional coordinate formed by the first two adjacent digitals u(k−2) and u(k−1). When the current digital e(k) is decoded, the predicted value u′(k) of the decoded digital is read from the dictionary at the address coordinate (u(k−2), u(k−1)) given by the first two adjacent digitals of u(k). Then the digital e(k) to be decoded and the predicted value u′(k) are added to get a signal u(k) = e(k) + u′(k), which is the decoded output signal. When the decoding process ends, the adaptive dictionary is again discarded.

For the decoding process, this method has two special cases, as follows:

(1) The first two digitals e(1), e(2) in the input array {e(k)} to be decoded do not have the first two adjacent digitals. Therefore, the first two digitals cannot be decoded, and they are directly output as they are.
(2) In the early stage of the decoding process, because the dictionary H is not complete, there may not be a predicted value in the dictionary. According to the first two adjacent digitals of the decoded output u(k), Equation (2) can be used as the predicted value of the current digital to be decoded.

Figure 4. Principle block diagram of the decoding algorithm.

u′(k) = a · u(k−2) + b · u(k−1),  (2)

where u′(k) is the predicted value of the current digital to be decoded, u(k−1) and u(k−2) are the first two adjacent digitals of the output of the current decoded digital, and a and b are constant prediction coefficients, which are the same as the coefficients of Equation (1) in the coding process.

Algorithm 2 lists the steps of the decoding algorithm, which works as follows:

Algorithm 2 Steps of the decoding algorithm
Input: decoded input {e(k)}
Output: decoded output {u(k)}
Initialize dictionary H
Initialize address index k ← 0
while k < n                     % n is the length of the input array {e(k)}
do k ← k + 1
    if k < 3
        then u(k) ← e(k)
    else
        if H(u(k−2), u(k−1)) = NULL
            then u′(k) ← a · u(k−2) + b · u(k−1)
            else u′(k) ← H(u(k−2), u(k−1))
        u(k) ← e(k) + u′(k)
        H(u(k−2), u(k−1)) ← u(k)
end

An example of Algorithm 2 is shown in Figure 5. Assume that the input data stream for the decoder is ‘2 3 −3 3 0 0 0 0 0 0’. First, the first two digitals are read by the decoder. However, their first two adjacent digitals do not exist. Thus, they are directly output as they are (Figure 5a). When the dictionary is not complete, the dictionary H(3,1) is empty. The decoded data is predicted as −1 by Equation (2), in which a = −1 and b = 2. Thus, the output value of the decoder is 2 (Figure 5b). When the dictionary is complete, the dictionary H(2,3) is not empty. The decoded data is predicted as 1 by the dictionary. Thus, the output value of the decoder is 1 (Figure 5c). When the dictionary is complete, the dictionary is still updated after each prediction (Figure 5d). Finally, the data stream output by the decoder is ‘2 3 1 2 3 1 2 3 1 2’.
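A matching Python sketch of Algorithm 2, under the same assumptions as the encoder sketch above; the decoder rebuilds an identical dictionary from its own output, which is why no dictionary ever needs to be transmitted:

def adc_decode(e, a=-1, b=2):
    # Adaptive dictionary predictive decoding (Algorithm 2, a sketch).
    H = {}                                      # dictionary: (u(k-2), u(k-1)) -> u(k)
    u = []                                      # decoded output {u(k)}
    for k, ek in enumerate(e):
        if k < 2:
            u.append(ek)                        # first two digitals pass through
            continue
        addr = (u[k - 2], u[k - 1])
        if addr in H:
            pred = H[addr]                      # predicted value from the dictionary
        else:
            pred = a * u[k - 2] + b * u[k - 1]  # Equation (2)
        u.append(ek + pred)                     # u(k) = e(k) + u'(k)
        H[addr] = u[k]                          # update, exactly as in the encoder
    return u

# Round trip on the worked example:
assert adc_decode([2, 3, -3, 3, 0, 0, 0, 0, 0, 0]) == [2, 3, 1, 2, 3, 1, 2, 3, 1, 2]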

Figure 5. Example of the predictive decoding algorithm based on the adaptive dictionary: (a) when k < 3, the first two digitals are directly output as they are; (b) when the dictionary is not complete, the decoded data is predicted by Equation (2); (c) when the dictionary is complete, the predicted value of the decoded data is read directly from the dictionary; (d) when the dictionary is complete, the dictionary is still updated after each prediction.

3. Experiment and Result Analysis

3.1. Experiment

In the experiment, we used MATLAB 2014b as the algorithm verification tool. Our process of compressing the signal is shown in Figure 6.

Figure 6. The block diagram of a compression system.

The lossless compression system does not include ‘data quantization’, which is a step of a lossy compression system. First, the test signal is processed by DC level shifting. Then, transform coding or predictive coding is used to process the signal. Finally, LZW is used to further compress the signal. The calculation formula of DC level shifting is

y = x − x_min,  (3)

where x is the compressed data and x_min is the minimum value of the compressed data.

The experiment contains a total of three sets of data:

(1) The first set of data is a periodic signal that changes period and amplitude with time to verify the feasibility of a lossless compression system. (2) The second set of data is a periodic signal with a period T = 200 and a signal length of L = 10 K. It is used to verify the effect of block size on compression efficiency. The formula of the signal is

y = 80z + 50 · sin((2π/50) · x) + 10 · sin((2π/20) · x),  (4)

z = (8/9) · (x − 200n),      200n ≤ x < 200n + 180
z = −8 · (x − 200(n + 1)),   200n + 180 ≤ x < 200n + 200,  (5)

where n = 0, 1, ..., 50.

(3) The last set of data we selected is 10 different periodic signals, whose periods and amplitudes are different. Because wavelet basis functions carry rich information, we perform periodic extension of the wavelet basis functions to obtain the above-mentioned 10 sets of periodic signals. The detailed information of these 10 sets of periodic signals is shown in Table 1.
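As an illustration, the second test signal can be generated directly from Equations (4) and (5); a sketch (NumPy assumed, with n taken as the period index ⌊x/200⌋):

import numpy as np

def test_signal(length=10000):
    # Periodic test signal of Equations (4) and (5), period T = 200.
    x = np.arange(length)
    n = x // 200                          # period index n = 0, 1, ...
    phase = x - 200 * n                   # position inside the current period
    # Equation (5): rising ramp over the first 180 samples, falling ramp after
    z = np.where(phase < 180,
                 (8.0 / 9.0) * (x - 200 * n),
                 -8.0 * (x - 200 * (n + 1)))
    # Equation (4): ramp carrier plus two sinusoids
    return 80 * z + 50 * np.sin(2 * np.pi * x / 50) + 10 * np.sin(2 * np.pi * x / 20)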

Table 1. Experimental signal information.

Wavelet    Period    Length     Digital Width (Bit)
bior2.4    320       102.4 K    12
coif3      272       74 K       12
db4        224       50 K       12
dmey       200       40 K       12
haar       256       65.5 K     12
mexihat    256       65.5 K     12
meyer      256       65.5 K     12
morlet     256       65.5 K     12
rbio2.4    288       82.9 K     12
sym4       224       50 K       12

In [10,11], the authors studied data compression algorithms based on the two-dimensional lifting wavelet transform (2-D LWT) and the two-dimensional discrete cosine transform (2-D DCT) in order to improve the compression ratio and the efficiency of the compression algorithm. The basic idea is to first convert the periodic signal to be compressed from one-dimensional space to two-dimensional space, and then perform two-dimensional wavelet decomposition or a two-dimensional discrete cosine transform, which can improve compression efficiency. The process of two-dimensional expression of one-dimensional data is as follows:

(1) Segment the one-dimensional periodic signal along the period of the signal.
(2) Arrange the cut signal into a two-dimensional signal.
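These two steps reduce to a simple reshape; a sketch (NumPy assumed, truncating the signal to a whole number of periods):

import numpy as np

def to_2d(signal, period):
    # Cut the 1-D periodic signal into periods and stack them as rows.
    signal = np.asarray(signal)
    rows = len(signal) // period          # number of whole periods
    return signal[:rows * period].reshape(rows, period)

Each row then holds one period, so the strong correlation between periods becomes correlation between rows, which the 2-D transforms can exploit.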

In Figure 7, the process of two-dimensional expression of one-dimensional data is shown. In the experiment, we chose several methods that currently have a good compression effect on periodic signals. They are the two-dimensional discrete cosine transform (2-D DCT) [10], the two-dimensional lifting wavelet transform (2-D LWT) [11] and differential pulse code modulation (DPCM) [13]. In addition, we used LZW to further compress the encoded output.

Figure 7. The process of two-dimensional expression of one-dimensional data.

The workflow of these three methods is as follows:

In reference [10], 2-D DCT works as follows:
(1) Segment the 1-D signal along multiples of the signal's period.
(2) Arrange the signal into a 2-D signal.
(3) Divide the 2-D signal into 8 × 8 matrix blocks.
(4) Perform the 2-D discrete cosine transform on the 8 × 8 matrix blocks.
(5) Quantize the transformed output.

In reference [11], 2-D LWT works as follows:
(1) Segment the 1-D signal along multiples of the signal's period.
(2) Arrange the signal into a 2-D signal.
(3) Perform a 3-layer lifting wavelet transform on the 2-D signal.
(4) Quantize the transformed output.

DPCM works as follows:
(1) The formula for predicting the actual value x(k) is

x′(k) = Σ_{i=1}^{N} a(i) · x(k − i),  (6)

(2) The calculation formula of the output result signal is

e(k) = x(k) − x′(k),  (7)

where x′(k) is the predicted value of the encoded number x(k), and a(i) is the prediction coefficient.
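For illustration, a minimal DPCM encoder following Equations (6) and (7); the order N and the coefficients a(i) below are example values of ours, since they are not fixed here:

def dpcm_encode(x, a=(2.0, -1.0)):
    # Linear-prediction DPCM, Equations (6) and (7), a sketch.
    # a holds a(1), ..., a(N); (2, -1) is a simple second-order
    # linear extrapolator chosen purely for illustration.
    N = len(a)
    e = []
    for k in range(len(x)):
        if k < N:
            e.append(x[k])                # not enough history: pass through
            continue
        pred = sum(a[i] * x[k - 1 - i] for i in range(N))   # Equation (6)
        e.append(x[k] - pred)                               # Equation (7)
    return e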

3.2. Experimental Evaluation Index

In the experiment, the compression ratio (CR) and the information entropy were selected as the evaluation indices. In addition, the peak signal to noise ratio (PSNR) is an objective standard for evaluating data compression.

The entropy feature is a measure of the uncertainty of a random variable [21]. The calculation formula of the average information entropy is

H(X) = −Σ_{i=1}^{n} p(x(i)) · log2 p(x(i)),  (8)

where X is the source, x(i) is a character in the source, p(x(i)) is the probability that the character x(i) appears in the source X, and the unit of H(X) is bit/sign.

Equation (8) requires the sequence {x(i)} to be a discrete memoryless source. The method proposed in this paper is a prediction method, whose output is the prediction errors. The prediction error sequences are independent of each other, so the source is memoryless. The amount of information in the predicted output sequence can be measured by the average information entropy, which reflects the compressibility of the sequence.

The calculation formula of PSNR is

PSNR = 10 · log10( MAX^2 / ( (1/L) · Σ_{i=1}^{L} [x(i) − x′(i)]^2 ) ),  (9)

where x is the original signal, x′ is the signal after adding noise, L is the length of the signal, MAX is the maximum value of the signal, and PSNR is measured in dB.

The calculation formula for the compression ratio is

M = S_in / S_out,  (10)

where S_in is the byte size before data compression, and S_out is the byte size after data compression.
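The three evaluation indices translate directly into code; illustrative sketches (NumPy assumed; entropy in bit/sign, PSNR in dB):

import numpy as np

def entropy(seq):
    # Average information entropy of Equation (8).
    _, counts = np.unique(np.asarray(seq), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def psnr(x, x_noisy):
    # Peak signal to noise ratio of Equation (9).
    x = np.asarray(x, dtype=float)
    x_noisy = np.asarray(x_noisy, dtype=float)
    mse = np.mean((x - x_noisy) ** 2)
    return float(10 * np.log10(x.max() ** 2 / mse))

def compression_ratio(size_in, size_out):
    # Compression ratio M of Equation (10), from byte sizes.
    return size_in / size_out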

3.3. Result Analysis

In order to verify the efficiency and adaptability of the method proposed in this paper, we used the method to predict and code a periodic signal whose period and amplitude vary with time. Figure 8 verifies the efficiency and adaptability of the method. We observed that changes in the amplitude and period of the periodic signal have little effect on the prediction accuracy of this method; the effect only occurs in the first period after the amplitude or period changes. In addition, the output of the coding has a strong "energy concentration" characteristic, with many 0 values. Furthermore, the method proposed in this paper is a lossless compression algorithm.

Figure 8. (a) Original signal; (b) coding output; (c) decoding output; (d) error between the decoding output and the original signal.


Next, the original signal in Figure 8 had noise added, with a signal to noise ratio of 34.2 dB. Figure 9 shows the prediction effect of this algorithm on the noise-added periodic signal, which is worse than the prediction effect in Figure 8. This is because the method proposed in this paper is a lossless compression algorithm. We added noise to the periodic signal, so the amplitude of the signal is random within a certain range. It is difficult for the prediction model to use context information to accurately predict the amplitude of the signal, so the output prediction residuals fluctuate within a small range. Therefore, the signal in Figure 9 is also the next in-depth research direction of this algorithm.

Figure 9. (a) The signals containing additive noise; (b) coding output; (c) decoding output; (d) error between the decoding output result and the original signal.

Then, the periodic signal with period T = 200 is selected. The signal was cut to different sizes (L = 1, 2, ..., 10 K), which were coded by the method proposed in this paper. The entropy of the coding output was calculated by Equation (8). By comparing the results in Figure 10, we can see that the larger the compressed data size, the better the predictive coding effect of this method. In addition, when the size of the compressed data is large enough, the information entropy no longer decreases; the information entropy at that point reaches its limit.

Figure 10. The effect of block size on compression efficiency.

Finally, we selected 10 sets of sample signals that have different periods and amplitudes. The different methods, including the method of this paper, 2-D DCT, 2-D LWT, and DPCM, were selected to code the 10 sets of periodic signals. In addition, LZW was used to further compress the signal. By comparing the results in Figure 11 and Table 2, we can see that the coding effect of this paper's method on periodic signals is much better than the other three methods. When the coding output of the method proposed in this paper is compressed by LZW, the compression ratio is much higher than that of the other three methods. Furthermore, the method proposed in this paper belongs to lossless compression. When the signal is compressed by the method, the signal information will not be lost.

Figure 11. Comparison of the compression ratio of the signal compression by LZW.

Table 2. Comparison of the compression ratio of the method proposed in this paper with 2-D DCT, 2-D LWT, and DPCM.

Coding Methods     CRmean   CRmin   CRmax    PSNR (dB)
Proposed method    64.90    32.11   129.77   -
2-D DCT            46.78    32.72   98.92    63.77
2-D LWT            20.78    14.87   25.22    56.07
DPCM               21.00    11.31   50.63    -

CRmean is the average compression ratio over the 10 sets of data, CRmin is the minimum compression ratio over the 10 sets, and CRmax is the maximum compression ratio over the 10 sets. In addition, the encoded outputs of 2-D DCT and 2-D LWT are quantized, so they belong to lossy compression. We calculated the compression ratio by Equation (10). PSNR is the peak signal to noise ratio of the output signal, averaged over the 10 sets of data and calculated by Equation (9). When the value of PSNR is "-", it represents lossless compression.

The time complexities of our proposed method, DPCM, 2-D DCT, and 2-D LWT are presented in Table 3. The time complexity of our proposed method is the same as that of DPCM and 2-D LWT, and lower than that of 2-D DCT.

Table 3. Time complexity of different methods.

Coding Methods     Time Complexity
Proposed method    O(n)
DPCM               O(n)
2-D DCT            O(n²)
2-D LWT            O(n)

4. Conclusions

In this paper, aiming at the problem of periodic signal compression, a new predictive coding method was proposed, whose coding output is compressed by LZW. Experimentally, we verified the following advantages of our proposed method:

(1) The results obtained proved that the output of the coding has a strong "energy concentration" characteristic, with many 0 values.
(2) Our proposed method has a strong adaptive ability; changes in the amplitude and period of the periodic signal have little effect on its prediction accuracy.
(3) By comparison with 2-D DCT, 2-D LWT, and DPCM, our proposed method is more effective in compressing periodic signals. Combined with LZW, it compresses the periodic signal with a high compression ratio.
(4) The complexity of the method proposed in this paper, which is O(n), is low.

Author Contributions: Writing: S.D.; Formal Analysis: S.D.; Software: S.D.; Validation: S.D.; Visualization: S.D.; Funding acquisition: W.L., K.L. and P.W.; Project administration: W.L., K.L. and P.W.; Data curation: Z.W. and P.Z.; Supervision: W.L. All authors were involved in experimental investigations. All authors have read and agreed to the published version of the manuscript. Funding: This research was funded by The National Key Research and Development Program of China under Grant 2018YFB2003304 and the Foundation of Graduate Innovation Center in NUAA under Grant kfjj20190306 and the Fundamental Research Funds for the Central Universities. Conflicts of Interest: The authors declare no conflict of interest.

References

1. Tripathi, R.P.; Mishra, G.R. Study of various data compression techniques used in lossless compression of ECG signals. In Proceedings of the 2017 International Conference on Computing, Communication and Automation (ICCCA), Greater Noida, India, 5–6 May 2017; pp. 1093–1097. [CrossRef]
2. Gao, X. On the improved correlative prediction scheme for aliased electrocardiogram (ECG) data compression. In Proceedings of the 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, San Diego, CA, USA, 28 August–1 September 2012; pp. 6180–6183. [CrossRef]
3. Guo, L.; Zhou, D.; Goto, S. Lossless embedded compression using multi-mode DPCM & averaging prediction for HEVC-like video codec. In Proceedings of the 21st European Signal Processing Conference (EUSIPCO 2013), Marrakech, Morocco, 9–13 September 2013; pp. 1–5.
4. Arshad, R.; Saleem, A.; Khan, D. Performance comparison of Huffman Coding and Double Huffman Coding. In Proceedings of the 2016 Sixth International Conference on Innovative Computing Technology (INTECH), Dublin, Ireland, 24–26 August 2016; pp. 361–364. [CrossRef]
5. Huang, J.-Y.; Liang, Y.-C.; Huang, Y.-M. Secure integer arithmetic coding with adjustable interval size. In Proceedings of the 2013 19th Asia-Pacific Conference on Communications (APCC), Denpasar, Indonesia, 29–31 August 2013; pp. 683–687. [CrossRef]
6. Khosravifard, M.; Narimani, H.; Gulliver, T.A. A Simple Recursive Shannon Code. IEEE Trans. Commun. 2012, 60, 295–299. [CrossRef]
7. Hou, A.-L.; Yuan, F.; Ying, G. QR code image detection using run-length coding. In Proceedings of the 2011 International Conference on Computer Science and Network Technology, Harbin, China, 24–26 December 2011; pp. 2130–2134. [CrossRef]

8. Nishimoto, T.; Tabei, Y. LZRR: LZ77 Parsing with Right Reference. In Proceedings of the 2019 Data Compression Conference (DCC), Snowbird, UT, USA, 26–29 March 2019; pp. 211–220. [CrossRef]
9. Nandi, U.; Mandal, J.K. A Compression Technique Based on Optimality of LZW Code (OLZW). In Proceedings of the 2012 Third International Conference on Computer and Communication Technology, Allahabad, India, 23–25 November 2012; pp. 166–170. [CrossRef]
10. An, Q.; Zhang, H.; Hu, Z.; Chen, Z. A Compression Approach of Power Quality Monitoring Data Based on Two-dimension DCT. In Proceedings of the 2011 Third International Conference on Measuring Technology and Mechatronics Automation, Shanghai, China, 6–7 January 2011; pp. 20–24.
11. Zhang, R.; Yao, H.; Zhang, C. Compression method of power quality data based on wavelet transform. In Proceedings of the 2013 2nd International Conference on Measurement, Information and Control, Harbin, China, 16–18 August 2013; pp. 987–990.
12. Tsai, T.; Tsai, F. Efficient Lossless Compression Scheme for Multi-channel ECG Signal. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 1289–1292.
13. Alam, S.; Gupta, R. A DPCM based Electrocardiogram coder with thresholding for real time telemonitoring applications. In Proceedings of the 2014 International Conference on Communication and Signal Processing, Melmaruvathur, India, 3–5 April 2014; pp. 176–180.
14. Liu, B.; Cao, A.; Kim, H. Unified Signal Compression Using Generative Adversarial Networks. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–9 May 2020; pp. 3177–3181. [CrossRef]
15. Wang, I.; Ding, J.; Hsu, H. Prediction techniques for wavelet based 1-D signal compression. In Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Kuala Lumpur, Malaysia, 12–15 December 2017; pp. 23–26. [CrossRef]
16. Huang, F.; Qin, T.; Wang, L.; Wan, H.; Ren, J. An ECG Signal Prediction Method Based on ARIMA Model and DWT. In Proceedings of the 2019 IEEE 4th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chengdu, China, 20–22 December 2019; pp. 1298–1304. [CrossRef]
17. Hikmawati, F.; Setyowati, E.; Salehah, N.A.; Choiruddin, A. A novel hybrid GSTARX-RNN model for forecasting space-time data with calendar variation effect. J. Phys. Conf. Ser. 2020, 1463, 012037. [CrossRef]
18. Jacquet, P.; Szpankowski, W.; Apostol, I. A universal pattern matching predictor for mixing sources. In Proceedings of the IEEE International Symposium on Information Theory, Lausanne, Switzerland, 30 June–5 July 2002; p. 150. [CrossRef]
19. Feder, M.; Merhav, N.; Gutman, M. Universal prediction of individual sequences. IEEE Trans. Inf. Theory 1992, 38, 1258–1270. [CrossRef]
20. Cleary, J.G.; Witten, I.H. Data compression using adaptive coding and partial string matching. IEEE Trans. Commun. 1984, 32, 396–402. [CrossRef]
21. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 2012.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).