Open Access in Geophysics Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Nonlinear Processes

Open Access Open Access Open Access Open Access Open Access Open Access Open Access Open Access Open Access Open Access Open Access Open Access

Open Access Discussions Discussions Discussions Discussions Discussions Discussions Discussions Discussions Discussions Discussions Discussions Discussions

Climate Sciences Sciences Dynamics Chemistry of the Past Solid Earth Techniques Geoscientific Methods and Methods and Physics Atmospheric Atmospheric Data Systems Data Geoscientific Earth System Earth Earth System Earth System Measurement Instrumentation Instrumentation Hydrology and and Hydrology Ocean Science Annales Biogeosciences The Cryosphere Natural Hazards and Earth System Model Development Model Geophysicae 1

Open Access Open Access Open Access Open Access Open Access Open Access Open Access Open Access Open Access Open Access Open Access Open Access Open Access

2030 2029 culties in answering the question: “How much ffi Climate , and M. B. Parlange Sciences Sciences 2 Dynamics Chemistry of the Past Solid Earth Techniques Geoscientific Methods and Methods and Physics Advances in Atmospheric Atmospheric Data Systems Data Geoscientific Geosciences Earth System Earth System Earth System Earth Measurement Instrumentation Instrumentation Hydrology and and Hydrology Ocean Science Biogeosciences The Cryosphere Natural Hazards EGU Journal Logos (RGB) and Earth System and Earth Model Model Development , N. van de Giesen 1 ´ erale de Lausanne, Station 2, 1015 Lausanne, Switzerland Quantifying information content of hydrological data is closely linked to the question We discuss both the deeper foundation from algorithmic information theory, some The conclusion is that the answer to this question can only be given once the fol- ´ ed This discussion paper is/has been under review for the journal Hydrology and Earth System Sciences (HESS). Please refer to the corresponding final paper in HESS if available. Water resources management, Delft University of Technology, Stevinweg 1, P.O. Box 5048, Environmental fluid mechanics and Hydrology Lab, ENAC, Ecole Polytechnique ten explicitly asked, but is actuallyand underlying many monitoring. challenges Information in content hydrological modeling of time series is for example relevant for decisions model performance, as addressed intive teaches current us hydrological that literature. it Theing is AIT prior impossible perspec- to beliefs. answer These this beliefsaccept question are as objectively, related without a specify- to law the and what maximum is complexity one considered as is random. willing to 1 Introduction How much information is contained in hydrological time series? This question is not of- lowing counter-questions have beenquantities? answered: (2) (1) What Information is about your which current unknown state of knowledge/beliefs aboutof those separating quantities? aleatoric and epistemic uncertainty and quantifying maximum possible the information-theoretical framework, theretent is and a data strong compression. link Wea exploit between this time information by series con- using analysis data tooland compression and learning performance highlight (understanding as the is analogy compression).of The to analysis a information is set content, performed of prediction, on catchments, time searching series for the mechanisms behindpractical compressibility. results andinformation the is inherent contained in di this data?”. When inferring models frommight hydrological be data interested in or thepotentially calibrating information be hydrological content learned models, of from those them. we information In data theory this to (AIT) work quantify to we how discuss take much some a can underlying perspective issues from regarding (algorithmic) this question. In Abstract to define information content of hydrological time series S. V. Weijs Hydrol. Earth Syst. Sci. Discuss.,www.hydrol-earth-syst-sci-discuss.net/10/2029/2013/ 10, 2029–2065, 2013 doi:10.5194/hessd-10-2029-2013 © Author(s) 2013. CC Attribution 3.0 License. 2 2600 GA, Delft, The Netherlands Received: 31 January 2013 – Accepted: 6Correspondence February to: 2013 S. – V. Weijs Published: (steven.weijs@epfl.ch) 14 FebruaryPublished 2013 by Copernicus Publications on behalf of the European Geosciences Union. 1 F 5 25 15 20 10 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | ); avail- Vrugt Weijs Singh , code 1948 1 4 ; ( 1 ), we know 1993 , ). 1948 Shannon ( 2 (see Fig. ). Also in hydrolog- = 2011 1 4 , ), who generalized the 2 2012 is the length of the code , i log l Shannon 1956 − ( ), for more references on ap- Li et al. ; 2013 Beven et al. ( ; ), where new compression algorithms McMillan 2010 Jakeman and Hornberger , 2010 , 2013 ( 2032 2031 Ruddell et al. is the probability assigned to the event before ob- ); P Laio et al. ; Weijs et al. ) to all uniquely decodable codes. 1997 ( 2008 , , where 1949 ) for more background on information theory. See also ( P Mishra and Coulibaly ; Singh b log , ); 2006 Kraft − ( . In other words, one can see the analogy between prediction and i ciently as possible. When required to be uniquely decodable, short ) of 1987 ( 2010a 1 ffi ) for introduction and interpretations of information measures in the con- , b , Schoups et al. is the alphabet size (2 in the binary case) and ; 1 (1) A ≤ i 2010a 2002 l ( − , A This paper must be seen as a first exploration of the compression framework to In this paper, we take a perspective from (algorithmic) information theory, (A)IT, on There are, however, some issues in quantifying information content of data. Although i Alfonso et al. A). In contrast to probabilities,to which can integers. be For chosen example, freely, code the B code in lengths are the limited table uses one code of length 1, one of length able, there are only 4 prefix-free binary codes as short as X in which assigned to event data compression through thescarcity similarity of large between predictive the probabilities. Just scarcity as of there are short only codes 4 probabilities and of the allocated as e codes come at the costcodes of longer must codes be elsewhere. This prefixThis follows free, is from the i.e. formalized fact no by thatinequality code such the (Eq. can following theorem be of the first part (prefix) of another one. plications of information theoryof in information the content geosciences. as In description the length is following, elaborated. the2.1 interpretation Information theory: entropy and codeData length compression seeks to representin the a most file) likely with events the (mostwith shortest frequent dividing characters codes, high yielding probabilities, the also shortest short total code codes length. are As a is limited the resource case that has to be serving it. To not lengthen thisCover paper and too Thomas much, we referet the al. reader to and Rajagopal suprisal, defined as text of hydrological prediction and model calibration. We also refer the reader to ( that information content of a message, data point or observation, can be equated to regarding what to measure and where, to achieve optimal monitoring network designs From the framework of information theory, originated by are developed that employ hydrological knowledge to improve compression. 2 Information content, patterns, and compression of data ical model inference andmodel calibration, the complexity question is canet warranted be al. by asked, to the decide data how ( much define information content in hydrologicalthe time analogies series, and with showing the howbenchmark objective in they of a work introducing follow-up in study practice. The results will also serve as a the context of datathe compression. The question framework and naturally priorassessment shows knowledge is how enter possible specification using the tools of from probleminformation information and theory. The content to illustrative link what and between degreeinformation data an content, compression using objective common is compression elaborated algorithms. in practical explorations of the question seems straightforward,that the the answer question is isto not. not the This completely question is specified.answers that The partly depends answers one due on found to asks how inAn much the the data was objective fact data. are already assessment relative Moreover, known ofknowledge before the information is the content information explicitly answer specified. is content was therefore received. of only those possible whenquantifying prior information content in hydrological data. This puts information content in 5 5 25 20 15 10 25 20 10 15 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | (2) ), if Cover 1948 ( 2, we find Cover and = ; A 1948 , Shannon ), using 0.4106 1 bits per sample, where , code A is optimal for 1 = ), yielding a total average element of the probability H ) ) q is the code length assigned || th Shannon man code for the distribution i i l p ff . The minimum average code (  ) was proven to be optimal for i KL D ). In Fig. is the /p 1 i is 1952 q q , 2006 ( log 6= p = man i ff l ); see Appendix A for an elaboration of this 2034 2033 is Hu 0.5,0.25,0.125,0.125 i ( || ) 2010b ( . In this equation, 1 1 ≤ i 1 Cover and Thomas man coding ( p 1 ). Although the expected code length is now more than the ff 1 = log 3 Weijs et al. i − p ). This happens for example in arithmetic coding, where the entire 2 · 1 0.4,0.05,0.35,0.2 n = 2 for which the code would be optimal and ( i X ( ) (4) + ), which is a lower bound for the average description length. q q 2006 = KL || 2 , } − D p l ( , mp3 and wma. For a good explanation of the workings of this algorithm, 2 , it is shown how the total code length can be reduced, assigning codes of { · = ff , (3) 2006 . The overhead in the case where 1 p KL i i man coding using blocks can get arbitrarily close to the entropy bound ( 1 , E ∀ II) ff D i || + l = 1 + 2 1 ) ) (III − p can be recognized as the entropy of the distribution ( = p 2 For probability distributions that do not coincide with integer ideal code lengths, the In Fig. ( ( KL · i fact that for dependentindividual values, the values. joint In entropy other is words, lower average than uncertainty the per sum value of decreases, entropies when of all like jpg, ti the reader is referred to distribution. When the seriesand are Hu long enough, entropyand coding Thomas methods like Shannon be used to achieve even better compression. This high compression results from the value by value compression.the It entropy-bound finds and codes is of applied an expected in average popular length compressed closest picture to and music formats D If the requirement thatis the relaxed, codes blocks are of value values by can value be (one grouped code together for to each approach observation) an ideal probability time series is coded as one single number. 2.2 Dependency If the values in a time series are not independent, however, the dependencies can probability distribution I andachieve code the B entropy is bound. Code optimal BIII for is the (last also distribution an column optimal II. in Hu Both Fig. these codes Leibler divergence from thewould true be distribution optimal. (III) to the distribution for which the code connection. algorithm known as Hu entropy, it is impossible to find a shorter code. The overhead is equal to the Kullback- quality presented in estimated probability mass function.the reliability This term is in extra the decomposition description of length an information-theoretical is score analogous for forecast to such as frequencies II inmass Fig. function to event code length of H bits per sample.bits In per general, sample if is a increased wrong by probability the Kullback-Leibler estimate divergence is from used, the the true number to of the number of bits, some overheadprobability will distribution occur. of The events rounded coding would be optimalq for a length is the expectation ofH this code length overThomas all events, H However, because in practice the code lengths often have to be rounded to an integer 1 varying length depending onevery occurrence value could frequency. be As represented shown withthe one by optimal code, allowing code for length non-integer code for lengths, an event 2 and two of length 3, we can verify that it sharply satisfies Eq. ( 5 5 25 15 10 20 20 15 10 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | ) ) Q ), the ). The true value P ) for more 1937 ( 1968 log , − 2008 cient, any error , ) gave very intu- ffi Turing ). Similarly, we can 1968 ( Kolmogorov 2007 ; , on the right), in the second cient. On the other hand this 2 Li and Vitanyi ffi 1966 , ). In terms of data compression, Cilibrasi p Kolmogorov , but would also be needed for de- E ). Q erent characters used, but the entropy ff Chaitin ; 5.6, because of for example the relatively 2010b ( = 2036 2035 1964 , ff would be compressed together in one file, the gain Q Weijs et al. and Solomono p would help compressing E , p E ) and potential evaporation ( P P ered some refinements in the definitions of programs and showed a and ff P ) o 1975 ( ered is the dark-colored climatological histogram (Fig. Using the thesis that any computable sequence can be computed by a UTM and In general, better prediction, i.e. less surprise, gives better results in compression. Another example are the omitted characters that the careful reader may (not) have ff Algorithmic information theory (AIT) wasindependent founded as publications a ( field by the appearance of three learns from thetry relation between to these variablescompressible hydrological series ( time really series contain for to hydrological modeling. investigate how2.3 much information those Algorithmic information theory way than literally stating thosethat data is a defined “law” as of random. nature This is cannot analogous really to be the fact called a law if its statement is more elaborate compared to compressing the files individually is related to what a hydrological model background. Kolmogorov defined the complexity ofof a numbers) certain string as (i.e. the dataoutput set, length on series a of UTM the andit minimum then is halt. computer to Complexity describe. program ofa If that data there program is can are that thus produce clear related isdata that patterns to shorter cannot in how be complicated than the “compressed” data, the then in data they this itself. can way. Data The be that described majority cannot by of be conceivable described strings in of a shorter additive constant, which becomes relatively lessChaitin important when more data is available. theory looks at databasic through idea the is that lens information of contentest of algorithms way an to that object, describe like can a it. produceused, data Although AIT those set, description is uses data. length related the The to generally constructUniversal the depends of short- Turing on a Machine the universal (UTM), language computer to introduced show by that this dependence takes the formvery complete of analogy an withbetween Shannon’s conditional information entropy theory, and including conditional e.g. program lengths. the relations that program lengths aregram universal that up tells one to UTM an how additive to constant simulate another), (the length of the pro- compression. When English language is compressible and thereforeredundancy fairly leads ine to more robustnesstypographical in errors, the the communication, meaning is because stillwould even clear. with obfuscate If many the English meaning. were 100 % e In water resources managementdicting and one hydrology we series are of generallyfrom values concerned from precipitation other with ( series pre- ofknowledge of values, like predicting streamflow ( itive definitions of complexity and randomness; see also ( of the text is 4.3high bits, frequencies far of less the than spacemore log (16 (48) than %) and 4 the bits,readers, letter the because “e” the actual (13 structure %). uncertainty in Althoughture about the the can text the entropy is be is missing similar used letters to to English is predict language far and the less that missing struc- for characters. most On the one hand this means that case, the time seriescompression is available and (the expected left return ofshows the the for same value the figure). of bets Obviously,upon exploiting the are hearing dependencies e the better in pected true the in valueand data. is the 4.96 3.72 The second bits bits surprise in case, when ( divergence case which using scores the proposed the guessed in distribution climate was as assumed prior. These surprisesfound in are the equivalent previous paragraph. to There are the 48 di ries, that therefore contain information aboutshow themselves. Hydrological strong time internal series depen often encies, leadingConsider, to for better compression example, and the better case prediction. lengths) where to you possible are streamflow askedo values to on assign 12 probabilities May (or 1973. code In one case, the information the other values in the series are known, because we can recognize patterns in the se- 5 5 25 20 10 15 25 20 10 15 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | ; cient 2011 model ffi , ff Schoups erent data ff Westerberg et al. ; er an upper bound estimation, 2011 ff , data. A short description of the data ff ): it is always possible that there exists a 1937 2038 2037 , . The amount of information that the rainfall con- ff Turing Beven and Westerberg ). In this paper, we consider only . 2011 , ciently flexible to describe any sequence, while still exploiting ffi process. For example, a series of rainfall values is highly compress- ff ). In fact, we are interested in the Kolmogorov complexity of the data, but 2008 , If the data can be regenerated perfectly from the compressed (colloquially referred The data compression perspective indicates that formulating a rainfall-runo A shortcut approximation to measuring information content and complexity, is to use archive). In the present exploratory example the inclusion of the algorithmic complexity introducesmainly some used small for various errors mediaoften in formats beyond (pictures; the our video; data. perceptive audio), capabilities.the where Lossy observed This these compression values is errors to is are analogous within measurement tofor precision, uncertainties a which in model could observation be that ( a generates way to account Roughly speaking, the file sizefor that the remains amount after of compression,decompression information gives algorithm in an should upper the be bound time counted towards series. this Actually, file also size the (cf. code-length a self-extracting of the this is incomputable. Aafter crude compression practical by approximation some commonly of availablean the compression upper algorithms. complexity bound This is provides for the the file information in size the data. to as zipped) files, the compression algorithm is said to be lossless. In contrast to this, Weijs and Van de Giesen the rainfall-runo ible due to the many zerosseasonality. These (a dependencies far are from uniform in distribution), thethe the rainfall relation autocorrelation, alone between and and rainfall the can and telltains runo us for nothing the about hydrological modelby the is number thus of less bitsbecause to than store the it rainfall number at determines of the theet desired data precision. model al. points This complexity multiplied amount that is important is warranted by the data ( the data, such as autocorrelation and periodicity. has an analogy withwill compressing contain a rainfall-runo good modellength about of it, the whose predictive model. power However, not outperforms all the description patterns found in the data should be attributed to section, often enhanced by methods that try to discover dependencies and patterns in sets to obtain ansion indication algorithms of use the entropy-based amount coding of methods such information as they introduced contain. in Most the compres- previous Although the algorithmicfractal complexity program with language is respect limited bysettings, to the the the size losslessly of compressed Turing the output complete fractalresolution. image program executable will executable and continue its to grow with increasing 3 Compression experiment set-up In this experiment, a number of compression algorithms is applied to di simulated by the decompressionis algorithm not on another Turing computer. complete,additional Since description it the length is for language some lessindefinitely recursive patterns with powerful is growing than replaced amounts by thecompress one of an original that data. ultra grows computer. As high The an resolution constant example, image one of can a fractal think generated of by trying a to simple program. positive integer not definable in under eleven words”. a language that ismost su of commonly foundlike a patterns. Turing complete While description this languagewithout can, approach having it the cannot will problems o discover of incomputability.that all Compressed use files patterns, a are such decompression alanguage. algorithm language, The to compressed recreate files the can object also be in seen its a original, programs less for a e computer, which is it is incomputable, butsolvability can of only the be halting approached problemshorter ( from program above. which This is is stillproduce running related the (possibly to output in and the an then un- infinitewere halt. loop) computable A that is paradox might the that eventually following would definition arise known if as Kolmogorov the complexity Berry paradox: “The smallest than the phenomenon that it explains. A problem with Kolmogorov complexity is that 5 5 25 15 20 10 25 20 10 15 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | (5) (6) (7) erent signals. . This leads to ff x 6= 256 equal intervals erent quantizations (SNR). ff = 10 8 man coding. ff quantized 10log x = dB ), with a Markov-Chain model. LZ77 uses a 2040 2039 x cient to understand the most significant pattern ffi 1977 min , + man coding of the errors). x 2 ff erence in information content between di  ff x integer min x 2 ) ) − min ¯ x x x − − ,quantized t x t min x x ( max Ziv and Lempel − − 1 255 t x = n t x 255 ). Using this scheme, the series were split into 2 P 5 1 + 1 n max = n t ( RLE: HDF (hierarchical data format) is a data format for scientific data of any P 0.5 man Coding. = erent questions for which the information content of the answers is analyzed. c ff 1 n ff = = LZMA: The Lempel-Ziv-Markov chainrithm, algorithm LZ77 combines ( thesliding window Lempel-Ziv to algo- look for reoccurring sequences, which are coded with references JPG: The Joint Photographyincludes Experts a Group range created ofless the lossless coding JPEG and is , used, lossy which transform) which compression followed uses by techniques. a Hu Here Fourier-like type the of loss- transformHDF (Discrete cosine form, including pictures, time series andalgorithms, metadata. It including can run use several length compression occurring encoding data (RLE). with RLE the replaces value sequencesuseful and of to the re- number compress of pictureslong repetitions. dry with It would periods. large therefore uniform be surfaces andPPMD: rainfall A series variant of with Predictiongram. by It Partial uses Matching, implementedvalues a in using statistical a the variable 7Zip model sliding pro- Hu for window. Subsequently predicting the each errors are value coded from using the preceding ARJ: Uses LZ77 (see LZMA) with sliding windowWAVPACK: Is and a Hu lossless compression algorithm for audio files. – – – – – – integer quantized in the results. It is beyond the scope of this paper to describe the algorithms in detail. The algorithms thatprograms were and used formats. are Belowmain a are features selection very of of short eachdescriptions. of commonly descriptions The the of available descriptions algorithms are the compression used su main and principles some and references for more detailed 3.2 Compression algorithms be stored in 8 binary digits). x Because the SNRlogarithm, can which have is expressed a in large the unit range, decibel: it SNR is usually measured in the form of a Due to the limitedtimates amount of of the data, distributions, quantizationand which is compression. This are necessary is needed to analogous make toa to representative the meaningful calculate histogram. maximum es- the As number of amount willimply bins be of di permitted argued information to in draw the discussion,All di series werescheme first (Eq. quantized toand 8 converted into bit an precision, 8 bit using unsigned a integer (an simple integer ranging linear from quantization 0 to 255 that can These can be converted back to real numbersx using Because of the limitedrounding errors, precision which achievable can with bethe 8 quantified ratio of as bits the a , variance signal of to the noise original ratio signal (SNR). toSNR The the SNR variance is of the rounding errors. pose and not biasedstill towards needs hydrological data. to This bemainly means used stored to that in explore any the specific the di pattern compressed file.3.1 The compression Quantization algorithms will be of the decompression algorithm is not so relevant since the algorithm is general pur- 5 5 25 15 20 10 10 15 20 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | ) P Q ), it 1979 , ciently small, ranged from ffi ), the model, and the stored . P 4.3 Rissanen and Langdon ) block sorting algorithm in combi- ) for a description of model and data. ) is an entropy-coding method which is 1994 ( 2009 1979 ( , 2041 2042 ) quantized discharge. This gives a signal that Q ). Martin 5 man coding. . ff man coding. Q Vrugt et al. ff ) for a description of this data set. As a reference, various 255, but because the errors are su + Burrows and Wheeler 2003 ( man-Coding. ff ect is deemed to be small and has a smaller influence than other 255 to ff ), the model could enable compression of the data set consisting of − ) from the observed ( err Q mod Q Vrugt et al. 128, which allows for 8 bit coding. Because the observed discharge signal ( + . In the experiment we test whether the error series is indeed more compressible data set : Uses the nation with Hu PNG: Portable Network Graphics (PNG)pixel from uses a the filter precedingby based pixels. the on Afterward, prediction the algorithm of prediction “Deflate”quences) one errors which followed are uses by compressed Hu dictionary coding (matching repeatingTIFF: se- A container imagethis format case that PackBits compression can was use used, several which compression is a algorithms. form In of run length encoding. to the previous location whererange the coding. sequence Range occurred. coding The ( method is followed by has less overhead than Hu mathematically equivalent to arithmetic coding ( Q – – – The hydrological model HYMOD was used to predict discharge from rainfall for the 55 to information content. Given thea connection hydrological between model modeling should and in data principle compression, be able to compress hydrological data. This of this compression experiment are presented in Sect. 3.4 Experiment B: Compression with aThe hydrological second model experiment isprevious a experiment first single time exploration series of were jointly compressed to compressing obtain time an series. indication of In their the streamflow and rainfall data in thetained 431 in river the basins MOPEX across the datacontent continental set. or USA, This complexity as should of con- give theare some time indication log-transformed series. about For before these the quantization, experiments, information the the to measurements. streamflow Missing reflect values values, the whichries. were heteroscedastic infrequent, Although uncertainty were this removed in can fromperiodicity, the have the se- some e impact onstrategies the such ability as to replacing exploit the autocorrelation missing and values by zero or a specific marker. Results useful for actual compression of hydrologicalleft data. for Although future a work, more we detailed performmodels analysis a is using first data test compression of tools. estimating the performance of hydrological Leaf River data set; see e.g. than the original time series of 3.5 Experiment C: Compression of hydrological time series from the MOPEX In a third experiment, we looked at the spatial distribution of compressibility for daily can be useful to identify good models in information-theoretical terms, but can also be Subsequently, the modeled dischargesscheme were as quantized the using observedmodeled the discharges. ( same An quantization errorcan signal range from was defined by− subtracting the can be reconstructed from theerror precipitation signal ( time series ( and artificially generated series where used. Thewhile generated the series time consist of series 50values). 000 of values, The the following Leaf series River wherewith data used the set, linear in contains scheme this 14 using 610 experiment. Eq. All values ( are (40 yr quantized of directly daily 3.3 Experiment A: comparison on generatedIn and the hydrological first time experiment, series thefrom Leaf algorithms River are (MS, tested USA)See on consisting a e.g. of real rainfall, potential world evaporation hydrological and data streamflow. set 5 5 25 20 25 15 20 10 15 10 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | ) mod man ff Q ( Q erence in ff man and Range coding). Note that ff 2044 2043 for leaf river, along with the modeled P ). and , the entropies of the signals are shown. The second Q 3 2010a ( give an indication of the entropy bound for the ratio of com- 2 ). In Table err Weijs et al. Q ). Further structure like autocorrelation and seasonality can not be used 4 ) and ( 3 ciently to compensate for this overhead. In contrast to this, the streamflow series ffi The results for the hydrological series firstly show that the streamflow series is better The results for the various lossless compression algorithms are shown in rows 7–17. The signal to noise ratios in row 4 give an indication of the amount of data corruption overhead caused by the occurrenceEqs. of ( 0 rainfall more than 50 percent of the time, see compression algorithm for each series (PPMD or LZMA). compressible than theseries precipitation has series. the lower This entropy. Furthermoreentropy-bound is it is can remarkable, not be because seen achieved that by the for any the rainfall rainfall of series, the the algorithms, presumably becausesu of the can be compressed tothe well strong below autocorrelation the in entropyPPMD the bound algorithm, data. (27.7 % which These vs. uses dependenciescorrelated 42.1 a %), are values local because best prediction quite of exploited model accurately. byindicating that Many the that apparently they of use can the at predict algorithms least the part cross of the4.2 the temporal entropy dependencies bound, in Results the B: data. Compression with aWe hydrological analyzed model the time seriesand of its errors ( row shows the resulting file size as percentage of the original file size for the best the compression of this non-uniform whiteuncertainty noise or signal information is equivalent gain tovalues due the (the to di climate), knowledge compared of tobars the a in occurrence naive Fig. frequencies uniform 1 of probability of estimate; all cf. the first two also completely random inthe time, theoretical but limit does foralgorithm not compression gets have closest is a to the the(ARJ, uniform entropy theoretical PPMD, distribution. bound limit, LZMA Therefore of but BZIP2) alsouse 86.3 approach %. several a that file The form limit archiving WAVPACK of algorithms very entropy closely. coding This as is a because back-end they (Hu all small file header (meta information about the file format). The Gaussian white noise is appears to know a cleverthe way WAV format, around which this. does not In even fact, attempt the to compress smallest the file data size and is has a achieved relatively by they have a fixed overhead that is relatively smaller forThe larger numbers files. are the percentageinal of file the size file (a size lower afterratios percentage compression, per indicates relative better time to compression). the series Thestant, orig- are best linear and compression highlighted. periodic From signalsachieve can the be this result compressed high it to compression, a becomes largeuniform although clear extent. white some Most noise that have algorithms is the more theoretically overhead con- incompressible, than and others. indeed The none of the algorithms that is caused by(Bitmap), the WAV (Waveform quantization. audio file Asincluded, format), a indicating and that reference, HDF the the (Hierarchical filedepend uncompressed Data size of on Format) formats those are what BMP formats, data relative to are the in raw them, data, does but not does depend on the amount of data, because 4.1 Results A: generated data As expected, the fileues sizes in after the quantization series,raw are as format. exactly each From equal value the is to occurrencetheir frequencies encoded the distribution of by number was the one of calculated. 256 bytefractions val- Normalized unique in (8 values, with row bits) the 3 the and of entropy maximum Table pression stored of entropy in achievable of binary by 8 valuecoding, bits, by which the do value not entropy use encoding temporal dependencies. schemes such as Hu This section shows resultsan from example the of compression analysis compressionwith for of knowledge single of discharge, time rainfall, using is series. a shown. Also hydrological model in combination 4 Results of the compression experiments 5 5 15 10 25 20 25 15 20 10 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | , . ) ), ), 6 5 Q 1 − and t cient Q P | 2007 ffi t , Q ( H ) and rainfall Q cult to compress ffi erence between the ) bits per time step. sets the gain in com- ff Q ff probabilities estimated ( 0 2 with knowledge of H Q Schaefli and Gupta ) for compression is left for 0.89), o should reduce uncertainty in = 2009 , ρ P erent algorithms are specialized in ff ) is 0.18 (see 1 − t Q 2046 2045 = ). As expected this does not change the entropy, ) ected, indicating that the compression algorithms t ff ( perm err mod Q ), is much lower than the entropy itself. This could the- Q Q ( 0 and H cient way of coding and consequently not the most e Pianosi and Soncini-Sessa ffi perm ciency (NSE) of the model over the mean is 0.82, while the NSE , where the compression ratio of the best performing algorithm is Q 4 ffi ). It must be noted, however, that this conditional entropy is probably e E ff mod erent kinds of patterns, so the map of best performing algorithms (Fig. Q ff | Q 0.60), compared to that of the discharge ( ( ) is compensated by the fact that the dependency must be stored by the = H ρ mod . Q 3 | Q ), for brevity, the compressibility and entropy show clear spatial patterns. For most of Compression beyond the entropy bound can be achieved by using temporal patterns. A somewhat unexpected result is that the errors seem more di The table also shows the statistics for the series where the order of the values was ( P This is visible in Fig. information content of data. visualized relative to the entropydescribing of di the signals. Di some of two influences on compressiondencies rate in are the shown. streamflow, Firstly, the dueknown conditional to entropy as temporal given depen- the the entropy previous rate oretically value lead to aHowever, because compression of describing the thecomplexity relatively of signal the short with model lengthcompression that of is describes the not it reached (a timeThis two in is series dimensional practice, a compared 256 because natural to bin way the the histogram), of model this accounting needs for to model be complexity stored in too. the context of estimating Fig. can be used as an indication for which types of patterns are found in data. In Fig. For the time series of theof quantized the scaled log MOPEX streamflow basins, and( scaled from quantized now rainfall on simplythe streamflow referred time to series, as thedistribution streamflow entropy of ( is close the todistribution. preprocessed 8 An bits, streamflow exception indicating does are thatlower not the the entropy frequency time diverge basins series in much dueAlso to the from for high the central a peaks rainfall, part entropy and uniform values relatively of are long, lower the low in base USA, this flow region which periods. due show to longer dry spells; see future work. 4.3 Results C: MOPEX data set forecasting models (e.g. tool for probabilistic prediction. The use of for example heteroscedastic probabilistic conditional entropy and themodel entropy is of not the the errors most could e indicate that an additive error ficient pression due to thethe lower errors entropy becomes of to the complex errors.research to is Possibly, be the needed detected to temporal by determine dependence the thetent in exact compression with cause algorithms. the of Further this theoretical result, ideaThe which Nash-Sutcli that should the be consis- information in over the persistence forecast ( indicating a reasonable model performance. Furthermore, the di conditional entropy gives a theoreticalthe limit model, of while compressing not making use of temporal dependence. (31.5 %) than the observedApparently discharge the itself reduced (27.7 %), temporal even dependence though in the the entropy errors is (lag-1 lower. autocorrelation coef- entropy underestimated, as it isfrom based 14 on 610 a value joint pairs.to distribution This a with is 255 specific the functionalthan cost form. Pearson of The correlation, estimating estimation becauseat dependency of variance without the mutual rather limiting latter information it thanH is needs uncertainty. limited more In to data theentire a description joint linear length, distribution. the setting If underestimation and representative of looks for the dependence in longer data sets, the randomly permuted ( because that depends only onibility the of histograms the of signals the is series.made significantly In use a of contrast, the the temporal compress- dependencetion of for the the modeled non-permuted and signals. observed The discharges joint was distribu- also used to calculate the conditional 5 5 15 25 20 10 25 15 20 10 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | ) or function libraries (e.g. ff ). 2048 2047 1968 , Kolmogorov sets the space needed to store the model describing the de- ff Flexible data compression algorithms, such as used in this paper, are able to give Summarizing, we can state that information content of data depends on (1) what In the second case, temporal dependencies reduce the average information con- In many situations in practice, however, prior knowledge does not include knowl- The information content of time series depends also what prior knowledge one has The quantization can be linked to what question the requested information answers. seeing the data. an upper boundnot for specifically the tuned informationare towards content stored hydrological of in data. hydrologicalTheoretically, the All data, prior compressed patterns information because can file inferred they bethe and from are form explicitly of very the fed auxiliary to little data data newhydrological files is (e.g. compression models), rainfall considered algorithms which to in should compress as runo increase reduce prior in information prior information. content knowledge. of the data due toquestion the we ask the data, and (2) how much is already known about the answer before pendencies. In the theoretical frameworkdata of are algorithmic unified information in theory,length one model of algorithm and the (one shortest could algorithmKolmogorov see Complexity that ( as reproduces the a data self-extracting is archive) the and information the content, or in compression, optimal codingstored table, and which adds depends tomodel on the that is the file inferred frequencies, size. fromcontent. One should the data. could be The see model the generally forms histogram part as of a thetent simple information per form observation. of Also a a when the priori, form but of inferredgain the from in temporal the dependencies compression data, are o they not can know decrease the information content, if the servation needed to store a long i.i.d. time seriesedge of that of distribution. the occurrencecies frequencies, alone, e.g. or temporal does dependencies.data In include should the more first also knowledge case include the than the information frequen- knowledge content gained of the from observing the frequencies. Also distribution. The entropy in bits gives the limit of the minimum average space per ob- pected information content of each observation is given by the entropy of the frequency fore also the questions change.lower The flows question than requests more on absoluteby the precision the higher on data, flows. the The i.e.that information the is contained asked. information in content the of answers the given time series,about depends the on answers to thehas the question question no asked. knowledge Ifprobability of one distribution knows surrounding that the values, frequency matches the distribution the but prior observed knowledge frequencies. takes In the that form case, the of ex- a tization, auxiliary data, and prior knowledge used. When quantizing streamflow intoplicitly posed 256 is: equally “in sized whichWhen of classes, the these logarithm the equally of question spaced the intervals that streamflow does is is the used im- instead, streamflow the fall?”. intervals change, and there- is also important todiscuss some first inherent consider issues the inresults quantifying subjective limitations and the such not information approaches. straightforward content, to In which analyze. makes this the paper,5.1 we How much information is containedFrom in the this presented data? theoretical background,that results, although and analysis information itpends theory can on can be a concluded number quantify of information subjective choices. content, These the subjective choices outcome include de- the quan- The data compression results giveity an indication of of the the information data.and content become or Eventually, a complex- these tool may formation be hydrological theory time linked may series to eventually analysis climate provide and a and inference. Although solid basin infor- foundation characteristics for hydrological modeling, it 5 Discussion 5 5 25 15 20 10 25 15 20 10 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | (8) ect to a large ff erent prior knowledge, however, ff | ) Q , P 2050 2049 ( | − | ) Q as a streamflow time series. Our prior hydrological knowl- ZIP( | π + | ) P ZIP( | stands for the file size of a theoretically optimal compression of data | = ) X ZIP( | model could theoretically be estimated by comparing the file size of compressed ff as a probable source of the data). There would be little surprise in the second half π ers a shorter description for that specific pattern. Further explorations of these ideas from algorithmic information theory are expected A hydrological model actually is such a compression tool. It makes use of the de- As a somewhat extreme, unlikely, but illustrative example, consider that we encounter However, the inherent problem in answering this question is the subjectivity of what , which includes the size of the decompression algorithm. This brings us back to ff to put often-discussed issueswith in more general hydrological and model robust foundations. inference in a wider perspective where X the ideas ofduce algorithmic data information on computers theory, (Turing which machines).merging The input uses shortening and program in description output lengthsamount length data, of when that information i.e. repro- learned the bydecompression compression algorithm modeling. progress, embodies The could the hydrological knowledge model be gained that seen from is the as part data. the of the rainfall plus the file sizeand of streamflow compressed streamflow are with compressedcould the denote together, size this exploiting of as: a their file mutual where dependencies. rainfall We learnable info data is asked to givequestion can information than about. be Upper found bounds using for compression information algorithms content on forpendencies the that between quantized for data. example rainfall andpresent streamflow. The in patterns the that arefrom: rainfall already a reduce long the dryexponential information recession period that curve could the in forruno the hydrological example streamflow. model The be can information summarized learn available by for one a rainfall parameter for an Determining information content of datadata is or a similar compressing process thetherefore as data. building this a These model knowledge processes ofcontent. should the are Quantization be subject of explicitly to the considered prior data knowledge in can and determining be information seen as a formulation of the question the 6 Conclusions edge would indicate those values astion random, and (no containing internal a dependence large or amountfor predictability). of informa- With example di that the datawe would is consider the the output data asor of having compress a a the computer pattern, and data program couldof authored (by use inferring by this a one to make student, of predictions of the the possible data, programs given the the enumerate first. digits prior knowledge favoring certainthe long data, or (so prior otherwise knowledgeo unlikely) favoring a programs certain that reference describe computer or language, which 100 consecutive digits of spaces, to see how much of the variability can potentiallywe be call explained pattern by and anycan model. what give we guidelines call on scatter.usually how Although do much model not pattern complexity account can control fordegree prior be methods what knowledge. is reasonably This considered inferred prior a fromis knowledge pattern, data, used may for they a to example by searchthe constraining for algorithmic the information model patterns theory class or that sense, by this introducing can be knowledge equivalently of expressed underlying either as physics. In In current hydrological literature, attempts(due are to sometimes incomplete made knowledge toness separate of in epistemic the the process) system)to from uncertainty. trying aleatoric The to (the approach separate “inherent” pattern to random- from answer scatter this (signal question from is noise) equivalent in high dimensional data 5.2 Aleatoric and epistemic uncertainty 5 5 25 15 20 10 25 15 20 10 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | ), N 2010b , ( k f is given and k , when observations Weijs et al. ) ¯ o 2.1 RES ciency or predictive power , given forecast k ffi − ¯ o /N REL ), applied to a predictions of the + b , (model) UNC L 2010a + = (   ). As noted in Sect. ) ) /N ¯ ¯ o o 2052 2051 || || k k ¯ ¯ o o ( ( 2010b ( KL KL D D Weijs et al. REL. − − ) ) + k k f f COMPLEXITY || || ¯ ¯ o o + ( ( UNC Weijs et al. is not equal to climatological distribution KL KL = f ) D for an unconditioned, constant forecast (code for compression). D RES f   ¯ || , but an optimal fixed coding is chosen assuming the distribution is o k − ¯ k o p N N ), n ( n = ¯ ). o 1 1 k KL || q = K = K ¯ || k o REL k X k X D ¯ o p + + ( ( + + to the conditional distributions of observations ) is the number of observations for which unique forecast no. ) ) ¯ ¯ KL (model) is the length of the model algorithm. Although this model length is KL ¯ ¯ o o o o k ( ( ( D D L n UNC H H H + = ) = = = = p The resolution term, given by the Kullback-Leibler divergence from the marginal dis- When data with temporal dependencies is compressed, a lower average code length The code length is related to the remaining uncertainty, i.e. the missing information, ( , the expected average code length per observation is given by where language dependent, itconstant, is and known can from bethe AIT interpreted language as that is the this not priorlength dependence specifically will knowledge geared give is encoded a towards in just fairly awhich objective the an cannot certain estimate language. be of additive type If explained the of from amountto the data, of data store new the itself. information total data The in number code divergence the can of data, score therefore bits presented per be sample in interpreted needed as a complexity-penalized version of the DS DS where is the total numbermodel of that observations. describes When the compressing temporalthe data, dependence average however, needs total the to code prediction be length stored per as data-point will well. Therefore, become which is equivalent totime the series mutual and information the between value toSince the code, forecast also will based reduce the on the individual averagelength the code per forecasts past length observation per will observation. will not nowtion be of have a the completely divergence contribution reliable, score from the each term average in code the decomposi- data can only be compressedbe if an there algorithm is where a the pattern, resolution or i.e. gain something in that description can e be described that the individual probability estimates now have non-zero resolution. This resolution, data based on previous time steps. We can make the following observations. Firstly, per observation can be achieved,next since observations, we depending can on use the a previous. dynamically In changing terms coding for of forecast quality, this means DS tribution RES is zero since terms of forecast evaluation andusing the the same decomposition notation, presented this in with remaining uncertainty a is forecast the with divergence zero(forecast score distribution resolution associated (forecasts do not change), and non-zero reliability gence as a forecast skill scoreolution, and its as decomposition proposed into in uncertainty, reliabilityhave and distribution res- q H i.e. the amount of information that remains to be specified to reproduce the data. In Correspondence of resolution – reliabilitycompression – and uncertainty structure decomposition to In this appendix, we give a data-compression interpretation of Kullback-Leibler diver- Appendix A 5 5 20 15 10 20 15 10 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | , ff , 2011. doi:10.5194/hess-14- doi:10.1002/hyp.7963 doi:10.5194/hess-15-3123-2011 2041 2041 2031 2031 2054 2053 , 2012. 2031 2031 2036 2048 2046 , 2036 , 2009. , 2010a. , 2010b. 2031 2034 2034 Steven Weijs is a beneficiary of a postdoctoral fellowship from the AXA 2031 , 2032 2036 2036 2033 , , 2010. doi:10.1029/2011WR011251 2045 2031 2036 2032 man, D. A.: A Method for the Construction of Minimum-Redundancy Codes, P. IRE, 40, ff 2007. 1909-2010 uation and design:W05521, Maximum information minimum redundancy, Water Resour. Res., 48, Quantify Process Uncertainty, Feedback, Scale,2032 Eos T. Am. Geophys. Un., 94, p. 56, 2013. ter’s thesis, Massachusetts Institute2032 of Technology, Dept. of Electrical Engineering,for 1949. goodness-of-fit testing, Hydrol. Earth Syst. Sci., 14, 1909–1917, Springer-Verlag New York Inc, 2008. 115–116, 1956. drol., 380, 420–437, 2010. reservoir withdoi:10.1029/2008WR007335 a heteroscedastic inflow2041 model, Water Resour. Res., 45, W10430, sage, in: Video & Data Recording conference, 1979. put. Math., 2, 157–168, 1968. 2011. 2007. 1098–1101, 1952. model?, Water Resour. Res., 29, 2637–2649, 1993. might do about it), Hydrol. Earth Syst. Sci., 15, 3123–3133, rep., Systems Research Center, Palo Alto, CA, 1994. 329–340, 1975. 2006. tion ofdoi:10.1029/2009WR008101 monitoring water level gauges in polders,doi:10.1029/2009WR008953 Water Resour. Res., 46,tion W03528, in hydrological inference, Hydrol. Process.,2038 25, 1676–1680, 547–569, 1966. work in polder systems using information theory, Water Resour. Res., 46, W12553, Schaefli, B. and Gupta, H. V.: Do Nash values have value?, Hydrol. Process., 21, 2075–2080, Li, C., Singh, V., and Mishra, A.: Entropy theory-based criterion for hydrometric network eval- Ruddell, B. L., Brunsell, N. A., and Stoy, P.: Applying Information Theory in the Geosciences to Kraft, L. G.: A device for quantizing, grouping, and coding amplitude-modulated pulses,Laio, F., Mas- Allamano, P., and Claps, P.: Exploiting the information content of hydrological “outliers” Li, M. and Vitanyi, P. M. B.: An introduction to Kolmogorov complexity and its applications, Mishra, A. and Coulibaly, P.: Hydrometric network evaluation forPianosi, Canadian watersheds, J. F. Hy- and Soncini-Sessa, R.: Real-time management of a multipurpose water Martin, G. N. N.: Range encoding: an algorithm for removingMcMillan, redundancy B.: from a Two digitised inequalities mes- implied by unique decipherability, IEEE T. Inform. Theory, 2, Rissanen, J. and Langdon, G. G.: Arithmetic coding, IBM J. Res. Dev., 23, 149–162, 1979. Kolmogorov, A. N.: Three approaches to the quantitative definition of information, Int. J. Com- Jakeman, A. J. and Hornberger, G. M.: How much complexity is warranted in a rainfall-runo Cilibrasi, R.: Statistical inference through data compression, Ph.D. thesis, UvA, Amsterdam, Hu Burrows, M. and Wheeler, D. J.: A block-sorting lossless data compression algorithm, Tech. Cover, T. M. and Thomas, J. A.: Elements of information theory, Wiley-Interscience, New York, Beven, K., Smith, P. J., and Wood, A.: On the colour and spin of epistemic error (and what we Chaitin, G. J.: A theory of program size formally identical to information theory, J. ACM, 22, Chaitin, G. J.: On the length of programs for computing finite binary sequences, J. ACM, 13, Alfonso, L., Lobbrecht, A., and Price, R.: Information theory–based approach for loca- Beven, K. and Westerberg, I.: On red herrings and real herrings: disinformation and informa- Acknowledgements. research fund, which is gratefullythe acknowledged. NCCR-MICS Funding and from CCES the are Swiss also Science gratefully Foundation, acknowledged. References Alfonso, L., Lobbrecht, A., and Price, R.: Optimization of water level monitoring net- leads to the notionpredictive performance that of we models. have to penalize model complexity when evaluating the outweighs the loss due to complexity. Secondly, the data compression view naturally 5 5 30 15 10 20 25 20 25 15 10 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | , 2008. 2031 2041 2032 doi:10.5194/hess- , 2003. ed Complex Evolution ffl 2038 2052 , doi:10.1029/2008WR006836 , 2011. 2038 2051 , 2031 2035 , 2011. , 2037 , 2055 2056 doi:10.1029/2002WR001642 2034 , , 2002. 2036 2042 2052 , 2032 2044 2040 , doi:10.1002/hyp.7848 doi:10.1175/2011MWR3573.1 2032 2033 , , 2010a. 2032 2032 , doi:10.1029/2001WR001118 , R. J.: A formal theory of inductive inference. Part I, Inform. Control, 7, 1–22, 1964. ff 2031 2038 , can be used to improve compression of hydrological data, Entropy, in review, 2013. Review, 138, 3387–3399, 2010b. uated using information theory,14-2545-2010 Hydrol. Earth Syst. Sci., 14, 2545–2558, cast skill score with classic reliability-resolution-uncertainty decomposition, Monthly Weather Metropolis algorithm for optimizationrameters, Water and Resour. uncertainty Res., 39, assessment 1201, of hydrologic model pa- tainty derived with aProcess., non-stationary 25, 603–613, rating curve in the Choluteca River,Theory, Honduras, 23, 337–343, Hydrol. 1977. maximum entropy (POME) in hydrology, IAHS-AISH P., 194, 353–364, 1987. 2036 Lond. Math. Soc., 2, 230–265, 1937. 38, 1312, 1948. 626, 1997. logic model parameters: The information content of experimental data, Water Resour. Res, cation: an information-theoretical view onRev., forecasts, 139, observations 2156–2162, and truth, Month. Weather 2031 drologic prediction, Water Resour. Res., 44, W00B03, (DREAM) and informal (GLUE)Res. Bayesian Risk approaches A., in 23, hydrologic 1011–1026, 2009. modeling?, Stoch. Env. Weijs, S. V., Van de Giesen, N., and Parlange, M. B.: HydroZIP: how hydrological knowledge Weijs, S. V., Schoups, G., and van de Giesen, N.: Why hydrological predictions should beWeijs, eval- S. V., Van Nooijen, R., and Van de Giesen, N.: Kullback–Leibler divergence as a fore- Vrugt, J. A., Gupta, H. V., Bouten, W., and Sorooshian, S.: A Shu Ziv, J. and Lempel, A.: A universal algorithm for sequential data compression, IEEE T. Inform. Westerberg, I., Guerrero, J., Seibert, J., Beven, K., and Halldin, S.: Stage-discharge uncer- Singh, V. P. and Rajagopal, A. K.: Some recent advances in applicationTuring, of A. the M.: principle On of computable numbers, with an application to the Entscheidungsproblem, P. Singh, V. P.: The use of entropy in hydrology and water resources, Hydrol. Process., 11, 587– Solomono Vrugt, J. A., Bouten, W., Gupta, H., and Sorooshian, S.: Toward improved identifiability of hydro- Weijs, S. V. and Van de Giesen, N.: Accounting for observational uncertainty in forecast verifi- Shannon, C. E.: A mathematical theory of communication, Bell. Syst. Tech. J., 27, 379–423, Vrugt, J. A., Ter Braak, C. J. F., Gupta, H. V., and Robinson, B. A.: Equifinality of formal Schoups, G., van de Giesen, N. C., and Savenije, H. H. G.: Model complexity control for hy- 5 5 30 25 15 10 20 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | P 36.4 Leaf Q 27.7 0.8 40.2 50.0 1.5 87.5 2.9 25.6 38.0 66.2 function “rand”: uniform white noise function “randn”, normally distributed white noise ® ® 2058 2057 100.1 100.1 100.1 100.1 100.3 100.3 Uncompressed formats Lossless compression algorithms 0.8 100.4 93.5 daily rainfall series from thecorresponding catchment daily of series Leaf of river observed (1948–1988) streamflow in Leaf river 0.0 99.9 99.90.2 86.3 1.9 96.0 92.7 103.0 42.1 31.0 P Q The performance, as percentage of the original file size, of well known compression Signals used in experiment A. constantlinearuniform white containsGaussian only white 1 value is repeatedly thesine output 1 is from the the output Matlab fromsine contains the 100 a Matlab slowly linearly increasingLeaf trend Leaf single 100 sinusoidal sinusoidal wave waves with with a a wavelength wavelength spanning spanning all 1/100 50 of 000 50 values 000 values Signal Description NONE 100.7RLE 100.7 2.3 100.7 2.7 100.7 101.5 100.7 100.7 102.3 102.3 101.5 3.2 92.3 202.3 202.3 LS 12.6 12.8 110.6 94.7 12.9 33.3 33.7 49.9 N H log LZMABZIP2PNG 0.4 0.3 0.9 0.3 1.8 101.6 100.7 88.1 90.7 1.9 3.0 1.2 2.3 31.0 29.8 37.8 40.5 HDF BMPWAVJPG 102.2 100.1 102.2 100.1 ARJPPMD 102.2GIFTIFF 0.3 0.3 102.2 1.0 2.1 102.2 102.2 2.3 2.0 407.4 100.3 102.4 15.7 407.4 2.4 138.9 88.0 89.7 101.2 3.1 3.6 124.5 1.9 1.4 101.2 17.3 33.7 2.9 32.0 40.0 91.2 38.8 201.5 45.9 201.5 file sizeSNR 50 000 50 000 NaN 50 000 255.0HDF WAVPACK 255.6 50 000 50 000 50 000 14 610 108.0 14 610 307.4 317.8 42.6 39.9 Data set Constant Linear Uniform white Gaussian white Sine 1 Sin 100 Leaf Table 2. algorithms on various time series. The best results per signal are highlighted. Table 1. Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | 111 0.125 11 perm err 0.01 110 0.125 Q 1 < 10 0.25 0 0.5 perm 0.01 Q 4106 < . mod Q | 125)) = 0 . Q 0 , err 125 . Q 0 , 25 . 0 , mod 5 . 6 (0 2060 2059 || 2) . 0 2 2.25 2 1.75 2 2.15 , 0.50.5 0.250.5 0.5 10.5 0.75 0.5 0.75 0.25 0.5 0.25 0.375 0.5 0.375 0.8 0.7 0.1 0.4 1.05 0.4 0.1 0.6 A_I B_I A_II B_II A_III B_III 35 . PQQ 0 , 0.15 0.89 0.95 0.60 N.A. 05 . AB 0001 0 10 10 110 11 111 0 , 4 . 256) 11.7 11.6 10.4 4.95 N.A. 11.6 4.95 modeling. ((0 III 0.4 0.2 0.05 0.35 = ρ ff H=1.74 KL D ) = II || I II occurrence frequencies codes expected code lengths per value Assigning code lengths proportional to minus the log of their probabilities leads to optimal 0.250.25 0.5 0.25 0.25 0.25 0.125 0.125 Information-theoretical and variance statistics and compression results (remaining file III ( statistic entropy (% of 8 bits)best compression (%)std. dev. (range 36.4 31.0 27.7 42.1 25.8 44.9 31.5 38.9 N.A. 26.4 45.4 42.1 44.1 38.9 Autocorrelation Assigning code lengths proportional to minus the log of their probabilities leads to opti- YY GG RR CC KL total H=2 H=1.75 event YYCCCCRRCCGGYYCC 0100001000110100 10001110110100 CODE A: 16 bits, 2/color CODE B: 14 bits, 1.75/color If the requirement that theblocks codes of are values value by can value bethe (one grouped series code together are for to each long approach observation)blocks enough, is an can entropy relaxed, get ideal coding arbitrarily probability methods close distribution.for like to example the When Shannon in entropy and arithmetic bound coding, Huffman (Cover where coding and the Thomas, using entire 2006). time series This is happens coded as one single number. Fig. 1. compression. Code B ishas optimal no optimal for code distribution that II, achieves but the entropy not bound. for the otherdistribution distributions. II. Both Distribution these III codes achievecode the entropy for bound. the Code B distributionnow is III also more an (last than optimal column Huffman the inthe entropy, Kullback-Leibler figure it divergence from 1). is the impossible truecode Although distribution to would (III) the find be to optimal. expected a the distribution code shorter for length code. which the is D The overhead is equal to Fig. 1. mal compression. Code B isbution III optimal has for no distribution optimal II, code but that not achieves for the the entropy other bound. distributions. Distri- Table 3. size %) for rainfall-runo 5 10 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | 0.3 Answer: 86.42 0.2 probability 3 2 4 6 5 4 7 0.1 2.5 1.5 3.5 7.5 6.5 5.5 4.5 0 01−Jun−1973 2062 2061 rainfall H log(streamflow) H date 15−May−1973 01−May−1973 0 15−Apr−1973 50

400 350 300 250 200 150 100

/s) streamflow (m streamflow 3 Spatial distribution of entropy for quantized streamflow and rainfall shows the drier cli- The missing value in the flow time series can be guessed from the surrounding values mate in the central part of the USA. Fig. 3. Fig. 2. (a guess would for example beone the purely grey based histogram). on This the willalone. occurrence usually The frequencies lead missing over to value the therefore a whole contains better 40 yr less guess data information than than set when (dark assumed histogram) independent. Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | . erent basins. ff 6

1 PPMD LZMA lossless 0.9 0.8 0.7 0.6 0.5 1.1 1.05 1.15 JPG PPMD LZMA BZIP2 PNG 365 WAVPACK 2064 2063 Best algorithm P Best algorithm Q temporal compressibility of P (smallest filesize/entropy limit) temporal compressibility of P (smallest temporal compressibility of Q (smallest filesize/entropy limit) of Q (smallest filesize/entropy temporal compressibility

Spatial distribution of the best performing algorithms for streamflow and rainfall. This Spatial distribution the compression size normalized by entropy for streamflow and rain- Fig. 5. can give an indication whatperforming type algorithm of is structure is linked found to in the the number data. of Especially dry for days rainfall, per the year. best See also Fig. Fig. 4. fall, this gives an indication of the amount of temporal structure found in the di The streamflow is better compressible due the strong autocorrelation structure. Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

1 PPMD LZMA 0.9 0.8 0.7 0.6 percentage dry days 0.5 depends strongly on the percent- P 0.4

0.6 0.5 0.4 0.3 0.2

0.35 0.25 0.65 0.55 0.45 best compression percentage for P for percentage compression best

8 2065 7 6 against entropy and against entropy rate. Temporal depen- Q 5 4 3 compressed size (bits per sample) 2 entropy entropy rate 1:1 line 1

0

Left: best compression of 7 6 5 4 3 2 1 0 8 entropy (rate) entropy dencies cause better compressionthe than entropy the rate. entropy, Right: but theage model best of complexity achieved dry prevents compression days, achieving of the mostly climate. through the entropy. Also the best performing algorithm changes with Fig. 6.