Guessing Revisited: a Large Deviations Approach Manjesh Kumar Hanawal and Rajesh Sundaresan, Senior Member, IEEE

70 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 1, JANUARY 2011 Guessing Revisited: A Large Deviations Approach Manjesh Kumar Hanawal and Rajesh Sundaresan, Senior Member, IEEE Abstract—The problem of guessing a random string is revisited. to be one such optimal guessing order1. implies A close relation between guessing and compression is first estab- that is the th guess. Arikan [3] considered the growth of lished. Then it is shown that if the sequence of distributions of the as a function of for an independent and identi- information spectrum satisfies the large deviation property with a certain rate function, then the limiting guessing exponent exists cally distributed (i.i.d.) source with marginal pmf and . and is a scalar multiple of the Legendre-Fenchel dual of the rate He showed that the growth is exponential in ; the limiting ex- function. Other sufficient conditions related to certain continuity ponent properties of the information spectrum are briefly discussed. This approach highlights the importance of the information spectrum in determining the limiting guessing exponent. All known prior re- (1) sults are then re-derived as example applications of our unifying approach. exists and equals with , where Index Terms—Guessing, information spectrum, large deviations, is the Rényi entropy of order for the pmf , given by length function, source coding. (2) I. INTRODUCTION Malone and Sullivan [4] showed that the limiting exponent ET denote letters of a process of an irreducible Markov chain exists and equals the log- L where each letter is drawn from a finite set with joint arithm of the Perron-Frobenius eigenvalue of a matrix formed probability mass function (pmf) . Let by raising each element of the transition probability matrix to be a realization and suppose that we wish to guess this real- the power . From their proof, one obtains the more general ization by asking questions of the form “Is ?”, step- result that the limiting exponent exists for any source if the ping through the elements of until the answer is “Yes.” We Rényi entropy rate of order wish to do this using the minimum expected number of guesses. There are several applications that motivate this problem. Con- (3) sider cipher systems employed in digital television or DVDs to block unauthorized access to special features. The ciphers used exists for . Pfister and Sullivan [5] showed the are amenable to such exhaustive guessing attacks and it is of in- existence of (1) for a class of stationary probability measures, terest to quantify the effort needed by an attacker (Merhav and beyond Markov measures, that are supported on proper sub- Arikan [1]). shifts of [5]. A particular example is that of shifts generated Massey [2] observed that the expected number of guesses is by finite-state machines. For such a class, they showed that the minimized by guessing in the decreasing order of -probabil- guessing exponent has a variational characterization [see (25) ities. Define the guessing function later]. For unifilar sources Sundaresan [6] obtained a simplifica- tion of this variational characterization using a direct approach and the method of types. Merhav and Arikan remark that their proof in [7] for the limiting guessing exponent is equally applicable to finding the lim- Manuscript received December 15, 2008; revised June 24, 2010; accepted July 18, 2010. Date of current version December 27, 2010. This work was sup- iting exponent of the moment generating function of compres- ported by the Defence Research and Development Organisation, Ministry of sion lengths. Moreover, the two exponents are the same. The Defence, Government of India, under the DRDO-IISc Programme on Advanced latter is a problem studied by Campbell [8]. Research in Mathematical Engineering, and by the University Grants Commis- sion by Grant Part (2B) UGC-CAS-(Ph.IV). The material in this paper was Our contribution is to give a large deviations perspective to presented at the IEEE International Symposium on Information Theory (ISIT these results, shed further light on the aforementioned connec- 2007), Nice, France, June 2007, and at the IISc Centenary Conference on Man- tion between compression and guessing, and unify all prior re- aging Complexity in a Distributed World, (MCDES 2008), Bangalore, India, May 2008, and at the National Conference on Communications (NCC 2009), sults on existence of limiting guessing exponents. Specifically, Guwahati, India, January 2009. we show that if the sequence of distributions of the informa- M. K. Hanawal is with the INRIA and the University of Avignon, Labora- tion spectrum (see Han [9]) satisfies the toire Informatique d’Avignon, Avignon cedex 9 84911, France (e-mail: man- [email protected]). large deviation property, then the limiting exponent exists. This R. Sundaresan is with the ECE Department, Indian Institute of Science, Ban- is useful because several existing large deviations results can be galore 560012, India (e-mail: [email protected]). Communicated by H. Yamamoto, Associate Editor for Shannon Theory. 1If there are several sequences with the same probability of occurrence, they Digital Object Identifier 10.1109/TIT.2010.2090221 may be guessed in any order without affecting the expected number of guesses. 0018-9448/$26.00 © 2010 IEEE HANAWAL AND SUNDARESAN: GUESSING REVISITED 71 readily applied. We then show that all but one previously con- is one that satisfies Kraft’s inequality sidered cases in the literature2 satisfy this sufficient condition. See Examples 1–5 in Section IV. (4) The large deviation theoretic ideas are already present in the works of Pfister and Sullivan [5] and the method of types approach of Arikan and Merhav [7]. Our work however brings out where we have used the notation .To the essential ingredient (the sufficient conditions on the infor- each guessing function , we associate a PMF on and a mation spectrum), and enables us to see the previously obtained length function as follows. specific results under one light. Definition 1: Given a guessing function , we say de- The quest for a general sufficient condition under which the fined by information spectrum satisfies a large deviation property is a natural line of inquiry, and one of independent interest, in view (5) of the Shannon-McMillan-Breiman theorem which asserts that the information spectrum of a stationary and ergodic source con- is the PMF on associated with . The quantity in (5) is the verges to the Shannon entropy almost surely and in , for all normalization constant. We say defined by ; see for example [11]. In particular, the large deviation property implies exponentially fast convergence to entropy. In (6) the several specific examples we consider, the information spectrum does satisfy the large deviation property. One sufficient is the length function associated with . condition for the weaker property of exponentially fast con- Observe that vergence to entropy is the so-called blowing up property. (See Marton and Shields [12, Th. 2], or the survey article by Shields [13]). One family of sources, that includes most of the sources (7) we consider in this paper and goes beyond, is that of finitary en- codings of memoryless processes, also called finitary processes. and therefore the PMF in (5) is well-defined. We record the in- These are known to have the blowing-up property, and there- timate relationship between these associated quantities in the fore exponentially fast convergence to entropy (see Marton and following result. (This is also available in the proof of [14, Th. Shields [12, Th. 3]). It is an interesting open question to see if 1, p. 382].) finitary processes, or what other sources with the blowing up property, satisfy the large deviation property. Proposition 1: Given a guessing function , the associated The rest of the paper is organized as follows. Section II studies quantities satisfy the tight relationship between guessing and compression. Sec- tion III states the relevant large deviations results and the main (8) sufficiency results. Section IV rederives prior results by showing (9) that in each case the information spectrum satisfies the LDP. Section V contains proofs and Section VI contains some con- Proof: The first equality in (8) follows from the definition cluding remarks. in (5), and the second inequality from the fact that . The upper bound in (9) follows from the upper bound in (8) and from (6). The lower bound in (9) follows from II. GUESSING AND COMPRESSION In this section we relate the problem of guessing to one of source compression. An interesting conclusion is that robust source compression strategies lead to robust guessing strategies. For ease of exposition, let us assume that the message space is simply . The extension to strings of length is straightforward and will be returned to shortly. A guessing function We now associate a guessing function to each length function . is a bijection that denotes the order in which the elements of Definition 2: Given a length function , we define the as- are guessed. If , then the th guess is . Let denote sociated guessing function to be the one that guesses in the the set of natural numbers. A length function increasing order of -lengths. Messages with the same -length are ordered using an arbitrary fixed rule, say the lexicographical order on . We also define the associated PMF on to be 2These are cases without side information and key-rate constraints. The one exception is an example of Arikan and Merhav [7, Sec. VI-B] for which one can show the existence of Rényi entropy rate rather directly via a subadditivity (10) argument. See our technical report [10]. 72 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 1, JANUARY 2011 Proposition 2: For a length function , the associated PMF Analogously, let be any guessing function, and its asso- and the guessing function satisfy the following: ciated length function.

Guessing Revisited: a Large Deviations Approach Manjesh Kumar Hanawal and Rajesh Sundaresan, Senior Member, IEEE

Large Deviations and Exit Time Asymptotics for Diffusions and Stochastic Resonance

Large-Deviation Principles, Stochastic Effective Actions, Path Entropies

Sample Path Large Deviations for Lévy Processes and Random

Large Deviations for Markov Sequences

Large Deviations of Poisson Cluster Processes 1

Large Deviations Theory and Emergent Landscapes in Biological Dynamics

On the Form of the Large Deviation Rate Function for the Empirical Measures

Arxiv:1604.06828V2 [Math.PR]

Large Deviations Properties of Maximum Entropy Markov Chains from Spike Trains

Large Deviations for IID Sequences: the Return of Relative Entropy

Large Deviations in Mathematical Finance

Some Applications and Methods of Large Deviations in Finance and Insurance Huyen Pham