IEEE TRANSACTIONS ON COMPUTERS, VOL. 58, NO. X, XXX 2009 1 Provably Secure

Nicholas Hopper, , and John Langford

Abstract—Steganography is the problem of hiding secret messages in “innocent-looking” public communication so that the presence of the secret messages cannot be detected. This paper introduces a cryptographic formalization of steganographic security in terms of computational indistinguishability from a channel, an indexed family of probability distributions on cover messages. We use cryptographic and complexity-theoretic proof techniques to show that the existence of one-way functions and the ability to sample from the channel are necessary conditions for secure steganography. We then construct a steganographic protocol, based on rejection sampling from the channel, that is provably secure and has nearly optimal bandwidth under these conditions. This is the first known example of a general provably secure steganographic protocol. We also give the first formalization of “robust” steganography, where an adversary attempts to remove any hidden messages without unduly disrupting the cover channel. We give a necessary condition on the amount of disruption the adversary is allowed in terms of a worst case measure of mutual information. We give a construction that is provably secure and computationally efficient and has nearly optimal bandwidth, assuming repeatable access to the channel distribution.

Index Terms—Steganography, covert channels, provable security. Ç

1INTRODUCTION classic problem of computer security is the mitigation Steganographic “protocols” have a long and intriguing Aof covert channels. First introduced by Lampson [1], a history that predates covert channels, stretching back to covert channel in a (single-host or distributed) computer antiquity. For example, Kahn [3] relates stories of World system can be roughly defined as any means by which two War II prisoners, spies, and soldiers sending secret processes or users can exchange information in violation of messages—written in invisible ink, hidden in love letters security policy. While the exact detection of usable covert (e.g., hiding the message in the first character of each channels in a system is undecidable, many conservative sentence), or using other “physical” steganographic approaches exist to detect and eliminate all potential covert techniques—to circumvent inspections by both the Allied channels; for example, the US Department of Defense and Axis governments. In response, postal censors crossed “Light Pink Book” [2] on covert channel analysis includes out anything that looked like sensitive information (e.g., detailed procedures to find and eliminate covert channels. long strings of digits) and they prosecuted individuals Unfortunately, the cost of such elimination is often whose mail seemed suspicious. In many cases, censors prohibitive; in this case, the Light Pink Book recommends even randomly deleted innocent-looking sentences or techniques to limit the bandwidth of covert channels and entire paragraphs in order to prevent secret messages requires auditing to detect any use of the covert channel. A from being delivered. natural question that arises from this suggestion is whether More recently, there has been a great deal of interest in it is feasible for an auditor to do so. digital steganography, that is, in hiding secret messages in This paper focuses on the dual problem of steganogra- communications between computers. This interest is ap- phy: How can two communicating entities send secret parently fueled by the increased amount of communication messages over a public or audited channel so that a third that is mediated by computers and by connections to party such as the reference monitor cannot detect the potential commercial applications: Hidden information presence of the secret messages? Notice how the goal of could potentially be used to detect or limit the unauthorized steganography is different from classical , which propagation of the innocent-looking “carrier” data. Because seeks to conceal the content of secret messages: Stegano- of this, there have been numerous proposals for protocols to graphy is about hiding the very existence of the secret hide data in channels containing pictures [4], [5], video [5], messages. [6], [7], audio [8], [9], and even typeset text [10]. Many of these protocols are extremely clever and rely heavily on . N. Hopper is with the Department of Computer Science and Engineering, domain-specific properties of these channels. On the other University of Minnesota, Minneapolis, MN 55455. hand, the literature on steganography also contains many E-mail: [email protected]. clever attacks that detect the use of such protocols. As a . L. von Ahn is with the Computer Science Department, Carnegie Mellon result, it is unclear from this body of work whether secure University, Pittsburgh, PA 15213. E-mail: [email protected]. . J. Langford is with the Machine Learning Group, Yahoo! Research, New steganography is possible at all. York, NY 10018. E-mail: [email protected]. In this paper, we use techniques from and Manuscript received 26 Apr. 2007; revised 18 July 2008; accepted 15 Sept. complexity theory to answer the question “under what 2008; published online 23 Oct. 2008. conditions is (secure) steganography possible?” We give Recommended for acceptance by F. Lombardi. cryptographic definitions for symmetric- stegosystems For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TC-2007-04-0146. and steganographic secrecy against a passive adversary in Digital Object Identifier no. 10.1109/TC.2008.199. terms of indistinguishability from a probabilistic channel

0018-9340/09/$25.00 ß 2009 IEEE Published by the IEEE Computer Society 2 IEEE TRANSACTIONS ON COMPUTERS, VOL. 58, NO. X, XXX 2009 process. We show that a widely believed complexity- bound is weaker than ours (in terms of bits per oracle query) theoretic assumption (the existence of a one-way function) by a factor of log2 2e but applies to a more general class of and access to a channel oracle are both necessary and stegosystems and is proved using techniques of independent sufficient conditions for the existence of secure stegano- interest. Van Le and Kurosawa [23] proposed a stegosystem graphy relative to any channel. We furthermore give a based on arithmetic coding that uses specialized knowledge construction that has essentially optimal bandwidth when of a channel to achieve higher bandwidth per symbol in compared with known provably secure constructions. some cases. Lysyanskaya and Meyerovich [24] considered Finally, we consider the question of robust steganography the effects of an imperfect channel sampling procedure on that resists attempts to censor the use of a covert channel; universal stegosystems. Backes and Cachin [25] introduced a we prove necessary and sufficient conditions for the notion of “active attacks” that use a decoding oracle to detect existence of a secure robust stegosystem and give a the presence of hidden messages in a special “challenge” provably robust stegosystem with nearly optimal band- cover message. width under attack. 1.2 Preliminaries 1.1 Previous Work We denote the length of a string or sequence s by s . We j j The scientific study of steganography in the open literature denote the empty string or sequence by " and the began in 1983 when Simmons [11] stated the problem in terms concatenation of strings s and s by s s . We assume the 1 2 1k 2 of communication in a prison. In his formulation, two use of efficient and unambiguous pairing and unpairing inmates, Alice and Bob, are trying to hatch an escape plan. operators on strings so that s1;s2 may be uniquely The only way they can communicate with each other is ð Þ interpreted as the pairing of s1 with s2 and may not be through a public channel, which is carefully monitored by the the same as s s . One example of such an operation is to 1k 2 warden of the prison, Ward. If Ward detects any encrypted encode s ;s by a prefix-free encoding of s , followed by ð 1 2Þ j 1j messages or codes, he will throw both Alice and Bob into s , followed by a prefix-free encoding of s , and then s . 1 j 2j 2 solitary confinement. The problem of steganography is then: Unpairing works in the obvious way. how can Alice and Bob cook up an escape plan by We let U denote the uniform distribution on 0; 1 k. If X k f g communicating over the public channel in such a way that is a finite set, we denote by x X the action of uniformly Ward does not suspect that anything “unusual” is going on? choosing x from X. We denote by L;l the uniform Anderson and Petitcolas [12], [13] posed many of the L F l distribution on functions f : 0; 1 0; 1 . For a prob- open problems resolved in this article. In particular, they f g !f g ability distribution D, we denote the support of D by D . pointed out that it was unclear how to prove the security of ½ Š For an integer n, we let n denote the set 1; 2; ...;n . a steganographic protocol and gave an example that is ½ Š f g similar to the protocol we present in Section 4 but noted that We make use of two different measures of the informa- it was not clear what properties were necessary to prove its tion in a probability distribution. Let be a probability D security. They also posed the open question of bounding the distribution with finite support D; then, the Shannon bandwidth that can be securely achieved over a given cover Entropy of is HS d D Pr d log2 Pr d , and the channel. D ðDÞ ¼ À 2 D½ Š D½ Š Minimum Entropy of is H mind D log2 Pr d . D P1ðDÞ ¼ 2 fÀ D½ Šg Since then, several works [14], [15], [16], [17] have We model all parties by Probabilistic Turing Machines addressed information-theoretic definitions of steganogra- (PTMs). A PTM is a standard Turing machine with an phy. Cachin [14], [18] defines security by requiring that the additional read-only “randomness” tape that is initially set relative entropy between stegotexts, which encode hidden so that every cell is a uniformly independently chosen bit. information, and independent identically distributed sam- If A is a PTM, we denote by x A y the event that x is ples from some innocent-looking covertext probability ð Þ drawn from the probability distribution defined by distribution, is small. He also gives a construction that is A’s output on input y for a uniformly chosen random tape. provably secure but relies critically on the assumption that We write A y to denote the output of A with random tape all orderings of covertexts are equally likely. Cachin points rð Þ fixed to r on input y. out several flaws in other published information-theoretic We sometimes use Oracle PTMs (OPTMs). An OPTM is a formulations of steganography. In addition to requiring PTM with two additional tapes, a “query” tape and a arbitrarily long shared secrets, all of these works assume a Q more restrictive model of innocent communication and do “response” tape, and two corresponding states query and not address bandwidth, robustness, or the necessary Qresponse. An OPTM runs with respect to some oracle O and conditions for steganography. when it enters state Qquery with value y on its query tape, it goes in one step to state Q , with x O y written to Subsequent to the announcements of our results [19], [20], response ð Þ its “response” tape. If O is a probabilistic oracle, then AO y several extensions have been investigated. Reyzin and ð Þ Russell [21] were the first to show that construction 1 (see is a probability distribution on outputs taken over both the Fig. 1) could be securely extended to multiple-bit hidden- random tape of A and the probability distribution on texts for high-entropy channels, but their construction is still O’s responses. Let F : 0; 1 k 0; 1 L 0; 1 l denote a family of less efficient than construction 2 (see Fig. 2) in terms of bits f g Âf g !f g per symbol. Independent of our bandwidth upper and lower functions. Informally, F is a pseudorandom function bounds, Dedic et al. [22] proved the security of a stegosystem (PRF) family if F and L;l are indistinguishable by oracle F similar to our construction 2 and gave an upper bound on the queries. Formally, let A be an oracle probabilistic adversary. bandwidth per query of a universal stegosystem. Their Define the PRF advantage of A over F as HOPPER ET AL.: PROVABLY SECURE STEGANOGRAPHY 3

prf FK k f k history of already drawn documents—for example, draw- AdvA;F k Pr A ðÁÞ 1 1 Pr A 1 1 : ð Þ¼ K Uk ð Þ¼ À f L;l ð Þ¼ F ing only packets that have a sequence number that is one hiÂà prf greater than the sequence number of the previous packet. Define the insecurity of F as InSecF t; q; k prf ð Þ¼ Therefore, we can think of communication as a series of maxA t;q AdvA;F k , where t; q denotes the set of 2Að Þf ð Þg Að Þ these partial draws from the channel distribution, condi- adversaries taking at most t steps and making at most q oracle prf tioned on what has been drawn so far. Notice that this queries. Then, F is a t; q;  -PRF if InSec t; q; k . k ð Þ F ð Þ notion of a channel is more general than the typical Suppose that l k and L k are polynomials. A sequence ð Þ ð Þ k L k l k setting in which every symbol is drawn independently F of families F : 0; 1 0; 1 ð Þ 0; 1 ð Þ is f kgk IN k f g Âf g !f g according to some fixed distribution: Our channel ex- called2 pseudorandom if, for all polynomially bounded plicitly models the dependence between symbols common adversaries A, Advprf k is negligible in k. We will A;F ð Þ in typical real-world communications. sometimes write Fk K; as FK . We note that the results ð ÁÞ ðÁÞ Let be a channel. For a finite sequence h DÃ, we let of [26] and [27], when taken together, imply that the C 2 Pr h s D Pr h; s . We let h denote the marginal C½ Š¼ 2 1 C½ð ފ C existence of PRFs and one-way functions are equivalent. channel distribution on a single document from D condi- P tioned on the history h of already drawn documents; we let l denote the marginal distribution on sequences of 2CHANNELS Ch l documents conditioned on h. Concretely, for any d D, 2 We seek to define steganography in terms of indistinguish- we will say that ability from a “usual” or innocent-looking pattern of communication. In order to do so, we must characterize Pr s s h;d D1 this pattern. We begin by supposing that Alice and Bob Pr d 2fð Þ g C½ Š h ½ Š¼ Pr s C P s h D1 communicate via documents. 2f g C½ Š and that, for any d~ dl, P Definition 1 (documents). Let D be an efficiently recognizable 2 prefix-free set of strings or documents. s h;d D Pr s Pr d~ 2fð Þ g 1 C½ Š : l h ½ Š¼ s h D Pr s As an example, if Alice and Bob are communicating over C P 2f g 1 C½ Š a computer network, they might run the TCP/IP protocol, in When we write “samplePx , ” we mean that a single Ch which case they communicate by sending “packets” document should be returned according to the distribution according to a format that specifies fields like a source conditioned on h. and destination address, packet length, and sequence Informativeness. We will require that a channel satisfy a number. minimum entropy constraint for all histories. Specifically, Once we have specified what kinds of strings Alice and we require that there exist constants L>0, > 0, and Bob send to each other, we also need to specify the l > 0 such that, for all h l L D , either Pr h 0 or probability that Ward will assign to each document. The  2  C½ Š¼ H h . If a channel does not satisfy this property, then 1ðC Þ S simplest notion might be to model the innocent commu- it is possible for Alice to drive the information content of nications between Alice and Bob by a stationary distribution: her communications to 0, so this is a reasonable require- each time Alice communicates with Bob, she makes an ment. We say that a channel satisfying this condition is independent draw from a probability distribution and L;; -informative and, if a channel is L;; -informative C ð Þ ð Þ sends it to Bob. Notice that, in this model, all orderings of for all L>0, we say that it is ; -always informative, or ð Þ the messages output by Alice are equally likely. This does simply always informative. Note that this definition implies not match well with our intuition about real-world an additive-like property of minimum entropy for communications; if we continue the TCP/IP analogy, we l marginal distributions; specifically, H h l. For ease 1ðC Þ notice, for example, that, in an ordered list of packets sent of exposition, we will assume that channels are always from Alice to Bob, each packet should have a sequence informative in the remainder of this paper; however, number that is one greater than the previous; Ward would our theorems easily extend to situations in which a channel become very suspicious if Alice sent only packets with odd is L-informative. The only complication in this situation is sequence numbers. that there will be a bound in terms of L;; on the ð Þ Thus, we will use a notion of a channel that models a number of bits of secret message that can be hidden before prior distribution on the entire sequence of communication the channel runs out of information. from one party to another: Intuitively, L-informativeness requires that Alice always Definition 2. A channel is a probability distribution on sends at least L packets over her TCP/IP connection to Bob, sequences s D1. and at least one out of every  packets she sends has some 2 probable alternative. Thus, we are requiring that Alice Any particular sequence in the support of a channel always says at least L= “interesting things” to Bob. describes one possible outcome of all communications Channel access. To demonstrate that it is feasible to from Alice to Bob—the list of all packets that Alice’s construct secure protocols for steganography, we will computer sends to Bob’s. The process of drawing from the assume only that Alice has oracle access to the marginal channel, which results in a sequence of documents, is channel distributions on her communications to Bob. Ch equivalent to a process that repeatedly draws a single This is reasonable because, if Alice can communicate “next” document from a distribution consistent with the innocently with Bob at all, she must be able to draw from 4 IEEE TRANSACTIONS ON COMPUTERS, VOL. 58, NO. X, XXX 2009 this distribution; thus, we are only requiring that, when Definition 4 (correctness). A stegosystem is correct if, for S using steganography, Alice can “pretend” she is commu- every message m and every history h DÃ, the “error 2M 2 nicating innocently. In fact, we will show in Section 5 that function” this requirement is necessary: If Alice can communicate Err ; ;m;h k Pr SDK SEK m; h ;h m steganographically with Bob, then she can draw samples S C ð Þ¼ ½ŠðÞð Þ 6¼ from her marginal channel distributions. is negligible, where the probability is taken over the key K and On the other hand, we will assume that the adversary, any coin tosses of SE and SD. Ward, knows as much as possible about the distribution on innocent communications. Thus, he will be allowed 3.2 Security oracle access to marginal channel distributions for Ch Intuitively, what we would like to require is that no efficient every history h and, in addition, the adversary may be warden can distinguish between stegotexts output by SE allowed access to an oracle that on input h DÃ;l ð 2 Þ and covertexts drawn from the channel distribution h. As returns an l-bit representation of Pr h . Note that this C C½ Š we stated in Section 2, we will assume that W knows the allows Ward to compute the marginal probabilities of distribution ; we will also allow W to know the individual documents under arbitrary histories. Ch algorithms involved in , as well as the history h of Alice’s These assumptions allow the adversary to learn as much S communications to Bob. In addition, we will allow W to as possible about the channel distribution but do not require any legitimate participant to know the distribution pick the hiddentexts that Alice will hide if she is in fact on communications from any other participant. We will, producing stegotexts. Thus, W’s only uncertainty is about however, assume that both Alice and Bob know the history the key K and the single bit denoting whether Alice’s of communications from Alice to Bob. outputs are stegotexts or covertexts. We will also assume that cryptographic primitives As with encryption schemes, we will model an attack remain secure with respect to oracles that draw from the against a stegosystem as a game played by a passive warden, marginal channel distributions . Thus, channels that can W, who is allowed to know the details of and . Ch S C be used to solve the hard problems that standard primitives Definition 5 (chosen hiddentext attack). In a chosen are based on must be ruled out. In practice, this is of little hiddentext attack with security parameter k, W is given concern, since the existence of such channels would have access to a “mystery oracle” M, which is chosen from one of previously led to the conclusion that the primitive in the following distributions: question was insecure. Notice that the set of documents need not be literally 1. ST. The oracle ST has a uniformly chosen key interpreted as a set of bitstrings to be sent over a network. In K U and responds to queries m; h with a k ð Þ general, documents could encode any kind of information, StegoText drawn from SE K; m; h . ð Þ including things like actions (such as accessing a hard drive 2. CT. The oracle CT has a uniformly chosen K as or changing the color of a pixel) and times (such as pausing well and responds to queries m; h with a CoverText ‘ ð Þ an extra 1 second between words of a speech). Our model is c of length ‘ SE K; m; h . 2 Ch ¼j ð Þj thus general enough to deal with these situations without If is stateful, M’s queries may include a state input to SE, S any special treatment. with the restriction that W must be state respecting: It cannot cause the state of SE to repeat itself. We require state- 3DEFINITIONS respecting adversaries against stateful schemes to model the ability of the sender to maintain a state that does not repeat, We will first define a stegosystem in terms of syntax and which can be used to ensure that different invocations of the correctness and then proceed to a security definition. protocol are “independent” of each other. Definition 3 (stegosystem). A stegosystem for message space M k S After interacting with its oracle, W 1 outputs a bit, , with security parameter k is a pair of probabilistic ð Þ M which represents its guess about the type of M. algorithms: We define W’s advantage against a stegosystem for S . .Encode (abbreviated SE) takes as input a key channel by S k C K 0; 1 , a string m (the hiddentext), and a 2f g 2M ss ST k CT k message history h. SE K; m; h returns a sequence of AdvW; ; k Pr W 1 1 Pr W 1 1 ; ð Þ S Cð Þ¼ ð Þ¼ À ð Þ¼ documents s1;s2; ...; (the stegotext). ð Þ where the probabilityÂà is taken over the randomnessÂà of ST, . .Decode (abbreviated SD) takes as input a key K,a S CT, and W. sequence of documents s ;s ; ...;s , and a message ð 1 2 lÞ Define the insecurity of with respect to channel by history h. SD K; s; h returns a hiddentext m . S C ð Þ 2M Both algorithms may be stateful, in which case they have an ss ss InSec ; t; q; l; k max AdvW; ; k ; implicit fourth parameter, the current state. S Cð Þ¼W t;q;l S Cð Þ 2Wð Þno where t; q; l denotes the set of all adversaries that make Wð Þ at most q k queries totaling at most l k bits (of hiddentext) 3.1 Correctness ð Þ ð Þ and running in time at most t k . Of course, in order for a stegosystem to be useful, it must be ð Þ correct: when using the same key, history, and state (if any), Definition 6 (steganographic secrecy). A stegosystem k is S decoding should usually recover the encoded message. called t; q; l;  steganographically secret against a ð Þ HOPPER ET AL.: PROVABLY SECURE STEGANOGRAPHY 5

Fig. 1. Construction 1: OneBit and MultiBit stegosystems.

chosen hiddentext attack for the channel ( t; q; l;  -SS- The idea behind OneBit is simple. The encoding ss C ð Þ CHA- ) if InSec ; t; q; l; k . algorithm makes ‘ copies of the bit m. For the ith copy, C S Cð Þ the encoder attempts to find a document d such that F N i; d m by drawing d . If the encoder fails, it 4CONSTRUCTIONS Kð þ Þ¼ Ch draws a second d0 h and sends d0 instead of d. The For our feasibility results, we have taken the approach of C decoder recovers ‘ bits by applying FK to each document assuming a channel that can be drawn from freely by the and outputs 1 if the majority of the bits are 1 and 0 stegosystem; this is in contrast to many ad hoc proposals in otherwise. Intuitively, this works because we expect each si the literature, where the stegosystem is given only a single to map to the bit m with probability about 3 and, so, the sample from the channel as input. We prove in Section 5 4 probability of a decoding failure should be negligible in ‘. that our notion of steganographic secrecy requires the ability Lemma 1. Let s ; ...;s SEf m; h , i.e., SE where F is to draw multiple samples from the channel. Thus, removing 1 ‘ ð Þ K this requirement would require a fundamentally different replaced by f ;1. Then, for any d D, Fà 2 notion of security than the one we give here. Pr si d Pr d : It is also worth noting that the stegosystem we ½ ¼ Š¼ h ½ Š C i construct that has relatively little knowledge of the channel distribution—SE—needs only to sample from an Proof. The event si d happens exactly when di d and oracle according to the distribution. This is because in ¼ ¼ f N i; di m (call this event Fi) or when di0 d and Fi many cases the full distribution of the channel has never ð þ Þ¼ ¼ happens. Because di and di0 are drawn independently been characterized; for example, the oracle may be a from and independent of f, we get human being or a video camera focused on some complex Chi scene. In these cases, the cost of sampling from h may be Pr si d Pr Fi di d Fi di0 d nontrivial. We therefore stress that, due to theC possible ½ ¼ Š¼ ð ^ ¼ Þ_ ^ ¼ expense of oracle queries, it is important to minimize their PrÂÃFi di d PrÀÁFi d0 d ¼ ½ð ^ ¼ Šþ ^ i ¼ use. We also point out that our definitions do not rule out Pr f N i; di m ÂÃPr di d efficient constructions for channels where more is known ¼ ½Šð þ Þ¼ ½ ¼ Š about the distribution. Pr f N i; di 1 m Pr d0 d þ ½Šð þ Þ¼ À i ¼ In practice, the oracle is also the weakest point of all of our 1 1 Âà constructions. We assume the existence of a perfect oracle: one Pr d Pr d Pr d : ¼ 2 h ½ Šþ2 h ½ Š¼ h ½ Š that can perform independent draws, one that can be C i C i C i rewound, etc. This assumption can be justified in some cases ut but not in others. If the oracle is a human, the human may not be able to perform independent draws from the channel as is Theorem 1. Let TSE denote the time required to execute required by our constructions. A real-world warden would ss OneBit.Encode . T h e n , InSecOneBit; t; q; q; k use this to his advantage. We therefore stress the following Cð Þ InSecprf t qT ; 2q‘; k . cautionary remark: Our security proofs only hold under the F ð þ SE Þ assumption that the channel oracle is perfect. Proof. Let W t; q; q be an adversary for OneBit. We 2Wð Þ construct a PRF adversary A for F with the same 4.1 A Simple Construction advantage as W. Af works by running W and intercept- We start by describing a low-bandwidth stegosystem that ing any queries W makes to its oracle M. To respond to transmits only single-bit messages. Here, we assume that these queries, Af emulates the encoding procedure Alice and Bob share a channel and let denote the channel C OneBit.Encode using the function oracle f in place distribution. We write d to denote the action of Ch of FK to get a result s and starts W with s as the response sampling d from the marginal distribution (via Ch to its oracle query. When W halts with output b, A also oracle access). We let FK ; denote a PRF family indexed outputs b. Clearly, when f F , AFK perfectly emulates ðÁ ÁÞ K by k K key bits that maps documents to bits, i.e., ST W ¼j kj the oracle for , so Fk : 0; 1 0; 1 à 0; 1 . We let Alice and Bob share f g Âf g !fk g a secret key K 0; 1 and also a synchronized d bit Pr AFK 1k 1 Pr W ST 1k 1 : 2f g K ð Þ¼ ¼ ð Þ¼ counter N (which need not be secret). The OneBit ÂÃÂÃf stegosystem shown in Fig. 1 allows Alice and Bob to encode By the previous lemma, when f ;1, A perfectly Fà and decode single bits sent via their shared channel. simulates a CT oracle for W, so 6 IEEE TRANSACTIONS ON COMPUTERS, VOL. 58, NO. X, XXX 2009

Fig. 2. Construction 2: OneBlock and NoState stegosystems.

f k CT k Pr A 1 1 Pr W 1 1 : f k ‘ Fk k that Pr A 1 1 eÀ , whereas Pr A 1 1 f ð Þ¼ ¼ ð Þ¼ ½ ð Þ¼ Š ½ ð Þ¼ Š¼ ErrOneBit; ;m;h k . So, by the definition of advantage, ÂÃÂÃprf C ð Þ Thus, by the definition of advantage, AdvA;F k ss ð Þ¼ prf ‘ AdvW;OneBit; k . The theorem follows by the definition AdvA;F k ErrOneBit; ;m;h k eÀ Cð Þ C of insecurity. ð Þ ð ÞÀ tu and it follows that this quantity is at most prf f InSecF TimeA k ; QueriesA k ;k . But, A runs in Lemma 2. Let s1; ...;s‘ SE m; h , where f ;1. Then, ð ð Þ ð Þ Þ ð Þ Fà time ‘TSE and makes 2‘ function-oracle queries, which for any i, proves the theorem. tu 1 1 For completeness, we also show in Fig. 1 the obvious Pr f N i; si m Pr d0 d1 : ½ð þ Þ¼ Š ¼ 2 þ 4 d0;d1 h ½ 6¼ Š C i extension of the stegosystem OneBit to multiple-bit hiddentexts. We assume the same setup as that used Proof. Consider the two documents di and di0 that SE draws previously. The MultiBit stegosystem works by simply in iteration i. It will be the case that f N i; si m OneBit ð þ Þ¼ repeatedly invoking on the individual bits of the exactly when either f N i; di m, which happens message m. It is straightforward to state and prove similar 1 ð þ Þ¼ with probability 2 , or when f N i; di 1 m and theorems about the security and correctness of MultiBit. ð þ Þ¼ À 1 f N i; d0 m, which happens with probability ð þ iÞ¼ 4 when d d0 and with probability 0 otherwise. The 4.2 Increasing the Rate i 6¼ i theorem applies for any i because the function f N i; Multibit L;; ð þ ÁÞ The stegosystem is secure for any - is independent of f N j; for i j. ð Þ ð þ ÁÞ 6¼ tu informative channel, but it is inefficient in the sense that in ‘ Lemma 3. Suppose is ; -always informative and f ;1. order to yield a negligible error rate eÀ , it must set C ð Þ Fà Then, we have ‘ ! log k , so the number of documents used to encode an ¼ ð Þ m-bit hiddentext is ! m log k , whereas the number of 1 1 = ð Þ Pr f N i; si m 1 2À : documents needed to ensure m bits of information in a i ½Šð þ Þ¼  2 þ 4 ð À Þ covertext is m= O m . Fig. 2 gives a stegosystem, ¼ ð Þ Proof. Because is ; -informative, for any h and any OneBlock, that can hide ! bits of hiddentext in C ð Þ  l 1  !  documents of stegotext, for any > 0. We sequence d1; ...;d h, there must be a j between 0  C ¼ð þ Þ ! will prove that this stegosystem has error rate !2À and and  1 such that H h;...;dj =. If this were not À 1ðCð ÞÞ  ! the case, then we would have h such that H h <. takes an expected 2 samples from the channel; in the next ‘ 1ðC Þ Thus, for a string of length ‘ drawn from , there must section, we prove that no stegosystem of this type can Ch be ‘= positions i that have H hi =. In these simultaneously hide as many or more bits of hiddentext in 1ðC Þ = positions, the collision probability is at most 2À . In the l documents while taking fewer than 2! samples from the other positions, the collision probability is at most 1. channel. We will assume for this construction that both Applying the previous lemma yields the result. tu Alice and Bob know the history h and a value l such that l k ! Theorem 2. If is ; -informative, then !< 1  H Ch . We let F : 0; 1 0; 1 à 0; 1 C ð Þ ð À Þ 1ð Þ f g Âf g !f g prf be a PRF and also assume that Alice and Bob share a ‘ k ErrOneBit; ;m;h k eÀ InSecF TSE‘; 2‘; k ; key K 0; 1 . C ð Þ þ ð Þ 2f g 1 = 2 The idea behind this construction is that, to encode where  2 4 1 2À and TSE is the time required to ¼ ð ð À ÞÞ message m 0; 1 !, we can search for a document s such execute the inner loop of OneBit.Encode. 2f g that FK N; s m; this will involve drawing about Proof. Lemma 3 implies that if f is a random function, then ! ð Þ¼0 1 2! ‘ 2 samples sð Þ;sð Þ; ...;sð Þ. The main problem we en- ErrOneBitf ; ;m;h k eÀ . We describe a PRF adversary A C ð Þ counter in extending the proof of Lemma 1 to this case is the for F that has an advantage of at least i j assertion that FK N; sð Þ and FK N; sð Þ are independent: ð i Þ j ð Þ ‘ This is only true if sð Þ sð Þ. To ensure that the sequence of ErrOneBit; ;m;h k eÀ : 6¼ C ð ÞÀ inputs to F N; is distinct, we maintain a counter c d , for K ð ÁÞ ½ Š A uses its function oracle f to emulate the action of SE each d D, that records how many times d has been 2 encoding a uniformly chosen bit m under history h, sampled and we include c s j as an argument to F N; . ½ ð ފ Kð Á Á ÁÞ counting the number of documents with f N i; si m. When decoding, we assume that the value of the counter for ð þ Þ¼ If fewer than 1 of the s satisfy f N i; s m, A the stegotext was 1; we prove that this is correct with high 2 i ð þ iÞ¼ outputs 1; otherwise, A outputs 0. Lemma 3 shows probability. HOPPER ET AL.: PROVABLY SECURE STEGANOGRAPHY 7

Theorem 3. Let be the OneBlock stegosystem. Then, output b, A outputs b as well. Clearly, when f F , S K l we have that ! ! H h prf ! Err ; ;m;h k eÀ !2 À 1ðÞC InSecF t; !2 ;k ; S C ð Þ þ þ ð Þ Pr AFK 1k 1 Pr W ST 1k 1 : where t O !2! . ð Þ¼ ¼ ð Þ¼ ¼ ð Þ When f isÂà a randomly chosenÂà function and since W is Proof. We will show that when FK is replaced by a random function f: state respecting, A never evaluates f on any point twice. Thus, Af is equivalent to a process that draws a l f ! ! H h new independent function at each stage. In this model, Pr f N; h; 1;SE m; h; N m eÀ !2 À 1ðÞC : ð Þ 6¼  þ for any d Dl, we have that Pr SE m; h d WeÂà canÀÁ use the same PRF adversary A from the proof of 2 ½ ð Þ¼ Š¼ Prf;s l s d f s m and, since s and f are drawn Theorem 2 to get an advantage of at least Advprf k Ch ½ ¼ j ð Þ¼ Š l A;F independently, we have that Pr SE m; h d Pr l d . ! ! H h ð Þ h Err ; ;m;h k eÀ !2 À 1ðC Þ, which will give the ½ ð Þ¼ Š¼ C ½ Š S C ð ÞÀ þ Thus, A’s responses to W’s queries are distributed desired bound. according to , so Let C denote the event that OneBlock:Encodef m C ð Þ outputs an s with c s > 1. This happens when there is Pr Af 1k 1 Pr W CT 1k 1 : i ½ iŠ ð Þ¼ ¼ ð Þ¼ at least one j

l 4.3.1 MAXt S encodes !>H h bits of hiddentext in l documents of ð Þ 1ðC Þ For any stegosystem S h; l; !; t , we will show that there stegotext nonnegligibly often. 2Sð Þ exists a channel such that S is insecure relative to if Suppose we have a specific history h such that SE C C ! log t is any positive constant. Thus, it follows that ‘ 1 l H l ‘ À encodes bits by samples from h and h . (If MAX S log t. þ C 1ðC Þ¼ tð Þ such histories occur nonnegligibly often, then we can find ss c t;k k Theorem 5. InSecS; O t k ; 1;!; k 1 2À ð Þ 2À one by sampling from an oracle for SE; if they do not, then Cð ð þ Þ Þ À À À & k; ! , where & k; ! Prm U!;K;h SD K;SE K; m; h ; l ð Þ ð Þ¼ ½ ð ð Þ the rate of the stegosystem does not exceed H h ) Since 1 h m , and ! log t c t; k . l ðC Þ Þ 6¼ Š  þ ð Þ H h ‘, we know that there is at least one l-document 1ðC Þ¼ Proof. The channel is defined as follows: to every history h ‘ l C string, sÃ, that has probability 2À under and no other of length k, we associate a polynomial of degree t 1 over Ch þ GF 2k with uniformly chosen coefficients, p x . To draw string has more probability mass than sÃ. Now, if SE were ð Þ hð Þ from , we draw x U , compute y p x , and output deterministic, then we would have that Ch k ¼ hð Þ x y. Notice that if S:SE K; m; h draws at most t samples ‘ 1 k ð Þ Pr SE m sà : m U‘ 1 2Àð þ Þ x1;y1 ; ...; xt;yt h and outputs a pair x0;y0 such ½ð Þ¼ þ Š  ð Þ ð Þ C ð k Þ that x0 x ; ...;x , then Pr y0 p x0 2À . On the by the unique decoding property. Even if SE is rando- 62f 1 tg ½ ¼ hð ފ  other hand, an adversary can draw t 1 samples from þ Ch mized, then, for any fixed random bits r, we have and compute ph by interpolation. Thus, when SE K; m; h ð Þ ‘ 1 outputs a pair x0;y0 x1;y1 ; ...; xt;yt , an adversary Pr SE m; r sà : m U‘ 1 2Àð þ Þ: ð Þ 62 fð Þ ð Þg ½ð Þ¼ þ Š  can distinguish between SE m; h and by checking ð Þ Ch But then, by an averaging argument, there must be some whether y0 ph x0 . ‘ 1 ‘ 1 ¼ ð Þ mà 0; 1 þ such that Pr SE mà sà < 2Àð þ Þ. In con- Formally, we define the adversary AO as follows: 2f g ½ ð Þ¼ Š k ‘ On input 1 , sample a history h of length k by drawing trast, a covertext oracle CT will have Pr CT m sà 2À , ½ ð Þ¼ Š¼ h1 " and hi h1;...;hi 1 for 1

4.3.2 MAX S right andqffiffiffiffiffiffip^mà is much larger than expected (say, Cð Þ  1 We exhibit a chosen-history, chosen-hiddentext distin- p^m þ p^, and call this event Q). Then, by Chernoff à  2 guisher for any black-box stegosystem SE;SD that bounds,qffiffiffiffiffiffi we have ð Þ HOPPER ET AL.: PROVABLY SECURE STEGANOGRAPHY 9

ST k Pr W 1 0 Pr P Q Pr P Q Theorem 7. If is ; -informative and SE;SD is a stateful ð Þ¼ ¼ ½ ^ Šþ ½ ^ Š C ð Þ ð Þ Pr P Pr Q P stegosystem for , then, for any q;" 2d=2 Âà ½ Šþ ½ j Š C   1 ‘ ss ss Pr p<^ þ 2À InSecNoState; t; q; "; k InSec SD;SE ; t qTG;q;"; k  Cð Þ ð Þ Cð þ Þ  "#rffiffiffiffiffiffiffiffiffiffiffi2 prf q q 1 d=2 L= InSecG t ‘"; q; k ð À Þ 2À 2À :  1 ‘ þ ð þ Þþ 2 ð þ Þ Pr p^ þ 2À þ mà  2  2 n  1 2 Proof. We reformulate the CT oracle in the chosen- 1 p þ n =3 eÀ2 À 2 eÀ :  þ hiddentext attack game so that it also evaluates an ÀÁffiffiffiffiffi element of G on the first L documents of its reply S;T We also know that, when W has a covertext oracle, it ð Þ ‘ to every query. Let NC denote the event that the values should be the case that for every m 0; 1 Ã, E p^m 2À . G S ; ...;G S are all distinct during the chosen- 2f g ½ Š¼ ð 1Þ ð qÞ Thus, W should only output 1 when p^ is much larger than hiddentext attack game and let C denote the complement expected (say, p>^ 1 1 1 2 2 ‘, and denote this ss 2  1 À of NC. Then, for any W, we can bound AdvW;NoState; k ð þ ð À þ ÞÞ ST k CT k Cð Þ event R) or some p^m is much smallerqffiffiffiffiffiffi than its expectation by Pr W 1 1 NC Pr W 1 1 NC Pr C .  1 ð ½ ð Þ¼ j ŠÀ ½ ð Þ¼ j ŠÞ þ ½ Š (say, p^m < 2þ p^, and denote this event Sm). Then, we To bound the first term, note that we can make an adversary X against SD; SE that achieves this advan- have that qffiffiffiffiffiffi ð Þ tage by choosing a and running W. When W queries its CT k L Pr W 1 1 oracle at m; h , X draws S1 h and computes ð Þ¼ d=2 ð Þ C N 2 G S1 . X will halt with output 0 in case of ÂÃPr R m:Sm Pr R m:Sm ¼ ð Þ ¼ ½ ^9 Šþ ½ ^9 Š a repeated counter; otherwise, XO will return Pr R Pr m:Sm R S ; m; h; S ;N to W.  ½ Šþ ½9 j Š ð 1 Oð ð 1Þ ÞÞ Pr R 2n Pr S R To bound the probability of event C, we imagine that  ½ Šþ ½ mj Š the game is played with a random function f  2 ;d=2 n 1 2  3  1 1 ‘ Fà 12 p 1 in place of G and let C denote the event C in this eÀ À þ 2 n Pr p^m < þ 2À f  þ "# 2 2 À 2! ÀÁffiffiffiffiffi rffiffiffiffiffiffiffiffiffiffiffi scenario. Let S1; ...;Sq denote the L-document prefixes 2 2 n 2 9n  1 of the sequences returned by the oracle in the chosen- 1 p  1 p þ eÀ12 À  1 2 neÀ 8 À 2 ; þ hiddentext attack game and let N f S . Then, the  þ i ¼ ð iÞ ÀÁffiffiffiffiffi ÀÁffiffiffiffiffi event C happens when there exist i j such that where the last three lines follow by the union bound and f 6¼ N N or, equivalently, f S f S , and this event multiplicative Chernoff bounds. Combining these i ¼ j ð iÞ¼ ð jÞ happens when Si Sj or Si Sj f Si f Sj . Thus, bounds with the constant c, we have q q 1 ¼L= d=26¼ ^ ð Þ¼ ð Þ Pr C ð À Þ 2 2 . f ½ f Š 2 ð À þ À Þ ss ST k CT k Finally, observe that, for every W t; q; " , we can AdvW;S; k Pr W 1 1 Pr W 1 1 2Wð Þ Cð Þ¼ ð Þ¼ À ð Þ¼ construct a PRF adversary A for G in t ‘"; q such cn cn  cn prf Að þ Þ 1 ÂÃ2eÀ eÀ 2 neÂÃÀ : that Adv k Pr C Pr C . A runs W, using its  À Àð þ Þ A;Gð Þj ½ ŠÀ ½ f Šj oracle f in place of G to respond to W’s queries. A The theorem follows by the definition of insecurity. tu outputs 1 if the event Cf occurs and 0 otherwise. Then, Pr AG 1k 1 Pr C , a n d Pr Af 1k 1 Pr C , 4.4 Removing Shared State ½ ð Þ¼ Š¼ ½ Š ½ ð Þ¼ Š¼ ½ f Š which satisfies the claim. The previous constructions require Alice and Bob to keep tu a synchronized d-bit counter N; this is undesirable due to the fact that there may be no good way to resynchronize 5NECESSARY CONDITIONS once a counter is corrupted. The NoState stegosystem The previous section demonstrates that relative to an oracle shown in Fig. 2 shows how to convert any “stateful” for , the existence of one-way functions is sufficient for the C stegosystem that requires only unique counter values (not existence of secure steganography. In this section, we will necessarily consecutive ones) into a stateless stegosystem, explore weaker definitions of steganographic secrecy and 1  establish two results. First, one-way functions are necessary at an expense of L documents. When L O k À , this ¼ ð Þ for steganography; thus, relative to a channel oracle, the preserves the rate (near) optimality of the OneBlock existence of one-way functions and secure steganography stegosystem. The encoding and decoding functions of are equivalent. Second, we will show that in the “standard NoState are parameterized by the encoding and decod- model, ” without access to a channel oracle, the existence of ing functions of the underlying “stateful” stegosystem, a secure stegosystem implies the existence of a program and additionally share a secret key to a PRF family that samples from h and, thus, in the standard model, k d C G : 0; 1 DL 0; 1 . The NoState stegosystem secure steganography for exists if and only if (iff) h is k f g  !f g C C works by choosing a long sequence from L and using it efficiently sampleable. Ch to derive a value N, which is then used as the state for the 5.1 Steganography Implies One-Way Functions “stateful” stegosystem. This value is always a multiple of To strengthen our result, we develop the weaker notion of d=2 2 so that, if the value derived from the long sequence security against known-hiddentext attacks (KHAs). In an never repeats, then any messages of length at most 2d=2 l; " -KHA against distribution , the adversary is given a ð Þ D will never use a value of N used by another message. history h of length l, a hiddentext drawn from ", and a D 10 IEEE TRANSACTIONS ON COMPUTERS, VOL. 58, NO. X, XXX 2009

SE K;m;h SE K;m;" 2 sequence of documents s Dj ð Þj. The adversary’s HS m jjð Þ K HS ; 2 jC" ¼j j ðDÞ task is to decide whether s h or s SE K; m; h . We  C ð Þ due to independence, whereas H f U define the KHA advantage of W by Sð ð k1 ÞÞ ¼ H SE K; m; " H m SE K; m; " and Sð ð ÞÞ þ Sð j ð ÞÞ kha- AdvW; ;D k; l; " Pr W h; m; SE K; m; h 1 HS m SE K; m; " 1 # K S Cð Þ¼ ½Šðð Þ Þ ¼ ðj ð Þ Þ  ð þ Þj j SE K;m;h for a negligible function #. To see that this is the case, Pr W h; m; jjð Þ 1 À Ch ¼ notice that m SD K;SE K; m; " and so is deter- ¼ ð ð ÞÞ hi mined (up to a negligible probability) by K and and say that is secure against KHA with respect to and S D C HS K K . Thus, asymptotically, we have that (SS-KHA- - ) if, for every PPT W, for all polynomially ð Þ¼j j HS f Uk1 >HS g Uk2 and f is an FEG relative to an D C kha ð ð ÞÞ 1 ð ð ÞÞ bounded l and ", AdvW; À;D k; l k ;" k is negligible in k. oracle for . S C ð ð Þ ð ÞÞ Thus, a stegosystem is secure against KHA if given the C tu Corollary 1. Relative to an oracle for , secure stegosystems exist history h and a plaintext m, an adversary cannot distinguish C (asymptotically) between a stegotext encoding m and a iff one-way functions exist. covertext of the appropriate length drawn from . We will Ch 5.2 Sampleable Channels Are Necessary show that one-way functions are necessary even for this We say that a channel is efficiently sampleable if there exists C much weaker notion of security. In order to do so, we will an algorithm C such that for any polynomial time A, for any use the following results from [27]: polynomial l, Definition 7 [27, Definition 3.9]. A polynomial-time k ‘ k computable function f : 0; 1 0; 1 ð Þ is called a false k k k Pr A 1 ; C h; 1 ;Uk Pr A 1 ; h f g !f g l k l k entropy generator (FEG) if there exists a constant c and h "ð Þ ð Þ À h "ð Þ ð C Þ k ‘ k C C polynomial-time-computable g : 0; 1 0 0; 1 ð Þ such that ÂÃÀÁÂà f g !f g is negligible in k. Notice that, for any efficiently sampleable 1) H g U >H f U 1=ckc and 2) f U g U . Sð ð k0 ÞÞ Sð ð kÞÞ þ ð kÞ ð k0 Þ channel , the results of the previous sections prove that C secure steganography with respect to exists iff one-way Thus, a function is an FEG if its output is indistinguishable C functions exist in the standard model—e.g., without assuming from a distribution with higher (Shannon) entropy. We oracle access to the channel . Here, we will show that if make use of the following. C secure steganography exists for in the standard model, C Theorem 8 [27, Lemma 4.16]. If there exists an FEG, then there then is efficiently sampleable. C exists a pseudorandom generator. Theorem 10. If there exists an efficiently sampleable such that D there is a SS-KHA- - secure stegosystem in the standard D C S We note that the proof of Theorem 8 is “black box” with model, then is efficiently sampleable. respect to the FEG, and thus, it holds relative to the C Proof. Consider the program C with the following behavior: presence of the channel oracle . S C on input 1k;h , C picks K 0; 1 k, picks m , and ð Þ S f g D Theorem 9. If there is a stegosystem that is SS-KHA- - returns the first document of :Encode K; m; h . Consider S D C S ð Þ secure for some hiddentext distribution and some channel , A KHA D C any PPT distinguisher . We will show that the then there exists a pseudorandom generator, relative to an adversary W that passes the first document of its input to oracle for . A and outputs A’s decision has at least the advantage of C Proof. We will show how to construct a FEG from A. This is because, in case W’s input is drawn from SE, .Encode, which, when combined with Proposition 8, the input it passes to A is exactly distributed according to k S C 1 ;h and, when W’s input is drawn from h, the will imply the result. We consider an experiment which Sð Þ C k2 input it passes to A is exactly distributed according to h: draws a key K Uk and hiddentext m and C D define the random variable SE K; m; " . Because kha- L¼j ð Þj AdvW; ;D k; h ; 1 is SS-KHA- - secure, we must have S C ð j j Þ S D C Pr WSE K; m; h 1 Pr W 1 ¼ jj½ðð Þ Þ ¼ Š À ½ŠðChÞ¼ k k k SE K; m; " ;m "L;m Pr A 1 ; C 1 ;h 1 Pr A 1 ; h 1 : ðð Þ ÞC ¼ Sð Þ ¼ À ð C Þ¼ and, for any k, we must have oneÀÁ of the following cases: But, because ÂÃÀÁis SS-KHA- - secure,Âà we know that S D C W’s advantage must be negligible and, thus, no efficient A 1. H L >H SE K; m; " 1=k. In this case, the SðC" Þ Sð ð ÞÞ þ program that samples from is an FEG. can distinguish this from the first document drawn from C" SE K; ;h hj ð D Þj. So, the output of C is computationally 2. HS "L

sequence returned by W, respectively. Then, Af returns 1 f iff SD sÃ;hà mÃ. It is easy to see that when f F , ð Þ 6¼ k Af perfectly simulates ROneBit to W. Now, suppose that f is chosen uniformly from all appropriate functions. Then, for each i, the stegotexts mà k mà 'ð Þ C 1 ;hð Þ;f N i; mà are distributed indepen- i ¼ ð i ð þ ÞÞ dently according to hi . Consider the sequence of C 1 m 1 m “alternative stegotexts” 'ð À ÃÞ C 1k;hð À ÃÞ;f N i; i ¼ ð i ð þ 1 mà ; each of these is also distributed independently À ÞÞ Fig. 3. Construction 3: ROneBit. according to 1 m and, since W is never given access to hð À ÃÞ C i 1 m ð À ÃÞ k k the documents 'i , its output documents sià are F : 0; 1 0; 1 à 0; 1 and have a synchronized 1 mà independent of each 'ð À Þ. Now, SD will fail (causing counterf gNÂ. Wef g will!f let ngk ! log k be a “robustness i ð Þ¼ ð Þ f k 1 mà parameter.” We begin with a stegosystem—ROneBit, A 1 to output 1) only if the event i: 'ði À Þ;sià R0 ð Þ 1 mà 8 ð Þ2 shown in construction 3 (see Fig. 3)—that robustly occurs. Because the 'ið À Þ are independent of the actions encodes a single bit. of W and because ;R is -admissible, each event ðD 0Þ The idea behind this construction is the following: 1 mà 'ði À Þ;sià R0 happens independently with probability Suppose that, instead of sharing a key to a PRF F, Alice ð Þ2 n 0 1 at most . So, the probability of failure is at most  and and Bob shared two secret documents 'ð Þ and 'ð Þ drawn independently from h. Then, Alice could send Bob the prf FK k f k C m AdvA;F k Pr A 1 1 Pr A 1 1 message bit m by sending document 'ð Þ and Bob could ð Þ¼ ð Þ¼ À ð Þ¼ recover m by checking to see if the document he received SuccÂÃR k PrÂÃAf 1k 1 0 1 W;ROneBit was related (by R0) to 'ð Þ or 'ð Þ. Since the adversary is ¼ ð ÞÀ ð Þ¼ R bounded and ;R is -admissible, the probability of a SuccR k n:Âà 0  W;ROneBitð ÞÀ decoding error—causedðD Þ either by the adversary or by 0 1 ut accidental draw of 'ð Þ and 'ð Þ—would be at most . Intuitively, ROneBit reduces the probability of a decoding error to n by encoding each hiddentext bit n times. Theorem 11. If F is pseudorandom, then ROneBit is substitution robust against R for . Lemma 4. ROneBit is steganographically secret against a state- C respecting chosen-hiddentext attack: It is straightforward to extend ROneBit to a robust ss prf stegosystem that encodes multiple bits by encoding each of InSecROneBit; t; q; q; k InSecF t O nq ; nq; k : Cð Þ ðÞþ ð Þ the bits individually. Another straightforward extension Proof. Let W be a passive warden that runs in time t and replaces single-bit arguments to the PRF with !-bit makes at most q queries of total length at most q (each arguments. In this case, the removal probability for an query can be only one bit, because of the type). We adversary is increased to 2!‘ and the decoding time construct a PRF adversary A that runs in time t O nq increases by a factor of O 2! . ð Þ and makes at most nq queries to F suchþ thatð Þ prf ss 6.4 Nearly Optimal Robust Steganography AdvA;F k AdvW;ROneBit; k . The PRF adversary takes a functionð Þ¼ oracle f and emulatesCð Þ W 1k , responding to It is not hard to see that the maximum rate of hiddentext per the queries W makes to the encoderð SEÞ by using f in stegotext document attainable with the previous construc- place of FK ; ; A gives the same output as W. tion approaches log 1= and that in order to have negligible Now, whenðÁ ÁÞf F , clearly, Af is perfectly simulat- insecurity, the construction must have superpolynomial K ing SE to W, so Pr AFK 1k 1 Pr W ST 1k 1 . On decoding time. We now show that this rate is essentially ½ ð Þ¼ Š¼ ½ ð Þ¼ Š the other hand, when f is a uniformly chosen function, optimal: if a stegosystem is robust against any -admissible then Af is handing W stegotexts generated by calls to relation R (given access to R), then it can encode at most C h; 1k; with independent random values and, thus, log 1= ð ÁÞ bits per document. We also demonstrate an efficient they are distributed identically to samples from h, robust stegosystem that encodes 1  o 1 log 1= bits f k CT k prf C ð À À ð ÞÞ so Pr A 1 1 Pr W 1 1 and AdvA;F k > 0 ss½ ð Þ¼ Š¼ ½ ð Þ¼ Š ð Þ¼ per document, for any constant , showing that this AdvW;ROneBit; k . The claim follows by the definition of upper bound is tight. insecurity. Cð Þ tu R prf 6.4.1 Upper Bound L e m m a 5 . FailROneBit t; q; q; 1;k InSecF t O nq ; n ð Þ ð þ ð Þ nq; k  . Recall the definition of R; D maxy x;y R Pr x . We Þþ Ið Þ¼ ð Þ2 D½ Š Proof. Let W be an active R-bounded t; q; q; 1 warden. We will show that any universal stegosystem for -admissible ð Þ R R P construct a PRF adversary A that runs in time t O nq , relations (given access to ) that attempts to transmit þ ð Þ more than ‘ log  bits in ‘ documents is either not makes at most nq PRF queries, and satisfies À prf R universally secret or not universally robust. Adv k Succ k n. Af works by emulat- A;F ð Þ W;ROneBitð ÞÀ ing W, using its function oracle f in place of F ; to Theorem 12. Let be a universal stegosystem that transmits KðÁ ÁÞ S emulate ROneBit.Encode in responding to the queries more than 1  log 1= bits per document. For every ð þ Þ ð Þ of W. Let mà and sà be the hiddentext and the stegotext 0 << 1, there exist a channel and relation R such that C HOPPER ET AL.: PROVABLY SECURE STEGANOGRAPHY 13

Fig. 4. Construciton 4: RMBit.

R cp‘ Fail ; t; 0; 0; 1  ‘; k 1 2À , where c is a constant As before, we will assume that is efficiently sample- S Cð ð þ Þ Þ À k C k depending on . able, that F : 0; 1 0; 1 à 0; 1 is pseudorandom ffiffi f g Âf g !f g k and both parties share a secret K 0; 1 , and a Proof. We let be the uniform distribution on n-bit strings, 2f g C N and R x; y 1 iff the hamming distance of x and y is at synchronized counter . As before, we will let ð Þ¼ l 1  ‘ log 1= , but we now set ‘ so that l log k. We most d, where d and n are constants chosen to make ¼ð À Þ ð Þ ¼ set an additional parameter L k= log 1= . R; . We will give an attacker W that achieves the ¼ ð Þ Ið CÞ  The idea behind this procedure is to break an nl-bit stated success probability. For notational convenience, message into l-bit blocks, encode each one using the we define l ‘ log . ¼À RLBit.Encode procedure, and then append the encoding W picks the challenge hiddentext mà U and gets in l of L documents of message-dependent redundancy. To response the stegotext ' :SE K; m . W then uni- 2S ð ÃÞ decode, we iteratively attempt to match each stegotext block formly picks a sequence s subject to ' s d for à j i À iÃj against each of the 2l k possible hiddentext blocks; there 1 i ‘. W’s output is the sequence sÃ. ¼   will always be one matching block, and with some small We now compute the success probability of W. Recall  R probability kÀ , there will be an additional match. We that SuccW; k Pr SD K; sà mà , where this prob- Sð Þ¼ ½ ð Þ 6¼ Š perform a depth-first search on these matches to find a list ability is taken over K, mÃ, sÃ, and '. Notice that the L adversary W is identical to a noisy discrete memoryless of candidate messages and then test each message to see whether the redundancy matches. Any candidate match channel, with p sà ' defined as the uniform distribution n ð j Þ on s 0; 1 : s ' d . This channel has Shannon from the depth-first search will also have matching f 2f g j À j g k capacity exactly log R; log . Furthermore, any redundancy with probability 2À , and a union bound will À Ið CÞ ¼ À robust stegosystem is a candidate code for the channel. thus bound the probability of a decoding failure by 1 k The strong converse to Shannon’s coding theorem [28] 1  2À . Furthermore, the total expected number of nodes ð þ Þ 1 tells us that any code with rate 1  log 1= will have an explored by Decode is at most 1  n, so the reduction is ð þ Þ p ð þ Þ average error probability of at least 1 2 c ‘, where efficient. À À c 2 4n 2 log 1= . ¼ À þ ð Þ ffiffi Theorem 13. RMBit (Fig. 4) is steganographically secret against Since the event that the adversary W is successful is a state-respecting chosen-hiddentext attack: identical to the event that a decoding error occurs in the SE K; ;SD K; ss prf code induced by , we have that InSecRMBit; t; q; l"; k InSecF t O "‘ ;"‘; k : ð ÁÞ ð ÁÞ Cð Þ ðÞþ ð Þ R cp‘ Succ k 1 2À ; W;Sð Þ À Proof. Let W be a passive warden that runs in time t and q l" which satisfies the theorem. ffiffi makes at most queries of total length at most (each tu query must be a multiple of l bits because of the input type). As in the previous proofs, we construct a PRF 6.4.2 Construction adversary A that runs in time t O "‘ and makes at þ ð Þ In this section, we will give a secure universally most "‘ queries to F such that -substitution-robust stegosystem that achieves rate 1  prf ss ð À À AdvA;F k AdvW;RMBit; k : o 1 log 1= for any > 0. We note that the construction ð Þ¼ Cð Þ ð ÞÞ ð Þ mentioned at the end of Section 6.3, which encodes ! bits at The PRF adversary takes a function oracle f and emulates W 1k , responding to the queries W makes to a time by using a PRF with !-bit arguments, can in principle ð Þ its oracle by running RMBit.Encode, using f in place achieve rate 1  log 1= by setting ! 1  log 1= ‘. O ð À Þ ð Þ ¼ð À Þ ð Þ of F ; . We see that, when f F , Af perfectly Notice, however, that because the runtime of the decoding K ðÁ ÁÞ K emulates RMBit.Encode and, when f is a uniformly procedure in this case is exponential in ‘, the proof of chosen function, the stegotexts passed to the adversary robustness is not very strong: The information-theoretic are samples from C h; with independent uniform ð ÁÞ bound on the success of W is essentially polynomial in the inputs; by the assumption on C, these are identically runtime of the PRF adversary we construct from W. We will distributed to samples from ‘ . Thus, A’s advantage over Ch now give a construction with a polynomial-time decoding F is exactly W’s advantage over RMBit. algorithm, at the expense of a o 1 factor in the rate. The claim follows by the definition of insecurity. ð Þ tu 14 IEEE TRANSACTIONS ON COMPUTERS, VOL. 58, NO. X, XXX 2009

Theorem 14. RMBit is robust: number of matching prefix messages. Then, n E X n 1 1= , and a Chernoff bound gives us FailR t; q; l"; ln; k  ½ Š ð þ Þ RMBitð Þ prf k n Pr X>2n 1 1= Pr X>2E X InSecF t0;q0;k 1 1= 2À e=4 ; ½ð þ Þ Š  ½Š½ Š  ð Þ þ ð þ Þ þð Þ E X n e=4 ½ Š e=4 ; where t0 t O l " n O 1 1= kn and q0 ð Þ ð Þ  þ ðð þ Þ Þþ ðð þ Þ Þ  2n 1 1= l " n : which completes the proof. ð þ Þþ ð þ Þ tu Proof. Let W be an active R-bounded t; q; l"; ln warden. ð Þ We construct a PRF adversary A that runs in time t0, makes at most 2n 1 1= l " n PRF queries, and ACKNOWLEDGMENTS prf ð þ ÞþRð þ Þ k satisfies Adv k Succ k 1 1= 2À A;F ð Þ W;RMBitð ÞÀð þ Þ À The authors wish to thank Manuel Blum, Christian Cachin, e=4 n. Af works by emulating W, using its function ð Þ Mike Reiter, Leo Reyzin, , Scott Russell, and oracle f in place of FK ; to emulate RMBit.Encode David Wagner for the helpful conversations about this ðÁ ÁÞ in responding to the queries of W. Let mà and sà be work. They especially thank Lea Kissner, Tal Malkin, and the hiddentext and the stegotext sequence returned Omer Reingold for pointing out an error in the proceedings by W, respectively. Then, Af returns 1 iff version [19]. This work was supported by the US National SDf s ;h m . To ensure that the number of queries Science Foundation under Grants CCR-0122581, CCR- ð à ÃÞ 6¼ à and runtime are at most t and 2n 1 1= l " n , 0085982, and CNS-0546162. 0 ð þ Þþ ð þ Þ we halt whenever SDf makes more than 2n 1 1= ð þ Þ queries to f, an event we will denote by TB. We will REFERENCES n show that Pr TB e=4 when f is a randomly [1] B.W. Lampson, “A Note on the Confinement Problem,” Comm. ½ Šð Þ chosen function. Thus, we can neglect this event in ACM, vol. 16, no. 10, pp. 613-615, 1973. [2] NCSC-TG-030: A Guide to Understanding Covert Channel Analysis of our case analyses. Trusted Systems. US Nat’l Computer Security Center, 1993. Now, when the oracle f is a uniformly chosen [3] D. Kahn, The Codebreakers: The Story of Secret Writing, revised ed. function, a decoding error happens when there exists Scribner, 1996. ln [4] G. Jagpal, “Steganography in Digital Images,” PhD dissertation, another m 0; 1 such that for all i; j , 1 i ‘, Cambridge Univ., Selwyn College, May 1995. 2f g f ð Þ   1 j n, we have s j 1 n i; LEnc m1...j i R, and [5] Information Hiding Techniques for Steganography and Digital Water-   f ð ð À Þ þ ð Þ Þ2 marking. S. Katzenbeisser and F.A. Petitcolas, eds. Artech House, also, s‘n i; LEnc m i R for all i, 1 i L. Let j be ð þ ð Þ Þ2   2000. the least j such that mj mjÃ. Then, for blocks 6¼ f [6] K. Matsui and K. Tanaka, “Video-Steganography: How to Secretly mj 1; ...;mn, the ‘-document blocks LEnc m1...j i are Embed a Signature in a Picture,” Proc. IMA IP Workshop, 1994. þ ð þ Þ independent of 'jà i. Thus, for such m, the probability of [7] A. Westfeld and G. Wolf, “Steganography in a Video þ ‘ n j L k n j ‘ Conferencing System,” Proc. Second Int’l Workshop Information a match is at most  ð À Þþ 2À ð À Þ . Since there are l n j ¼ Hiding (IH ’98), D. Aucsmith, ed., pp. 32-47, 1998. 2 ð À Þ messages matching mà in the first j blocks, we [8] D. Gruhl, A. Lu, and W. Bender, “Echo Hiding,” Proc. First have that Int’l Workshop Information Hiding (IH ’96), R.J. Anderson, ed., pp. 293-315, 1996. Pr Af 1k 1 [9] C. Neubauer, J. Herre, and K. Brandenburg, “Continuous ð Þ¼ Steganographic Data Transmission Using Uncompressed Audio,” f ÂÃPr SD sà mà Proc. Second Int’l Workshop Information Hiding (IH ’98), vol. 1525, ¼ ð Þ 6¼ pp. 208-217, 1998. Âà Pr m m : s m ;s R [10] J. Brassil, S.H. Low, N.F. Maxemchuk, and L. O’Gorman, à i 1...i=l ià “Electronic Marking and Identification Techniques to Discourage  "#9 6¼ 1 i ‘n L ð Þ 2  ^ þ ÀÁ Document Copying,” IEEE J. Selected Areas in Comm., vol. 13, no. 8, n pp. 1495-1504, 1995. l n j k n j ‘ k 1 ‘j 2 ð À Þ2À ð À Þ 2À  [11] G.J. Simmons, “The Prisoners’ Problem and the Subliminal  j 0  j 0 Channel,” Proc. Int’l Cryptology Conf. (CRYPTO ’83), pp. 51-67, X¼ X¼ 1983. k 1 k 2À 2À 1 1= : [12] R.J. Anderson, “Stretching the Limits of Steganography,” Proc. ¼ 1 ‘  ð þ Þ First Int’l Workshop Information Hiding (IH ’96), pp. 39-48, 1996. À [13] R.J. Anderson and F.A. Petitcolas, “On the Limits of Stegano- On the other hand, when f F , Af outputs 1 exactly K graphy,” IEEE J. Selected Areas in Comm., vol. 16, no. 4, pp. 474-481, when W succeeds against RMBit, by the definition of May 1998. RMBit. Thus, we see that [14] C. Cachin, “An Information-Theoretic Model for Steganography,” Proc. Second Int’l Workshop Information Hiding (IH ’98), pp. 306-318, prf F k f k 1998. Adv k Pr A K 1 1 Pr A 1 1 [15] A;F ð Þ¼ ð Þ¼ À ð Þ¼ T. Mittelholzer, “An Information-Theoretic Approach to Stegano- R f k graphy and Watermarking,” Proc. Third Int’l Workshop Information SuccÂÃk Pr AÂÃ1 1 ¼ W;RMBitð ÞÀ ð Þ¼ Hiding (IH ’99), A. Pfitzmann, ed., pp. 1-16, 1999. R k [16] J.A. O’Sullivan, P. Moulin, and J.M. Ettinger, “Information Succ l 1 ÂÃ1= 2À Pr TB :  W;RMBitð Þ À ð þ Þ À ½ Š Theoretic Analysis of Steganography,” Proc. IEEE Int’l Symp. Information Theory (ISIT ’98), p. 297, Aug. 1998. It remains to show that Pr TB e=4 n. Notice that the ½ Šð Þ [17] J. Zo¨llner, H. Federrath, H. Klimant, A. Pfitzmann, R. Piotraschke, expected number of queries to f by A is just the number of A. Westfeld, G. Wicke, and G. Wolf, “Modeling the Security of Steganographic Systems,” Proc. Second Int’l Workshop Information messages that match a j‘-document prefix of sÃ, for j‘ Hiding (IH ’98), pp. 344-354, 1998. 1 j n, times k. Let Xm 1 if m 0; 1 matches a j- [18] C. Cachin, “An Information-Theoretic Model for Steganography,”   ¼ n 2f g block prefix of sÃ. Let X j 1 m 0;1 j‘ Xm denote the Information and Computation, vol. 192, no. 1, pp. 41-56, 2004. ¼ ¼ 2f g P P HOPPER ET AL.: PROVABLY SECURE STEGANOGRAPHY 15

[19] N.J. Hopper, J. Langford, and L. von Ahn, “Provably Secure Nicholas Hopper received the BA degree with Steganography,” Proc. 22nd Ann. Int’l Cryptology Conf. majors in mathematics and computer science (CRYPTO ’02), M. Yung, ed., pp. 77-92, 2002. from the University of Minnesota, Morris, in 1999 [20] N. Hopper, “Toward a Theory of Steganography,” PhD disserta- and the PhD degree in computer science from tion, Technical Report CMU-CS-04-157, Carnegie Mellon Univ., Carnegie Mellon University in 2004. He is Pittsburgh, Aug. 2004. currently a McKnight Land-Grant Professor in [21] L. Reyzin and S. Russell, “Simple Stateless Steganography,” the Department of Computer Science and En- Cryptology ePrint Archive, Report 2003/093, http://eprint.iacr. gineering, University of Minnesota, Minneapolis. org/, 2003. He received the US National Science Foundation [22] N. Dedic, G. Itkis, L. Reyzin, and S. Russell, “Upper and Lower CAREER award in 2006 and the Institute of Bounds on Black-Box Steganography,” Proc. Second Theory of Technology Student Board “Professor of the Year” award in 2007. His Cryptography Conf. (TCC ’05), J. Kilian, ed., pp. 227-244, 2005. research interests include cryptography, anonymity, and distributed [23] T.V. Le and K. Kurosawa, “Efficient Public Key Steganography systems security. Secure against Adaptively Chosen Stegotext Attacks,” Cryptology ePrint Archive, Report 2003/244, http://eprint.iacr.org/, 2003. Luis von Ahn in an assistant professor in the [24] A. Lysyanskaya and M. Meyerovich, “Provably Secure Stegano- Computer Science Department, Carnegie graphy with Imperfect Sampling,” Proc. Ninth Int’l Conf. Theory Mellon University. He is the recipient of a and Practice of Public-Key Cryptography (PKC ’06), M. Yung, MacArthur “Genius” Fellowship, and he was Y. Dodis, A. Kiayias, and T. Malkin, eds., pp. 123-139, 2006. named one of Popular Science Magazine’s most [25] M. Backes and C. Cachin, “Public-Key Steganography with Active “Brilliant 10” scientists of 2006, Technology Attacks,” Proc. Second Theory of Cryptography Conf. (TCC ’05), Review’s 35 Top Innovators under 35 in 2007, J. Kilian, ed., pp. 210-226, 2005. and Smithsonian Magazine’s Top Innovators in [26] O. Goldreich, S. Goldwasser, and S. Micali, “How to Construct the Arts and Scientists. He has also been named Random Functions,” J. ACM, vol. 33, no. 4, pp. 792-807, 1986. one of the 50 most influential people in [27] J. Ha˚stad, R. Impagliazzo, L.A. Levin, and M. Luby, “A technology by silicon.com and is the recipient of a Microsoft New Pseudorandom Generator from Any One-Way Function,” SIAM Faculty Fellowship. His research interests include encouraging people to J. Computing, vol. 28, no. 4, pp. 1364-1396, 1999. do work for free, as well as catching and thwarting cheaters in online [28] J. Wolfowitz, Coding Theorems of Information Theory. Springer and environments. Prentice Hall, 1978. [29] J. Kilian, ed., Proc. Second Theory of Cryptography Conf. (TCC), 2005. John Langford received a physics/computer science double major from the California Insti- tute of Technology (Caltech) in 1997 and the PhD degree in computer science from Carnegie Mellon University in 2002. He is a senior researcher with the Machine Learning Group, Yahoo! Research, New York. His work includes research in machine learning, game theory, steganography, and Captchas. He was pre- viously at the Toyota Technological Institute, Chicago, and worked in the past at IBM’s T.J. Watson Research Center, Yorktown, New York, under the Goldstine Fellowship.

. For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib.