Provably Secure Steganography
IEEE TRANSACTIONS ON COMPUTERS, VOL. 58, NO. X, XXX 2009

Nicholas Hopper, Luis von Ahn, and John Langford

Abstract: Steganography is the problem of hiding secret messages in “innocent-looking” public communication so that the presence of the secret messages cannot be detected. This paper introduces a cryptographic formalization of steganographic security in terms of computational indistinguishability from a channel, an indexed family of probability distributions on cover messages. We use cryptographic and complexity-theoretic proof techniques to show that the existence of one-way functions and the ability to sample from the channel are necessary conditions for secure steganography. We then construct a steganographic protocol, based on rejection sampling from the channel, that is provably secure and has nearly optimal bandwidth under these conditions. This is the first known example of a general provably secure steganographic protocol. We also give the first formalization of “robust” steganography, where an adversary attempts to remove any hidden messages without unduly disrupting the cover channel. We give a necessary condition on the amount of disruption the adversary is allowed in terms of a worst case measure of mutual information. We give a construction that is provably secure and computationally efficient and has nearly optimal bandwidth, assuming repeatable access to the channel distribution.

Index Terms: Steganography, covert channels, provable security.

N. Hopper is with the Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455. E-mail: [email protected].
L. von Ahn is with the Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213. E-mail: [email protected].
J. Langford is with the Machine Learning Group, Yahoo! Research, New York, NY 10018. E-mail: [email protected].

Manuscript received 26 Apr. 2007; revised 18 July 2008; accepted 15 Sept. 2008; published online 23 Oct. 2008. Recommended for acceptance by F. Lombardi. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TC-2007-04-0146. Digital Object Identifier no. 10.1109/TC.2008.199.

0018-9340/09/$25.00 © 2009 IEEE. Published by the IEEE Computer Society.

1 INTRODUCTION

A classic problem of computer security is the mitigation of covert channels. First introduced by Lampson [1], a covert channel in a (single-host or distributed) computer system can be roughly defined as any means by which two processes or users can exchange information in violation of security policy. While the exact detection of usable covert channels in a system is undecidable, many conservative approaches exist to detect and eliminate all potential covert channels; for example, the US Department of Defense “Light Pink Book” [2] on covert channel analysis includes detailed procedures to find and eliminate covert channels. Unfortunately, the cost of such elimination is often prohibitive; in this case, the Light Pink Book recommends techniques to limit the bandwidth of covert channels and requires auditing to detect any use of the covert channel. A natural question that arises from this suggestion is whether it is feasible for an auditor to do so.

This paper focuses on the dual problem of steganography: How can two communicating entities send secret messages over a public or audited channel so that a third party such as the reference monitor cannot detect the presence of the secret messages? Notice how the goal of steganography is different from classical encryption, which seeks to conceal the content of secret messages: Steganography is about hiding the very existence of the secret messages.

Steganographic “protocols” have a long and intriguing history that predates covert channels, stretching back to antiquity. For example, Kahn [3] relates stories of World War II prisoners, spies, and soldiers sending secret messages, written in invisible ink, hidden in love letters (e.g., hiding the message in the first character of each sentence), or using other “physical” steganographic techniques, to circumvent inspections by both the Allied and Axis governments. In response, postal censors crossed out anything that looked like sensitive information (e.g., long strings of digits) and they prosecuted individuals whose mail seemed suspicious. In many cases, censors even randomly deleted innocent-looking sentences or entire paragraphs in order to prevent secret messages from being delivered.

More recently, there has been a great deal of interest in digital steganography, that is, in hiding secret messages in communications between computers. This interest is apparently fueled by the increased amount of communication that is mediated by computers and by connections to potential commercial applications: Hidden information could potentially be used to detect or limit the unauthorized propagation of the innocent-looking “carrier” data. Because of this, there have been numerous proposals for protocols to hide data in channels containing pictures [4], [5], video [5], [6], [7], audio [8], [9], and even typeset text [10]. Many of these protocols are extremely clever and rely heavily on domain-specific properties of these channels. On the other hand, the literature on steganography also contains many clever attacks that detect the use of such protocols. As a result, it is unclear from this body of work whether secure steganography is possible at all.

In this paper, we use techniques from cryptography and complexity theory to answer the question “under what conditions is (secure) steganography possible?” We give cryptographic definitions for symmetric-key stegosystems and steganographic secrecy against a passive adversary in terms of indistinguishability from a probabilistic channel process. We show that a widely believed complexity-theoretic assumption (the existence of a one-way function) and access to a channel oracle are both necessary and sufficient conditions for the existence of secure steganography relative to any channel. We furthermore give a construction that has essentially optimal bandwidth when compared with known provably secure constructions. Finally, we consider the question of robust steganography that resists attempts to censor the use of a covert channel; we prove necessary and sufficient conditions for the existence of a secure robust stegosystem and give a provably robust stegosystem with nearly optimal bandwidth under attack.

1.1 Previous Work

The scientific study of steganography in the open literature began in 1983 when Simmons [11] stated the problem in terms of communication in a prison. In his formulation, two inmates, Alice and Bob, are trying to hatch an escape plan. The only way they can communicate with each other is through a public channel, which is carefully monitored by the warden of the prison, Ward. If Ward detects any encrypted messages or codes, he will throw both Alice and Bob into solitary confinement. The problem of steganography is then: how can Alice and Bob cook up an escape plan by communicating over the public channel in such a way that Ward does not suspect that anything “unusual” is going on?

Anderson and Petitcolas [12], [13] posed many of the open problems resolved in this article. In particular, they pointed out that it was unclear how to prove the security of a steganographic protocol and gave an example that is similar to the protocol we present in Section 4 but noted that it was not clear what properties were necessary to prove its security. They also posed the open question of bounding the bandwidth that can be securely achieved over a given cover channel.

Since then, several works [14], [15], [16], [17] have addressed information-theoretic definitions of steganography [...]

[...] bound is weaker than ours (in terms of bits per oracle query) by a factor of $\log_2 2e$ but applies to a more general class of stegosystems and is proved using techniques of independent interest. Van Le and Kurosawa [23] proposed a stegosystem based on arithmetic coding that uses specialized knowledge of a channel to achieve higher bandwidth per symbol in some cases. Lysyanskaya and Meyerovich [24] considered the effects of an imperfect channel sampling procedure on universal stegosystems. Backes and Cachin [25] introduced a notion of “active attacks” that use a decoding oracle to detect the presence of hidden messages in a special “challenge” cover message.

1.2 Preliminaries

We denote the length of a string or sequence $s$ by $|s|$. We denote the empty string or sequence by $\varepsilon$ and the concatenation of strings $s_1$ and $s_2$ by $s_1 \| s_2$. We assume the use of efficient and unambiguous pairing and unpairing operators on strings so that $(s_1, s_2)$ may be uniquely interpreted as the pairing of $s_1$ with $s_2$ and may not be the same as $s_1 \| s_2$. One example of such an operation is to encode $(s_1, s_2)$ by a prefix-free encoding of $|s_1|$, followed by $s_1$, followed by a prefix-free encoding of $|s_2|$, and then $s_2$. Unpairing works in the obvious way.

We let $U_k$ denote the uniform distribution on $\{0,1\}^k$. If $X$ is a finite set, we denote by $x \leftarrow X$ the action of uniformly choosing $x$ from $X$. We denote by $\mathcal{F}_{L,l}$ the uniform distribution on functions $f : \{0,1\}^L \to \{0,1\}^l$. For a probability distribution $D$, we denote the support of $D$ by $[D]$. For an integer $n$, we let $[n]$ denote the set $\{1, 2, \ldots, n\}$.

We make use of two different measures of the information in a probability distribution. Let $\mathcal{D}$ be a probability distribution with finite support $D$; then, the Shannon Entropy of $\mathcal{D}$ is $H_S(\mathcal{D}) = -\sum_{d \in D} \Pr_{\mathcal{D}}[d] \log_2 \Pr_{\mathcal{D}}[d]$, and the Minimum Entropy of $\mathcal{D}$ is $H_\infty(\mathcal{D}) = \min_{d \in D} \{-\log_2 \Pr_{\mathcal{D}}[d]\}$.

We model all parties by Probabilistic Turing Machines (PTMs).
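The pairing operation described in the preliminaries can be illustrated with a small sketch. The paper only asks for some prefix-free encoding of the lengths; the Elias-gamma-style code used here, the representation of bit strings as Python strings of '0' and '1', and the function names `encode_length`, `decode_length`, `pair`, and `unpair` are all assumptions made for illustration, not the authors' choices.

```python
from typing import Tuple


def encode_length(n: int) -> str:
    """Prefix-free code for a non-negative integer (Elias-gamma style, applied to n + 1)."""
    binary = bin(n + 1)[2:]                      # binary form of n + 1; always starts with '1'
    return "0" * (len(binary) - 1) + binary      # unary length prefix, then the binary digits


def decode_length(bits: str, pos: int) -> Tuple[int, int]:
    """Read one encoded length starting at pos; return (value, position after the code)."""
    zeros = 0
    while bits[pos + zeros] == "0":              # count the unary prefix
        zeros += 1
    end = pos + 2 * zeros + 1                    # prefix zeros plus (zeros + 1) binary digits
    return int(bits[pos + zeros:end], 2) - 1, end


def pair(s1: str, s2: str) -> str:
    """Encode (s1, s2) as code(|s1|), s1, code(|s2|), s2."""
    return encode_length(len(s1)) + s1 + encode_length(len(s2)) + s2


def unpair(bits: str) -> Tuple[str, str]:
    """Invert pair(); raises ValueError if the input has trailing bits."""
    n1, pos = decode_length(bits, 0)
    s1 = bits[pos:pos + n1]
    n2, pos = decode_length(bits, pos + n1)
    s2 = bits[pos:pos + n2]
    if pos + n2 != len(bits):
        raise ValueError("malformed pairing")
    return s1, s2


if __name__ == "__main__":
    # "01" || "1" and "0" || "11" are the same string, but their pairings differ.
    print(pair("01", "1"), pair("0", "11"))
    print(unpair(pair("01", "1")))
```

Plain concatenation cannot tell ("01", "1") from ("0", "11"), whereas the length prefixes make the split point recoverable. That is why a pairing need not equal the concatenation of its parts.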
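As a small numerical illustration of the two entropy measures defined in the preliminaries, the sketch below computes both for a distribution given explicitly as a table of probabilities. Representing a distribution as a Python dictionary, and the names `shannon_entropy` and `min_entropy`, are assumptions made for this example only.

```python
import math
from typing import Dict, Hashable


def shannon_entropy(dist: Dict[Hashable, float]) -> float:
    """H_S(D) = -sum over the support of Pr[d] * log2 Pr[d]."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def min_entropy(dist: Dict[Hashable, float]) -> float:
    """H_inf(D) = min over the support of -log2 Pr[d]."""
    return min(-math.log2(p) for p in dist.values() if p > 0)


if __name__ == "__main__":
    skewed = {"a": 0.5, "b": 0.25, "c": 0.25}
    uniform = {x: 0.25 for x in "abcd"}
    print(shannon_entropy(skewed), min_entropy(skewed))    # 1.5 1.0
    print(shannon_entropy(uniform), min_entropy(uniform))  # 2.0 2.0
```

Minimum entropy is determined by the single most likely outcome, so it never exceeds the Shannon entropy. The two coincide for a uniform distribution.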