Nonesuch: a Mix Network with Sender Unobservability
Thomas S. Andrei Serjantov Benessa Defend Heydt-Benjamin The Freehaven Project University of Massachusetts at University of Massachusetts at [email protected] Amherst Amherst [email protected] [email protected]
ABSTRACT common among such anonymity systems that certain kinds Oblivious submission to anonymity systems is a process by of adversaries can easily determine the identity of all enti- which a message may be submitted in such a way that nei- ties submitting messages to the network. In Mixminion [5] ther the anonymity network nor a global passive adversary and other mix-based designs, for example, a sender can be may determine that a valid message has been sent. We identified with certainty by a global passive adversary or present Nonesuch: a mix network with steganographic sub- even a single corrupt entry node [17]. In other words, many mission and probabilistic identification and attenuation of systems (with the notable exception of DC-nets) provide un- cover traffic. In our system messages are submitted as ste- linkability rather than sender unobservability and hence the gotext hidden inside Usenet postings. The steganographic identities of all the senders are easily known. extraction mechanism is such that the the vast majority of There are many situations, however, in which mere knowl- the Usenet postings which do not contain keyed stegotext edge of submission is too much knowledge to permit the will produce meaningless output which serves as cover traf- adversary to achieve. Even if the contents and the recipi- fic, thus increasing the anonymity of the real messages. This ent of a message are occluded by the anonymity network, a cover traffic is subject to probabilistic attenuation in which sender may wish to keep secret the very fact of participa- nodes have only a small probability of distinguishing cover tion in an anonymity protocol. Users who need this level messages from “real” messages. This attenuation prevents of privacy protection include, for example, citizens of op- cover traffic from travelling through the network in an infi- pressive governments with widespread surveillance, and cor- nite loop, while making it infeasible for an entrance node to porate whistle blowers. In addition to possible threats of distinguish senders. negative consequences for participating in anonymous com- munication, these users may be prevented from participation by an adversarial service provider. Categories and Subject Descriptors We propose Nonesuch: a high latency mix network which D.2.11 [Software]: Software Architectures, Information Hid- supports oblivious submission. Nonesuch provides the same ing; C.2.1 [Computer Systems Organization]: ,Computer- sender-receiver unlinkability provided by other mix networks, Communication Networks, Network Architecture and De- while providing better sender anonymity and strong pro- sign; E.3 [Data Encryption]: ,Public Key Cryptosystems tection against tagging attacks. Nonesuch offers improved sender anonymity against both compromised nodes and against General Terms passive adversaries. In Nonesuch, users steganographically embed messages in Algorithms, Security images which they then post to the most popular Usenet newsgroups. Note that by the nature of popularity, neither Keywords subscribing to nor posting to these newsgroups is a suspi- cious activity. The majority of images in Usenet will not Mix Networks, Sender Unobservability, Public Key, Steganog- contain stegotext and will serve as cover traffic. Nonesuch raphy, Oblivious Channels, Minx Packet Format nodes operate by performing a steganographic extraction on each new posting to the protocol-specified range of Usenet 1. INTRODUCTION newsgroups. These messages are then routed based on a Many different kinds of anonymity networks exist in which tunably sparse routing table. This routing mechanism per- it is difficult to link senders of messages to recipients. It is mits correct routing of messages while only slightly reducing the anonymity set of a valid message from the set of all valid messages and cover traffic combined. Cover traffic is prob- abilistically revealed as such and is attenuated out of the Permission to make digital or hard copies of all or part of this work for network as soon as it is detected. personal or classroom use is granted without fee provided that copies are Anonymity of messages and quantity of cover traffic are not made or distributed for profit or commercial advantage and that copies inversely proportional. As a message travels deeper in the bear this notice and the full citation on the first page. To copy otherwise, to anonymity network it becomes more difficult to link it to its republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. entrance and thus its sender. In Nonesuch, cover traffic is WPES’06, October 30, 2006, Alexandria, Virginia, USA. attenuated as it travels deeper. Therefore we concentrate Copyright 2006 ACM 1-59593-556-8/06/0010 ...$5.00. Figure 1: In the Nonesuch protocol submission is accomplished over a subliminal channel, and anonymity nodes can only distinguish real messages from cover traffic with small probability. cover traffic where it is most needed; to occlude submission. difficult for an adversary to distinguish senders from non- Cover traffic then becomes sparse where it is not needed; at sending participants. An important difference between Bauer’s the end of a real message’s path. system and ours is that his system requires the participa- In this paper we: tion of the servers belonging to popular websites, whereas our protocol uses only existing features of Usenet. We ob- 1. Present a novel design for oblivious submission serve that it is much easier for a user to submit a posting to Usenet than to submit software to a major website and ask 2. Utilize the Minx packet format in what we believe is a them to run it on their servers. novel way The Minx packet format uses probabilistic attenuation in order to gain resistance to active tagging attacks [6]. While 3. Present a Baysean inference based attack against un- this goal is quite different from our central one of oblivious certainty about the number of real and dummy mes- submission we find the Minx packet format compatible with sages in the mix our system as discussed in section 3.3. 4. Demonstrate that this attack does not alter the advan- All mixes are vulnerable to timing attacks [13]. However, tage of Nonesuch over more traditional Mix networks Since all non-Nonesuch postings to the most popular Usenet image newsgroups serve as cover traffic in our system, the volume of cover traffic in Nonesuch is both high and difficult 2. RELATED WORK for an adversary to control. Our system should therefore be Chaum [4] introduced the concept of anonymizing mix more resilient than mix-based systems which have less cover networks. ore modern systems, such as Mixminion [5] al- traffic, or which rely on artificial (and therefore more easily low features such secure anonymous replies, resistance to manipulated) dummy traffic for cover. tagging, and more security against active attacks. More ad- DC-nets [3] provide strong sender anonymity, in which vanced analysis techniques allow us to prove [2] or at least a passive adversary is not aware of when an originator is to some extent analyze mix networks [15, 7]. sending a message. However, the adversary can still deter- Nevertheless, mix net protocols offer strong sender-receiver mine that the sender is a member of an anonymity network. unlinkability, especially when few messages are exchanged. Disclosure of this knowledge is unacceptable in situations However, none of these systems conceal the fact that an where entities such as governments or companies may pro- originator has submitted a message to the mix net when the hibit involvement in anonymity networks. Nonesuch pro- originator’s outgoing traffic is monitored by a passive adver- vides strong sender anonymity without requiring the sender sary. Nonesuch offers secret submission to a mix net using to join an anonymity network. Our system does not, how- a steganographic covert channel. ever, provide recipient anonymity which is a property of Our work is inspired in part by Matthias Bauer’s work on DC-nets. limited unobservability in the context of a mix network [1]. This system uses covert channels in HTTP to make it more first b bits of file difficult problem, and is not the focus of this work. We note
0101001010011101010010 that the global parameter distribution problem for Nonesuch 101001011011101001010} 1 Message 1001010100101011011101010011 Routing Table is almost exactly the same as that for Tor and Mixminion. A 0110100110110111101010110101 satisfactory solution for either of these systems will also suf- 1101101001010011110101010110 Node J 0101001101001001001001001011 fice for Nonesuch. At the moment both Tor and Mixminion 0110101001001010101100101010 Node A 0010100101010101101001001001 rely on trusted directory servers [8, 5]. For simplicity of dis- 1001110110100101010001001011 1101100001011010110110101010 Node Q cussion, let us assume the existence of a trusted certificate 0101001010011101010010 Node T authority, which we will call CA. In implementation, CA 1010010110111010010101 h( ) Node Y can be replaced with any suitable directory server scheme. Nodes wishing to participate in the protocol contact CA Node E before the beginning of a new time epoch. The nodes must 1101001010010110101001 1111101010010100101001
messages in network z=100 0.16 Unconditional distribution N – total number of packets in the mix z=20 r – number of “good” (anonymous) messages in the mix 0.14 N r − – number of dummies 0.12 z – number of messages leaving the mix
R – fraction of good entries in the routing table 0.1 We start by looking at the distribution of the number of real messages given that z messages were observed leaving Probability 0.08 the mix. 0.06 P (z|N,r) = 0 if z 0.04 N − r z−r N−z = R (1 − R) if z ≥ r z − r ¡ 0.02 r Hence messages are good, and leave the local mix node, 0 z−r messages are dummies but get through with probability 0 10 20 30 40 50 60 70 80 90 100 z r Number of Real Messages R − and N − z messages are dummies and get attenuated. Of course what we are really interested in is the probabil- Figure 3: Probability distributions of real messages: ity distribution of real messages given that z messages were unconditional; after observing 20 messages exiting observed leaving the mix, i.e. P (r|z, N). the mix; after observing 100 messages exiting the Using Bayes theorem, mix P (z|r, N)P (r|N) P (r|z, N)= P (z|N) tainty about this, even in the case where all the messages are real (see for example Figure 3, case for z = 100). More And P (z|N)= ¢ r∈N P (z|r, N)P (r|N). We can approxi- r concretely, we have shown that unlike in more traditional mate P (r|N) by a binomial distribution: P (r|N)= Nf (1− N−r mix systems, the entry mix or a passive attacker cannot de- f) . termine the number of real messages entering the mix and Let us take an example. The attacker is observing a mix hence their senders. node and sees 100 messages enter it. He does not know how A more powerful adversary can mount a different attack many of these are “real” (r), but he does have an idea that on the extra property provided by Nonesuch. If the at- around 50% of the traffic is comprised of real messages (this tacker owns one or more Nonesuch nodes, then he can mon- could be obtained from observing the input and the output itor many messages which are sent by Alice and observe the of the mixes over a long period of time). For example, let probability with which they are retained by the mix. If this us consider an attenuation coefficient of R = 0.85 (the value is significantly higher than R, the attacker may conclude of R should be tuned to desired security versus efficiency that A is, in fact a sender of Nonesuch messages. This is, of tradeoff, we choose 0.85 for the sake of exposition as it offers course, merely an instance of a long term intersection attack a reasonable compromise between these). Suppose then the on anonymity properties which have been paid significant attacker observes that all 100 submitted packets leave the attention in the literature, e.g. [16, 12]. mix as real messages. This leads the adversary to suspect that more messages are real. If only 20 messages come out, his estimate of the number of real messages decreases. This 5. EFFICIENCY is illustrated in Figure 3. The Nonesuch network derives much of its security from Note that this inference can be made simply by observing the large amount of cover traffic that circulates. The cover one mix node. Naturally, if all the messages from the first traffic retention ratio R (the probability that a cover mes- node go to the second node, the inference can be repeated sage is retained by any given node) is one of the most critical to obtain a more accurate estimate for the number of good parameters to the protocol. When selecting an appropriate messages. If, however, messages from other mixes arrive to value for R there is a tension between desired security prop- the second mix, the inference we have used above would not erties and the possibility of overwhelming the network with be accurate as the attacker may have already inferred some- more messages than can be processed. In considering R val- thing about the likelihood of them being a real message. ues we have made several observations. An alternative inference procedure may be developed where A contributing factor to the amount of traffic in the None- the attacker computes an estimate of the probability of a such network is the rate at which new image files are posted packet being a real message for each packet and then com- to the set of monitored Usenet newsgroups. We determined bined these when the packets enter the mix. Such a method that the average posting rate to the top ten most popu- would be much more complex and possibly computationally lous Usenet newsgroups combined is 0.583 messages per sec- intensive; we leave this for future work. ond (daily observation of 10 most populous newsgroups vis- This example shows that while an attacker, such as an en- ible to the Easynews Usenet provider from 04/13/2005 to try node or a passive adversary around an entry node, can 05/18/2005). We explicitly ignore the bursts that are com- gain some knowledge about the number of real messages mon in such traffic since by the nature of mixing our system travelling through the node1, there is still significant uncer- group messages which enter the node, including the real 1and clearly the identities of the all the posters of the news- senders of anonymous messages will queue messages during overly busy periods to be for- 7. ACKNOWLEDGEMENTS warded during less busy periods. If over time the average We thank Boris Margolin for having asked us the ques- rate significantly increases or decreases, the protocol can re- tions that Nonesuch was eventually developed to answer. spond by choosing to monitor fewer or more newsgroups in His conversation was invaluable in motivating this work. We order to keep the input rate consistent. Such changes would thank George Danezis for co-inventing the Minx packet for- be communicated to clients and nodes along with the rest mat, and then travelling to the University of Massachusetts of the published global parameters. to share it with us. We thank Brian Levine for advising an The total number of packets in the system can be ex- earlier version of Nonesuch, and Kevin Fu for his support pressed as: and writing advice. Finally we thank WPES anonymous reviewers for their exceptionally thorough and helpful com- ∞ i ments. rlˆ + (Nˆ − rˆ)(R ) i=0 8. REFERENCES wherer ˆ is the total number of real messages, Nˆ is the total [1] Bauer, M. New covert channels in http: adding number of packets entering the system and R is the cover unwitting web browsers to anonymity sets. In WPES traffic retention ratio, and M is the number of nodes in the ’03: Proceedings of the 2003 ACM workshop on Nonesuch network. Given thatr ˆ is more or less fixed, R and Privacy in the electronic society (New York, NY, M are the factors that will determine the per node amount USA, 2003), ACM Press, pp. 72–78. of traffic in the Nonesuch network. Clearly, the amount of [2] Berman, R., Fiat, A., and Ta-Shma, A. Provable traffic in the system is easily tunable by altering R. unlinkability against traffic analysis. In Proceedings of In order to demonstrate the efficiency in a more concrete Financial Cryptography (FC ’04) (February 2004), manner, we make the following comparison to an existing A. Juels, Ed., Springer-Verlag, LNCS 3110. network: setting R = 0.5 attenuates the dummy traffic in [3] Chaum, D. The dining cryptographers problem: an average of two hops, which makes adding dummy traffic unconditional sender and recipient untraceability. J. much less expensive than increasing capacity of the network Cryptol. 1, 1 (1988), 65–75. for real traffic (typical route length of an anonymous email [4] Chaum, D. L. Untraceable electronic mail, return is between 2 and 5). Hence, if we take 50% of the traffic to addresses, and digital pseudonyms. Commun. ACM be real (with an average route length of 4), then we need 24, 2 (1981), 84–90. to increase the capacity of the network by 25% compared [5] Danezis, G., Dingledine, R., and Mathewson, N. to a Mixminion-like system without the extra sender un- Mixminion: Design of a Type III Anonymous traceability properties. This can easily be done by adding Remailer Protocol. In Proceedings of the 2003 IEEE extra nodes to the system (the typical number of Mixmaster Symposium on Security and Privacy (May 2003). or Mixminion nodes has been around 30 over the period of [6] Danezis, G., and Laurie, B. Minx: A simple and 2000-2005) [14]. efficient anonymous packet format. In Proceedings of Processor overhead is unlikely to be the limiting factor in the Workshop on Privacy in the Electronic Society the potential throughput of the Nonesuch system. We ran (WPES 2004) (Washington, DC, USA, October 2004). an OpenSSL benchmark on a Dell Precision 470 running [7] D´ıaz, C. Anonymity and Privacy in Electronic linux (2GHz microprocessor), which we take to be repre- Services. PhD thesis, Katholieke Universiteit Leuven, sentative of relatively ubiquitous consumer hardware. This Leuven, Belgium, December 2005. benchmark indicates that such a computer can process 4.434 [8] Dingledine, R., Mathewson, N., and Syverson, MB/s of 1024 bit RSA decryption (average of 10 runs). Rel- P. Tor: The second-generation onion router. In atively expensive asymmetric decryption must only be per- Proceedings of the 13th USENIX Security Symposium formed on the small (around 128 bits) header of a packet. (August 2004). The rest of the decryption is symmetric, and can most likely [9] Hopper, N. J., Langford, J., and von Ahn, L. be performed at line speed. Provably secure steganography. In CRYPTO ’02: It is far more likely that communications will be the lim- Proceedings of the 22nd Annual International iting factor to network size, especially in a hybrid network Cryptology Conference on Advances in Cryptology consisting of some corporate or university servers, and some (London, UK, 2002), Springer-Verlag, pp. 77–92. servers belonging to home users. [10] Horton, M., and Adams, R. Standard for interchange of USENET messages. RFC 1036, Dec. 1987. 6. CONCLUSION [11] Kantor, B., and Lapsley, P. Network News In this paper we considered the problem of making senders Transfer Protocol. RFC 977 (Proposed Standard), of anonymous messages indistinguishable from Usenet news- Feb. 1986. group users. We have presented a novel design of Nonesuch, [12] Kesdogan, D., Egner, J., and Buschkes,¨ R. an anonymity system with these desired properties. We have Stop-and-Go MIXes: Providing Probabilistic found that Nonesuch can make use of an existing packet for- Anonymity in an Open System. In Proceedings of mat, Minx, and draw on some of the ideas from [6]. We Information Hiding Workshop (IH 1998) (1998), argue that the additional traffic that passes through None- Springer-Verlag, LNCS 1525. such provides extra anonymity and outline one procedure [13] Levine, B. N., Reiter, M. K., Wang, C., and which the adversary can use to estimate the number of real Wright, M. K. Timing attacks in low-latency messages passing inside the mix. mix-based systems. In Proceedings of Financial Cryptography (FC ’04) (February 2004), A. Juels, Ed., Springer-Verlag, LNCS 3110. [14] Mixminion Stats V1.0. Statistics generated by the remailer grove. WWW, June 2006. http://privacy.outel.org/minion/nlist.html. [15] Serjantov, A. On the Anonymity of Anonymity Systems. PhD thesis, University of Cambridge, June 2004. [16] Serjantov, A., and Danezis, G. Towards an information theoretic metric for anonymity. In Proceedings of Privacy Enhancing Technologies Workshop (PET 2002) (April 2002), R. Dingledine and P. Syverson, Eds., Springer-Verlag, LNCS 2482. [17] Syverson, P., Tsudik, G., Reed, M., and Landwehr, C. Towards an Analysis of Onion Routing Security. In Proceedings of Designing Privacy Enhancing Technologies: Workshop on Design Issues in Anonymity and Unobservability (July 2000), H. Federrath, Ed., Springer-Verlag, LNCS 2009, pp. 96–114. [18] von Ahn, L., and Hopper, N. J. Public-key steganography. In EUROCRYPT (2004), C. Cachin and J. Camenisch, Eds., vol. 3027 of Lecture Notes in Computer Science, Springer, pp. 323–341.