On Statistical Distance Based Testing of Pseudo Random Sequences and Experiments with PHP and Debian OpenSSL
Computers & Security 53 (2015) 44–64
Available online at www.sciencedirect.com (ScienceDirect)
Journal homepage: www.elsevier.com/locate/cose

Yongge Wang a,*, Tony Nicol b
a Dept. SIS, UNC Charlotte, Charlotte, NC 28223, USA
b University of Liverpool, UK

An extended abstract of this paper appeared in ESORICS 2014.
* Corresponding author. URL: http://www.sis.uncc.edu/~yonwang/
http://dx.doi.org/10.1016/j.cose.2015.05.005
0167-4048/© 2015 Elsevier Ltd. All rights reserved.

Article history: Received 10 July 2014; received in revised form 10 April 2015; accepted 21 May 2015; available online 2 June 2015.

Keywords: Statistical testing; Pseudorandom generators; The law of the iterated logarithm; Web security; NIST randomness testing; Statistical distance; OpenSSL; PHP random generator

Abstract

NIST SP800-22 (2010) proposed the state-of-the-art statistical testing techniques for testing the quality of (pseudo) random generators. However, it is easy to construct natural functions that are considered as GOOD pseudorandom generators by the NIST SP800-22 test suite though the output of these functions is easily distinguishable from the uniform distribution. This paper proposes solutions to address this challenge by using statistical distance based testing techniques. We carried out both NIST tests and LIL based tests on commonly deployed pseudorandom generators such as the standard C linear congruential generator, the Mersenne Twister pseudorandom generator, and the Debian Linux (CVE-2008-0166) pseudorandom generator with OpenSSL 0.9.8c-1. Based on experimental results, we illustrate the advantages of our LIL based testing over NIST testing. It is known that the Debian Linux (CVE-2008-0166) pseudorandom generator based on OpenSSL 0.9.8c-1 is flawed and the output sequences are predictable. Our LIL tests on these sequences discovered the flaws in the Debian Linux implementation. However, the NIST SP800-22 test suite is not able to detect this flaw using the NIST recommended parameters. It is concluded that the NIST SP800-22 test suite is not sufficient and distance based LIL test techniques be included in statistical testing practice. It is also recommended that all pseudorandom generator implementations be comprehensively tested using state-of-the-art statistically robust testing tools.

1. Introduction

The weakness in pseudorandom generators could be used to mount a variety of attacks on Internet security. It is reported in the Debian Security Advisory DSA-1571-1 [5] that the random number generator in Debian's OpenSSL release CVE-2008-0166 is predictable. The weakness in the Debian pseudorandom generator affected the security of OpenSSH, Apache (mod_ssl), the onion router (TOR), OpenVPN, and other applications (see, e.g., (Ahmad, 2008)). These examples show that it is important to improve the quality of pseudorandom generators by designing systematic testing techniques to discover these weak implementations in the early stage of system development.

Statistical tests are commonly used as a first step in determining whether or not a generator produces high quality random bits. For example, NIST SP800-22 Revision 1A (Rukhin et al., 2010) proposed the state-of-the-art statistical testing techniques for determining whether a random or pseudorandom generator is suitable for a particular cryptographic application. In a statistical test of Rukhin et al. (2010), a significance level $\alpha \in [0.001, 0.01]$ is chosen for each test. For each input sequence, a P-value is calculated and the input string is accepted as pseudorandom if P-value $\ge \alpha$. A pseudorandom generator is considered good if, with probability $\alpha$, the sequences produced by the generator fail the test. For an in-depth analysis, NIST SP800-22 recommends additional statistical procedures such as the examination of P-value distributions (e.g., using the $\chi^2$-test). In Section 3, we will show that the NIST SP800-22 test suite has inherent limitations with straightforward Type II errors. Furthermore, our extensive experiments (based on over 200 TB of random bits generated) show that NIST SP800-22 techniques could not detect the weakness in the above mentioned pseudorandom generators.

In order to address the challenges faced by NIST SP800-22, this paper designs a "behavioristic" testing approach which is based on statistical distances. Based on this approach, the details of LIL testing techniques are developed. As an example, we carried out LIL testing on the flawed Debian Linux (CVE-2008-0166) pseudorandom generator based on OpenSSL 0.9.8c-1 and on the standard C linear congruential generator. As we expected, both of these pseudorandom generators failed the LIL testing since we know that the sequences produced by these two generators are strongly predictable. However, as we have mentioned earlier, our experiments show that the sequences produced by these two generators pass the NIST SP800-22 test suite using the recommended parameters. In other words, the NIST SP800-22 test suite with the recommended parameters has no capability in detecting these known deviations from randomness. Furthermore, it is shown that for several pseudorandom generators (e.g., the linear congruential generator), the LIL test results on output strings start off fine but deteriorate as the string length increases beyond that which NIST can handle since the NIST testing tool package has an integer overflow issue.

The paper is organized as follows. Section 2 introduces notations. Section 3 points out the limitation of NIST SP800-22 testing tools. Section 4 discusses the law of iterated logarithm (LIL). Section 5 reviews the normal approximation to binomial distributions. Section 6 introduces statistical distance based LIL tests. Section 7 reports distance based LIL testing experimental results, and Section 8 reports some experimental results on LIL curves of commonly used random generators. Section 9 contains general discussions on OpenSSL random generators.

2. Notations and pseudorandom generators

In this paper, $N$ and $R^+$ denote the set of natural numbers (starting from 0) and the set of non-negative real numbers, respectively. $\Sigma = \{0,1\}$ is the binary alphabet, $\Sigma^*$ is the set of (finite) binary strings, $\Sigma^n$ is the set of binary strings of length $n$, and $\Sigma^\infty$ is the set of infinite binary sequences. The length of a string $x$ is denoted by $|x|$. For strings $x, y \in \Sigma^*$, $xy$ is the concatenation of $x$ and $y$, and $x \sqsubseteq y$ denotes that $x$ is an initial segment of $y$. For a sequence $x \in \Sigma^* \cup \Sigma^\infty$ and a natural number $n \ge 0$, $x \upharpoonright n = x[0..n-1]$ denotes the initial segment of length $n$ of $x$ ($x \upharpoonright n = x[0..n-1] = x$ if $|x| \le n$) while $x[n]$ denotes the $n$th bit of $x$, i.e., $x[0..n-1] = x[0] \ldots x[n-1]$.

The concept of "effective similarity" by Goldwasser and Micali (Goldwasser and Micali, 1984) and Yao (Yao, 1982) is defined as follows: Let $X = \{X_n\}_{n \in N}$ and $Y = \{Y_n\}_{n \in N}$ be two probability ensembles such that each of $X_n$ and $Y_n$ is a distribution over $\Sigma^n$. We say that $X$ and $Y$ are computationally (or statistically) indistinguishable if for every feasible algorithm $A$ (or every algorithm $A$), the difference $d_A(n) = |\mathrm{Prob}[A(X_n) = 1] - \mathrm{Prob}[A(Y_n) = 1]|$ is a negligible function in $n$.

Let $l : N \to N$ with $l(n) > n$ for all $n \in N$ and $G$ be a polynomial-time computable algorithm such that $|G(x)| = l(|x|)$ for all $x \in \Sigma^*$. Then $G$ is a polynomial-time pseudorandom generator if the ensembles $\{G(U_n)\}_{n \in N}$ and $\{U_{l(n)}\}_{n \in N}$ are computationally indistinguishable.

3. Limitations of NIST SP800-22

In this section, we show that the NIST SP800-22 test suite has inherent limitations with straightforward Type II errors. Our first example is based on the following observation: for a function $F$ that mainly outputs "random strings" but, with probability $\alpha$, outputs biased strings (e.g., strings consisting mainly of 0's), $F$ will be considered as a "good" pseudorandom generator by the NIST SP800-22 test though the output of $F$ could be distinguished from the uniform distribution (thus, $F$ is not a pseudorandom generator by definition). Let $\mathrm{RAND}_{c,n}$ be the set of Kolmogorov $c$-random binary strings of length $n$, where $c \ge 1$. That is, for a universal Turing machine $M$, let

$$\mathrm{RAND}_{c,n} = \{x \in \{0,1\}^n : \text{if } M(y) = x \text{ then } |y| \ge |x| - c\}. \tag{1}$$

Let $\alpha$ be a given significance level of the NIST SP800-22 test and $\mathcal{R}_{2n} = \mathcal{R}_1^n \cup \mathcal{R}_2^n$ where

$$\mathcal{R}_1^n \subset \mathrm{RAND}_{2,2n} \text{ and } |\mathcal{R}_1^n| = 2^n (1 - \alpha),$$
$$\mathcal{R}_2^n \subset \{0^n x : x \in \{0,1\}^n\} \text{ and } |\mathcal{R}_2^n| = 2^n \alpha.$$

Furthermore, let $f_n : \{0,1\}^n \to \mathcal{R}_{2n}$ be an ensemble of random functions (not necessarily computable) such that $f_n(x)$ is chosen uniformly at random from $\mathcal{R}_{2n}$. Then for each $n$-bit string $x$, with probability $1 - \alpha$, $f_n(x)$ is Kolmogorov 2-random and with probability $\alpha$, $f_n(x) \in \mathcal{R}_2^n$.

It should be noted that, for each given significance level $\alpha \le 0.1$ and each test $T$ from the 15 NIST SP800-22 tests, the following condition is satisfied:

$$|\{x \in \{0,1\}^n : x \text{ fails test } T \text{ at significance level } \alpha\}| \le \frac{2^n}{c_T}$$

for a constant $c_T > 10$. Thus one can construct a Turing machine $M_{\alpha,T}$ with the following properties: for each string $x$ of length $n$ that fails the test $T$ at the significance level $\alpha$, there exists another string $y$ such that $M_{\alpha,T}(y) = x$ and $|y| \le n - 3$.
In other words, we can compress $x$ by at least 3 bits. It follows that all Kolmogorov 2-random strings are guaranteed to pass [...] accurate in deviation detection and avoids the above Type II errors in NIST SP800-22.
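The first example of Section 3 can be illustrated with a small simulation. The following is a minimal sketch, not the authors' experimental code, and it rests on several assumptions: Kolmogorov 2-random strings are not computable, so the "random" branch is approximated by rejection sampling from Python's `random` module, keeping only strings that pass the test; only the SP800-22 frequency (monobit) test is used rather than the full suite; and the names `monobit_p_value`, `sample_f`, `leading_zeros_distinguisher` and `experiment` are hypothetical helpers introduced here for illustration.

```python
import math
import random


def monobit_p_value(bits):
    """P-value of the frequency (monobit) test: erfc(|S_n| / sqrt(2n))."""
    n = len(bits)
    s_n = sum(2 * b - 1 for b in bits)  # map 0 -> -1 and 1 -> +1, then sum
    return math.erfc(abs(s_n) / math.sqrt(2 * n))


def uniform_bits(rng, length):
    """Return `length` bits from the stand-in uniform source (assumption)."""
    value = rng.getrandbits(length)
    return [(value >> i) & 1 for i in range(length)]


def sample_f(rng, n, alpha):
    """One 2n-bit output of the hypothetical generator F from Section 3."""
    if rng.random() < alpha:
        # Biased branch: a string of the form 0^n x with x uniform.
        return [0] * n + uniform_bits(rng, n)
    # "Random" branch: stand-in for a Kolmogorov 2-random string, which
    # always passes the test; approximated here by rejection sampling.
    while True:
        candidate = uniform_bits(rng, 2 * n)
        if monobit_p_value(candidate) >= alpha:
            return candidate


def leading_zeros_distinguisher(bits, n):
    """Algorithm A: output 1 iff the first n bits are all zero."""
    return 0 if any(bits[:n]) else 1


def experiment(n=64, alpha=0.01, trials=5000, seed=1):
    """Return (monobit failure rate of F, empirical advantage of A)."""
    rng = random.Random(seed)
    f_out = [sample_f(rng, n, alpha) for _ in range(trials)]
    u_out = [uniform_bits(rng, 2 * n) for _ in range(trials)]
    # A sequence fails the test when its P-value is below alpha.
    fail_rate = sum(monobit_p_value(s) < alpha for s in f_out) / trials
    hits_f = sum(leading_zeros_distinguisher(s, n) for s in f_out)
    hits_u = sum(leading_zeros_distinguisher(s, n) for s in u_out)
    advantage = abs(hits_f - hits_u) / trials  # estimate of d_A
    return fail_rate, advantage


if __name__ == "__main__":
    rate, adv = experiment()
    print(f"monobit failure rate of F: {rate:.4f}")
    print(f"advantage of distinguisher A: {adv:.4f}")
```

Under these assumptions, the failure rate of F stays close to $\alpha$, which is what the test expects of a good generator, while the simple distinguisher A separates F from the uniform distribution with advantage close to $\alpha$, which is not negligible in $n$.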