New generic algorithms for hard knapsacks
Nick Howgrave-Graham¹ and Antoine Joux²
¹ 35 Park St, Arlington, MA 02474. [email protected]
² DGA and Université de Versailles Saint-Quentin-en-Yvelines, UVSQ PRISM, 45 avenue des États-Unis, F-78035 Versailles cedex, France. [email protected]
Abstract. In this paper³, we study the complexity of solving hard knapsack problems, i.e., knapsacks with a density close to 1 where lattice-based low density attacks are not an option. For such knapsacks, the current state-of-the-art is a 31-year-old algorithm by Schroeppel and Shamir which is based on birthday paradox techniques and yields a running time of $\tilde{O}(2^{n/2})$ for knapsacks of $n$ elements and uses $\tilde{O}(2^{n/4})$ storage. We propose here two new algorithms which improve on this bound, finally lowering the running time down to $\tilde{O}(2^{0.3113 n})$ for almost all knapsacks of density 1. We also demonstrate the practicality of these algorithms with an implementation.
1 Introduction
The 0–1 knapsack problem or subset sum problem is a famous NP-hard problem which has often been used in the construction of cryptosystems. An instance of this problem consists of a list of $n$ positive integers $(a_1, a_2, \ldots, a_n)$ together with another positive integer $S$. Given an instance, there exist two forms of knapsack problems. The first form is the decision knapsack problem, where we need to decide whether $S$ can be written as:
$$S = \sum_{i=1}^{n} \epsilon_i a_i,$$

with values of $\epsilon_i$ in $\{0, 1\}$. The second form is the computational knapsack problem, where we need to recover a solution $\epsilon = (\epsilon_1, \ldots, \epsilon_n)$ if at least one exists. The decision knapsack problem is NP-complete (see [7]). It is also well-known that, given access to an oracle that solves the decision problem, the computational problem can be solved using $n$ calls to this oracle. Indeed, assuming that the original knapsack admits a solution, we can easily obtain the value of $\epsilon_n$ by asking the oracle whether the subknapsack $(a_1, a_2, \ldots, a_{n-1})$ can sum to $S$. If so, there exists a solution with $\epsilon_n = 0$; otherwise, a solution necessarily has $\epsilon_n = 1$, and we replace $S$ by $S - a_n$. Repeating this idea, we obtain the bits of $\epsilon$ one at a time.
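To make this reduction concrete, here is a minimal Python sketch (our illustration, not from the paper) of the bit-by-bit recovery; `decides` stands for a hypothetical decision oracle, and the brute-force version below is only meant for testing on small instances.

```python
from typing import Callable, List, Optional

def recover_solution(a: List[int], S: int,
                     oracle: Callable[[List[int], int], bool]) -> Optional[List[int]]:
    """Recover a 0/1 solution vector with n oracle calls, peeling off a_n, a_{n-1}, ..."""
    if not oracle(a, S):
        return None          # the original knapsack has no solution
    eps = [0] * len(a)
    target = S
    for i in range(len(a) - 1, -1, -1):
        if oracle(a[:i], target):   # a solution exists with eps_i = 0
            eps[i] = 0
        else:                       # otherwise every solution must take a_i
            eps[i] = 1
            target -= a[i]
    return eps

def brute_decides(elements: List[int], target: int) -> bool:
    """Brute-force stand-in for the decision oracle (exponential time)."""
    sums = {0}
    for x in elements:
        sums |= {s + x for s in sums}
    return target in sums

# Example: recover_solution([3, 5, 8], 11, brute_decides) returns [1, 0, 1].
```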
³ Full version of [12].

Knapsack problems were introduced in cryptography by Merkle and Hellman [18] in 1978. The basic idea behind the Merkle-Hellman public key cryptosystem is to hide an easy knapsack instance inside a hard-looking one. The scheme was broken by Shamir [23] using lattice reduction. After that, many other knapsack-based cryptosystems were also broken using lattice reduction. In particular, the low-density attacks introduced by Lagarias and Odlyzko [15] and improved by Coster et al. [4] are a tool of choice for breaking many knapsack-based cryptosystems. The density of a knapsack is defined as:

$$d = \frac{n}{\log_2(\max_i a_i)}.$$

More recently, Impagliazzo and Naor [13] introduced cryptographic schemes which are as secure as the subset sum problem. They classify knapsack problems according to their density. On the one hand, when $d < 1$, a given sum $S$ can usually be inverted in a unique manner and these knapsacks can be used for encryption. On the other hand, when $d > 1$, most sums have many preimages and the knapsack can be used for hashing purposes. However, for encryption, the density cannot be too low, since the Lagarias-Odlyzko low-density attack can solve random knapsack problems with density $d < 0.64$ given access to an oracle that solves the shortest vector problem (SVP) in lattices. Of course, since Ajtai showed in [1] that the SVP is NP-hard under randomized reductions, such an oracle is not available. However, in practice, low-density attacks have been shown to work very well when the SVP oracle is replaced by existing lattice reduction algorithms such as LLL⁴ [16] or the BKZ algorithm of Schnorr [20]. The attack of [4] improves the low-density condition to $d < 0.94$. For high-density knapsacks, with $d > 1$, there is a variant of these lattice-based attacks presented in [14] that finds collisions in mildly exponential time $O(2^{n/1000})$ using the same lattice reduction oracle.

⁴ LLL stands for Lenstra-Lenstra-Lovász and BKZ for blockwise Korkine-Zolotarev.

However, for knapsacks with density close to 1, there is no effective lattice-based approach to solve the knapsack problem. As a consequence, in this case, we informally speak of hard knapsacks. Note that it is proved in [13, Proposition 1.2] that density 1 is indeed the hardest case. For hard knapsacks, the state-of-the-art algorithm is due to Schroeppel and Shamir [21, 22] and runs in time $O(n \cdot 2^{n/2})$ using $O(n \cdot 2^{n/4})$ bits of memory. This algorithm has the same running time as the basic birthday-based algorithm on the knapsack problem introduced by Horowitz and Sahni [10], but much lower memory requirements. To simplify the notation of the complexities in the sequel, we extensively use the soft-Oh notation. Namely, $\tilde{O}(g(n))$ is used as a shorthand for $O(g(n) \cdot \log(g(n))^i)$, for any fixed value of $i$. With this notation, the algorithm of Schroeppel and Shamir runs in time $\tilde{O}(2^{n/2})$ using $\tilde{O}(2^{n/4})$ bits of memory.

Since Wagner presented his generalized birthday algorithm in [25], it is well-known that when solving problems involving sums of elements from several lists, it is possible to obtain a much faster algorithm when a single solution out of many is sought. A similar idea was previously used by Camion and Patarin in [2] to attack the knapsack-based hash function of [5].
In this paper, we introduce two new algorithms that improve upon the algorithm of Schroeppel and Shamir for solving knapsack problems. In some sense, our algorithms are a new development of the generalized birthday algorithm. The main difference is that, instead of looking for one solution among many, we look for one of the many possible representations of a given solution.

The paper is organized as follows: in Section 2 we recall some background information on knapsacks, in Section 3 we briefly recall the algorithm of Schroeppel and Shamir and introduce a useful practical variant of this algorithm, in Section 4 we present our improved algorithms, and in Section 5 we describe practical implementations on a knapsack with $n = 96$. Section 4 is divided into three subsections: in 4.1 we describe the basic idea that underlies our algorithm, in 4.2 we present a simple algorithm based on this idea, and in 4.3 we give an improved recursive version of this algorithm. Finally, in Section 6 we present several extensions and some possible applications of our new algorithms.
2 Background on knapsacks
2.1 Modular knapsacks
We speak of a modular knapsack problem when we want to solve:
$$\sum_{i=1}^{n} \epsilon_i a_i \equiv S \pmod{M},$$

where the integer $M$ is the modulus.

Up to polynomial factors, solving modular knapsacks and knapsacks over the integers are equivalent tasks: any algorithm that realizes one task can be used to solve the other. In one direction, given a knapsack problem over the integers and an algorithm that solves any modular knapsack, it is clear that solving the problem modulo $M = \max(S, \sum_{i=1}^{n} a_i) + 1$ yields all integral solutions. In the other direction, assume that the modular knapsack $(a_1, \ldots, a_n)$ with target sum $S \bmod M$ is given by representatives $a_i$ of the classes of modular numbers in the range $[0, M-1]$. In this case, it is clear that any sum of at most $n$ such numbers is in the range $[0, nM-1]$. As a consequence, if $S$ is also represented in the range $[0, M-1]$, it suffices to solve $n$ knapsack problems over the integers with targets $S, S+M, \ldots, S+(n-1)M$.
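Both reductions are simple enough to transcribe directly. The following sketch (our illustration, with hypothetical `solve_integer` and `solve_modular_alg` callbacks) makes the bookkeeping explicit:

```python
def solve_modular(a, S, M, solve_integer):
    """Reduce a modular knapsack to at most n integer knapsacks.

    Assumes the a_i and S are represented in [0, M-1], so every 0/1
    combination lies in [0, n*M - 1] and the integral target must be
    S + k*M for some 0 <= k < n.
    """
    for k in range(len(a)):
        sol = solve_integer(a, S + k * M)
        if sol is not None:
            return sol
    return None

def solve_integer(a, S, solve_modular_alg):
    """Converse reduction: choose M = max(S, sum(a)) + 1 so that no
    wrap-around can occur; modular and integral solutions then coincide."""
    M = max(S, sum(a)) + 1
    return solve_modular_alg(a, S % M, M)
```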
2.2 Random knapsacks
Given two parameters n and D, we define a random knapsack with solution on n elements with prescribed density D as a knapsack randomly constructed using the following process:
– Let $B(n, D) = \lfloor 2^{n/D} \rfloor$.
– Choose each $a_i$ (for $i$ from 1 to $n$) uniformly at random in $[1, B(n, D)]$.
– Uniformly choose a random vector $\epsilon$ in $\{0, 1\}^n$ and let $S = \sum_{i=1}^{n} \epsilon_i a_i$.
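As an illustration (not from the paper), this Python sketch transcribes the sampling process; note that the floating-point computation of $B(n, D)$ is only approximate for large $n$.

```python
import math
import random

def random_knapsack(n, D):
    """Random knapsack with solution at prescribed density D."""
    B = math.floor(2 ** (n / D))   # B(n, D) = floor(2^(n/D)), approximate for large n
    a = [random.randint(1, B) for _ in range(n)]
    eps = [random.randint(0, 1) for _ in range(n)]
    S = sum(e * x for e, x in zip(eps, a))
    return a, eps, S
```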
Note that the computed density d of such a random knapsack differs from the prescribed density. However, as n tends to infinity, the two become arbitrarily close with overwhelming probability. In [4], it is shown that there exists a lattice based algorithm that solves all but an exponentially small fraction of random knapsacks with solution, when the prescribed density satisfies D < 0.94.
2.3 Unbalanced knapsacks
The random knapsacks from above may have arbitrary values in $[0, n]$ for the weight $\sum_{i=1}^{n} \epsilon_i$ of the solution. Yet, most of the time, we expect a weight close to $n/2$. For various reasons, it is also useful to consider knapsacks with different weights. We define an $\alpha$-unbalanced random knapsack with solution on $n$ elements given $\alpha$ and the density $D$ as follows:
– Let $B(n, D) = \lfloor 2^{n/D} \rfloor$.
– Choose each $a_i$ (for $i$ from 1 to $n$) uniformly at random in $[1, B(n, D)]$.
– Let $\ell = \lfloor \alpha n \rfloor$ and uniformly choose a random vector $\epsilon$ with exactly $\ell$ coordinates equal to 1, the rest being 0s, in the set of $\binom{n}{\ell}$ such vectors. Let $S = \sum_{i=1}^{n} \epsilon_i a_i$.
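The unbalanced variant only changes how the solution vector is drawn; a sketch under the same caveats as the generator above:

```python
import math
import random

def unbalanced_knapsack(n, D, alpha):
    """alpha-unbalanced random knapsack: exactly l = floor(alpha * n) ones."""
    B = math.floor(2 ** (n / D))
    a = [random.randint(1, B) for _ in range(n)]
    l = math.floor(alpha * n)
    ones = set(random.sample(range(n), l))   # uniform over the C(n, l) possible supports
    eps = [1 if i in ones else 0 for i in range(n)]
    S = sum(e * x for e, x in zip(eps, a))
    return a, eps, S
```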
Unbalanced knapsacks are natural to consider, since they already appear in the lattice-based algorithms of [15, 4], where the value of $\alpha$ greatly impacts the densities that can be attacked. Moreover, in our algorithms, even when initially solving regular knapsacks, unbalanced knapsacks may appear in the course of the computations. When dealing with balanced knapsacks with exactly half zeros and ones, we also use the above definition and speak of 1/2-unbalanced knapsacks.
2.4 Complementary knapsacks
Given a knapsack $a_1, \ldots, a_n$ with target sum $S$, we define its complementary knapsack to be the knapsack that contains the same elements and has target sum $\sum_{i=1}^{n} a_i - S$. The solution $\epsilon$ of the original knapsack and the solution $\epsilon'$ of the complementary knapsack are related by:
For all $i$: $\epsilon_i + \epsilon'_i = 1$.
Thus, solving either of the two knapsacks also yields the result of the other knapsack. Moreover, the weights $\ell$ and $\ell'$ are related by $\ell + \ell' = n$. In particular, if a knapsack is $\alpha$-unbalanced, its complementary knapsack is $(1-\alpha)$-unbalanced. As a consequence, in any algorithm, we may assume without loss of generality that $\ell \leq \lfloor n/2 \rfloor$ (or that $\ell \geq \lceil n/2 \rceil$).

2.5 Asymptotic values of binomials

Where knapsacks are considered, binomial coefficients are frequently encountered. We recall that the binomial coefficient $\binom{n}{\ell}$ is the number of distinct choices of $\ell$ elements within a set of $n$ elements. We have:

$$\binom{n}{\ell} = \frac{n!}{\ell! \cdot (n-\ell)!}.$$

We often need to obtain asymptotic approximations for binomials of the form $\binom{n}{\alpha n}$ (or $\binom{n}{\lfloor \alpha n \rfloor}$) for fixed values of $\alpha$ in $]0, 1[$. This is easily done by using Stirling's formula:

$$n! = (1 + o(1)) \sqrt{2\pi n} \left(\frac{n}{e}\right)^n.$$

Ignoring polynomial factors in $n$, we find:

$$\binom{n}{\alpha n} = \tilde{O}\left(\left(\frac{1}{\alpha^\alpha \cdot (1-\alpha)^{1-\alpha}}\right)^n\right).$$

Many of the algorithms presented in this paper involve complexities of the form $\tilde{O}(2^{c n})$, where a constant $c$ is obtained by taking the logarithm in base 2 of numbers coming from asymptotic estimates of binomials. In this case, to improve the readability of the complexity, we choose a decimal approximation $c' > c$ of $c$. This would allow us to rewrite the complexity as $O(2^{c' n})$ or even $o(2^{c' n})$. However, we prefer to stick to $\tilde{O}(2^{c' n})$. A typical example is the $\tilde{O}(2^{0.3113 n})$ time complexity of our fastest algorithm, which stands for $\tilde{O}\left(\binom{n}{n/4} \cdot 2^{-n/2}\right)$.
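As a sanity check of the stated constant, the exponent of $\binom{n}{n/4} \cdot 2^{-n/2}$ can be evaluated numerically from the entropy formula above (our own verification, not part of the paper):

```python
import math

def binom_exponent(alpha):
    """c such that C(n, alpha*n) = O~(2^(c*n)):
    c = -alpha*log2(alpha) - (1-alpha)*log2(1-alpha)."""
    return -alpha * math.log2(alpha) - (1 - alpha) * math.log2(1 - alpha)

# Exponent of C(n, n/4) * 2^(-n/2):
print(binom_exponent(0.25) - 0.5)   # 0.31127812..., rounded up to 0.3113
```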
2.6 Distribution of random knapsack sums

In order to analyze the behavior of our algorithms, we need to use information about the distribution of modular sums of the form:

$$\sum_{i=1}^{n} a_i x_i \pmod{M},$$

for a random knapsack modulo $M$ and for $n$-tuples $(x_1, \ldots, x_n) \in B$, where $B$ is an arbitrary set of $n$-dimensional vectors with coordinates modulo $M$. We use the following important theorem [19, Theorem 3.2]:

Theorem 1. For any set $B \subset \mathbb{Z}_M^n$, the identity:

$$\frac{1}{M^n} \sum_{(a_1, \ldots, a_n) \in \mathbb{Z}_M^n} \sum_{c \in \mathbb{Z}_M} \left(P_{a_1, \ldots, a_n}(B, c) - \frac{1}{M}\right)^2 = \frac{M-1}{M\,|B|}$$

holds, where $P_{a_1, \ldots, a_n}(B, c)$ denotes the probability that $\sum_{i=1}^{n} a_i x_i \equiv c \pmod{M}$ for a random $(x_1, \ldots, x_n)$ drawn uniformly from $B$, i.e.:
$$P_{a_1, \ldots, a_n}(B, c) = \frac{1}{|B|} \left|\left\{(x_1, \ldots, x_n) \in B \text{ such that } \sum_{i=1}^{n} a_i x_i \equiv c \pmod{M}\right\}\right|.$$

This implies the immediate corollaries:
Corollary 1. For any real $\lambda > 0$, the fraction of $n$-tuples $(a_1, \ldots, a_n) \in \mathbb{Z}_M^n$ for which there exists a $c \in \mathbb{Z}_M$ that satisfies $|P_{a_1, \ldots, a_n}(B, c) - 1/M| \geq \lambda/M$ is at most:

$$\frac{M^2}{\lambda^2 |B|}.$$

Corollary 2. For any reals $\lambda > 0$ and $1 > \mu > 0$, the fraction of $n$-tuples $(a_1, \ldots, a_n) \in \mathbb{Z}_M^n$ for which there exist at least $\mu M$ values $c \in \mathbb{Z}_M$ that satisfy
$|P_{a_1, \ldots, a_n}(B, c) - 1/M| \geq \lambda/M$ is at most:

$$\frac{M}{\lambda^2 \mu |B|}.$$
These two corollaries are used when $|B|$ is larger than $M$. We also need two more corollaries, one for small values of $|B|$ and one for $|B| \approx M$:

Corollary 3. For any real $\mu$ with $1 > \mu > 0$, if $m > 1$ denotes $M/|B|$, the fraction of $n$-tuples $(a_1, \ldots, a_n) \in \mathbb{Z}_M^n$ such that less than $\mu |B|$ values $c \in \mathbb{Z}_M$ have $P_{a_1, \ldots, a_n}(B, c) \neq 0$ is at most:

$$\frac{\mu}{(1 - \mu)\, m}.$$
Corollary 4. For any real $\lambda > 0$, the fraction of $n$-tuples $(a_1, \ldots, a_n) \in \mathbb{Z}_M^n$ that satisfy:

$$\sum_{c \in \mathbb{Z}_M} P_{a_1, \ldots, a_n}(B, c)^2 \geq \frac{M + |B|}{\lambda M |B|}$$

is at most $\lambda$.
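For intuition, the identity of Theorem 1 can be checked exhaustively for tiny parameters. The following brute-force computation (ours, purely illustrative) evaluates both sides for $B = \{0,1\}^n$:

```python
from itertools import product

def theorem1_lhs(M, n):
    """Average over all (a_1,...,a_n) of sum_c (P(B,c) - 1/M)^2, for B = {0,1}^n."""
    B = list(product((0, 1), repeat=n))
    total = 0.0
    for a in product(range(M), repeat=n):
        counts = [0] * M
        for x in B:
            counts[sum(ai * xi for ai, xi in zip(a, x)) % M] += 1
        total += sum((cnt / len(B) - 1 / M) ** 2 for cnt in counts)
    return total / M ** n

M, n = 5, 3
print(theorem1_lhs(M, n), (M - 1) / (M * 2 ** n))   # both sides agree: 0.1
```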
3 The algorithm of Schroeppel and Shamir
The algorithm of Schroeppel and Shamir was introduced in [21, 22]. It allows one to solve a generic integer knapsack problem on $n$ elements in time $\tilde{O}(2^{n/2})$ using a memory of size $\tilde{O}(2^{n/4})$. It improves on the birthday algorithm of Horowitz and Sahni [10] that can be applied to such a knapsack. We first recall this basic birthday algorithm, which is based on rewriting a knapsack solution as an equality:
$$\sum_{i=1}^{\lfloor n/2 \rfloor} \epsilon_i a_i = S - \sum_{i=\lfloor n/2 \rfloor + 1}^{n} \epsilon_i a_i,$$

where all the $\epsilon_i$ are 0 or 1. Thus, to solve the knapsack problem, we construct the set $S^{(1)}$ containing all possible sums of the first $\lfloor n/2 \rfloor$ elements and the set $S^{(2)}$ obtained by subtracting from the target $S$ any of the possible sums of the last $\lceil n/2 \rceil$ elements. Searching for collisions between the two sets, we discover all the solutions of the knapsack problem. This can be done in time and memory $\tilde{O}(2^{n/2})$ by fully computing the two sets, sorting them and searching for collisions.
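A minimal sketch (our illustration) of this birthday step, storing both halves explicitly and returning one solution; it uses time and memory on the order of $2^{n/2}$, which is exactly the memory cost that Schroeppel and Shamir improve upon:

```python
from itertools import product

def birthday_knapsack(a, S):
    """Horowitz-Sahni meet-in-the-middle on a 0/1 knapsack."""
    h = len(a) // 2
    left, right = a[:h], a[h:]
    # S1: every achievable sum of the first half, with one epsilon realizing it.
    S1 = {}
    for eps in product((0, 1), repeat=h):
        S1.setdefault(sum(e * x for e, x in zip(eps, left)), eps)
    # S2 is scanned on the fly: S minus each sum of the second half.
    for eps in product((0, 1), repeat=len(right)):
        t = S - sum(e * x for e, x in zip(eps, right))
        if t in S1:
            return list(S1[t]) + list(eps)
    return None
```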
Algorithm 1 Schroeppel-Shamir algorithm

Require: Knapsack elements $a_1, \ldots, a_n$. Knapsack sum $S$
  Let $q_1 = \lfloor n/4 \rfloor$, $q_2 = \lfloor n/2 \rfloor$, $q_3 = \lfloor 3n/4 \rfloor$
  Create $S_L^{(1)}(\sigma)$ and $S_L^{(1)}(\epsilon)$: list of all $\sum_{i=1}^{q_1} \epsilon_i a_i$ and list of $\epsilon_{1 \cdots q_1}$ (in the same order)
  Create $S_R^{(1)}(\sigma)$ and $S_R^{(1)}(\epsilon)$: list of all $\sum_{i=q_1+1}^{q_2} \epsilon_i a_i$ and list of $\epsilon_{q_1+1 \cdots q_2}$
  Create $S_L^{(2)}(\sigma)$ and $S_L^{(2)}(\epsilon)$: list of all $\sum_{i=q_2+1}^{q_3} \epsilon_i a_i$ and list of $\epsilon_{q_2+1 \cdots q_3}$
  Create $S_R^{(2)}(\sigma)$ and $S_R^{(2)}(\epsilon)$: list of all $\sum_{i=q_3+1}^{n} \epsilon_i a_i$ and list of $\epsilon_{q_3+1 \cdots n}$
  Call 4-way merge Algorithm 2 or 3 on $(S_L^{(1)}(\sigma), S_R^{(1)}(\sigma), S_L^{(2)}(\sigma), S_R^{(2)}(\sigma))$, $n$ and $S$. Store returned set in Sol
  for each $(i, j, k, l)$ in Sol do
    Concatenate $S_L^{(1)}(\epsilon)[i]$, $S_R^{(1)}(\epsilon)[j]$, $S_L^{(2)}(\epsilon)[k]$ and $S_R^{(2)}(\epsilon)[l]$ into $\epsilon$
    Output: "$\epsilon$ is a solution"
  end for
Algorithm 2 Original 4-way merge routine

Require: Four input lists $(S_L^{(1)}, S_R^{(1)}, S_L^{(2)}, S_R^{(2)})$, knapsack size $n$, target sum $T$
  Let $|S_L^{(1)}|$, $|S_R^{(1)}|$, $|S_L^{(2)}|$ and $|S_R^{(2)}|$ be the sizes of the corresponding arrays
  Create priority queues $Q_1$ and $Q_2$
  Sort $S_R^{(1)}$ and $S_R^{(2)}$ in increasing order. Keep track of the original positions in InitPos1 and InitPos2
  for $i$ from 0 to $|S_L^{(1)}| - 1$ do
    Insert $(i, 0)$ in $Q_1$ with priority $S_L^{(1)}[i] + S_R^{(1)}[0]$
  end for
  for $i$ from 0 to $|S_L^{(2)}| - 1$ do
    Insert $(i, |S_R^{(2)}| - 1)$ in $Q_2$ with priority $T - S_L^{(2)}[i] - S_R^{(2)}[|S_R^{(2)}| - 1]$
  end for
  Create empty list Sol
  while $Q_1$ and $Q_2$ are not empty do
    Peek at value $q_1$ of lowest priority element in $Q_1$
    Peek at value $q_2$ of lowest priority element in $Q_2$
    if $q_1 \leq q_2$ then
      Get $(i, j)$ from $Q_1$
      if $j \neq |S_R^{(1)}| - 1$ then
        Insert $(i, j+1)$ in $Q_1$ with priority $S_L^{(1)}[i] + S_R^{(1)}[j+1]$
      end if
    end if
    if $q_1 \geq q_2$ then
      Get $(k, l)$ from $Q_2$
      if $l \neq 0$ then
        Insert $(k, l-1)$ in $Q_2$ with priority $T - S_L^{(2)}[k] - S_R^{(2)}[l-1]$
      end if
    end if
    if $q_1 = q_2$ then
      Add $(i, \text{InitPos1}[j], k, \text{InitPos2}[l])$ to Sol
    end if
  end while
  Return list of solutions Sol

In [21, 22], Schroeppel and Shamir show that, in order to find these collisions, it is not necessary to store the full sets $S^{(1)}$ and $S^{(2)}$. Instead, they generate them on the fly using priority queues (based either on heaps or on Adelson-Velsky and Landis trees), requiring memory $\tilde{O}(2^{n/4})$. More precisely, let us define $q_1 = \lfloor n/4 \rfloor$, $q_2 = \lfloor n/2 \rfloor$, $q_3 = \lfloor 3n/4 \rfloor$. We introduce four sets $S_L^{(1)}$, $S_R^{(1)}$, $S_L^{(2)}$ and $S_R^{(2)}$ of size $O(2^{n/4})$ defined as follows:
– $S_L^{(1)}$ is the set of pairs $(\sum_{i=1}^{q_1} \epsilon_i a_i, \epsilon_{1 \cdots q_1})$ with $\epsilon_{1 \cdots q_1} \in \{0,1\}^{q_1}$;
– $S_R^{(1)}$ is the set of $(\sum_{i=q_1+1}^{q_2} \epsilon_i a_i, \epsilon_{q_1+1 \cdots q_2})$ with $\epsilon_{q_1+1 \cdots q_2} \in \{0,1\}^{q_2 - q_1}$;
– $S_L^{(2)}$ is the set of $(\sum_{i=q_2+1}^{q_3} \epsilon_i a_i, \epsilon_{q_2+1 \cdots q_3})$ with $\epsilon_{q_2+1 \cdots q_3} \in \{0,1\}^{q_3 - q_2}$;
– $S_R^{(2)}$ is the set of $(\sum_{i=q_3+1}^{n} \epsilon_i a_i, \epsilon_{q_3+1 \cdots n})$ with $\epsilon_{q_3+1 \cdots n} \in \{0,1\}^{n - q_3}$.

With these notations, solving the knapsack problem amounts to finding four elements $\sigma_L^{(1)}$, $\sigma_R^{(1)}$, $\sigma_L^{(2)}$ and $\sigma_R^{(2)}$ in the corresponding sets such that $S = \sigma_L^{(1)} + \sigma_R^{(1)} + \sigma_L^{(2)} + \sigma_R^{(2)}$. We call this a 4-way merge problem. The algorithm of Schroeppel and Shamir is described in Algorithm 1, using their original 4-way merge Algorithm 2 as a subroutine. Note that, in Algorithm 1, we describe each set $S_X^{(i)}$ as two lists $S_X^{(i)}(\sigma)$ and $S_X^{(i)}(\epsilon)$ stored in the same order.
3.1 A variant of the Schroeppel and Shamir algorithm

In practice, the need for priority queues of large size makes the algorithm of Schroeppel and Shamir harder to implement and to optimize. Indeed, using large priority queues either introduces an extra factor in the memory usage or causes unfriendly cache behavior. As a consequence, we would like to avoid priority queues altogether. In order to do this, we present a variant of their algorithm, inspired by an algorithm presented in [3] that solves the problem of finding 4 elements from 4 distinct lists with bitwise sum equal to 0. Note that, from a theoretical point of view, our variant is not as good as the original algorithm of Schroeppel and Shamir, because for some exceptional knapsacks it requires more memory.

The idea is to choose a modulus $M$ near $2^{(1/4 - \varepsilon)n}$ and to remark that the 4-way merge condition implies $\sigma_L^{(1)} + \sigma_R^{(1)} \equiv S - \sigma_L^{(2)} - \sigma_R^{(2)} \pmod{M}$. As a consequence, for any solution of the knapsack, there exists a value $\sigma_M$ such that:

$$\sigma_M = (\sigma_L^{(1)} + \sigma_R^{(1)}) \bmod M = (S - \sigma_L^{(2)} - \sigma_R^{(2)}) \bmod M.$$
Since we cannot guess the correct value of $\sigma_M$, we simply loop over all possible values. This gives a new 4-way merge Algorithm 3, which can be used as a replacement for the original subroutine in Algorithm 1. Informally, for each test value of $\sigma_M$, Algorithm 3 constructs the set of all sums $\sigma_L^{(1)} + \sigma_R^{(1)}$ congruent to $\sigma_M$ modulo $M$. This is done by sorting $S_R^{(1)}$ by values modulo $M$. Indeed, in this case, it suffices for each $\sigma_L^{(1)}$ in $S_L^{(1)}$ to search for the value $(\sigma_M - \sigma_L^{(1)}) \bmod M$ in $S_R^{(1)}$. Using this method, we construct the set $S^{(1)}$ of the birthday paradox algorithm as a disjoint union of smaller sets $S^{(1)}(\sigma_M)$, which are created one at a time within the loop on $\sigma_M$ in Algorithm 3. Similarly, we implicitly construct $S^{(2)}$ as a disjoint union of $S^{(2)}(\sigma_M)$, but do not store it, instead searching for matching values in $S^{(1)}(\sigma_M)$.

Algorithm 3 Modular 4-way merge routine

Require: Four input lists $(S_L^{(1)}, S_R^{(1)}, S_L^{(2)}, S_R^{(2)})$, size $n$, target sum $T$
Require: Memory margin parameter $\varepsilon$
  Let $M$ be a random modulus in $[2^{(1/4 - \varepsilon)n}, 2 \cdot 2^{(1/4 - \varepsilon)n}]$
  Create list $S_R^{(1)}(M)$ containing pairs $(S_R^{(1)}[i] \bmod M, i)$ where $i$ indexes all of $S_R^{(1)}$
  Create list $S_R^{(2)}(M)$ containing pairs $(S_R^{(2)}[i] \bmod M, i)$ where $i$ indexes all of $S_R^{(2)}$
  Sort $S_R^{(1)}(M)$ and $S_R^{(2)}(M)$ by values of the left member of each pair
  Create empty list Sol
  for $\sigma_M$ from 0 to $M - 1$ do
    Empty the list $S^{(1)}$ (or create the list if $\sigma_M = 0$)
    for $i$ from 1 to size of $S_L^{(1)}$ do
      Let $\sigma_L^{(1)} = S_L^{(1)}[i]$ and $\sigma_t = (\sigma_M - \sigma_L^{(1)}) \bmod M$
      Binary search first occurrence of $\sigma_t$ in $S_R^{(1)}(M)$
      for each consecutive $(\sigma_t, j)$ in $S_R^{(1)}(M)$ do
        Add $(\sigma_L^{(1)} + S_R^{(1)}[j], (i, j))$ to $S^{(1)}$
      end for
    end for
    Sort list $S^{(1)}$ by values of the left member of each pair
    for $k$ from 1 to size of $S_L^{(2)}$ do
      Let $\sigma_L^{(2)} = S_L^{(2)}[k]$ and $\sigma_t = (T - \sigma_M - \sigma_L^{(2)}) \bmod M$
      Binary search first occurrence of $\sigma_t$ in $S_R^{(2)}(M)$
      for each consecutive $(\sigma_t, l)$ in $S_R^{(2)}(M)$ do
        Let $T' = T - \sigma_L^{(2)} - S_R^{(2)}[l]$
        Binary search first occurrence of $T'$ in $S^{(1)}$
        for each consecutive $(T', (i, j))$ in $S^{(1)}$ do
          Add $(i, j, k, l)$ to Sol
        end for
      end for
    end for
  end for
  Return list of solutions Sol
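The following Python sketch (ours, purely illustrative) mirrors Algorithm 3's loop over $\sigma_M$, with dictionaries playing the role of the sorted lists and binary searches:

```python
from collections import defaultdict

def modular_four_way_merge(SL1, SR1, SL2, SR2, T, M):
    """Return all (i, j, k, l) with SL1[i] + SR1[j] + SL2[k] + SR2[l] == T."""
    by_mod_r1 = defaultdict(list)       # residue mod M -> indices into SR1
    for j, s in enumerate(SR1):
        by_mod_r1[s % M].append(j)
    by_mod_r2 = defaultdict(list)
    for l, s in enumerate(SR2):
        by_mod_r2[s % M].append(l)
    sol = []
    for sigma_M in range(M):
        # S1(sigma_M): sums sigma_L1 + sigma_R1 congruent to sigma_M mod M.
        S1 = defaultdict(list)          # exact sum -> list of (i, j)
        for i, sl in enumerate(SL1):
            for j in by_mod_r1[(sigma_M - sl) % M]:
                S1[sl + SR1[j]].append((i, j))
        # Right-hand side: we need T - SL2[k] - SR2[l] ≡ sigma_M (mod M).
        for k, sl in enumerate(SL2):
            for l in by_mod_r2[(T - sigma_M - sl) % M]:
                for (i, j) in S1.get(T - sl - SR2[l], []):
                    sol.append((i, j, k, l))
    return sol
```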
Complexity analysis. If we ignore the innermost loop that writes down the solution set Sol, the running time of the loop iteration corresponding to $\sigma_M$ is $\tilde{O}(|S^{(1)}(\sigma_M)| + |S^{(2)}(\sigma_M)|)$, with a polynomial factor in $n$ that comes from sorting and searching. Summing over all iterations of the loop, we have a total running time of $\tilde{O}(|S^{(1)}| + |S^{(2)}|) = \tilde{O}(2^{n/2})$, unless Sol has a size larger than $\tilde{O}(2^{n/2})$.

Where memory is concerned, storing $S_L^{(1)}$, $S_R^{(1)}$, $S_L^{(2)}$ and $S_R^{(2)}$ costs $O(2^{n/4})$. However, the memory required to store the partitioned representation of $S^{(1)}$ is $\max_{\sigma_M} |S^{(1)}(\sigma_M)|$. Note that we cannot guarantee that this maximum remains small. A simple counterexample occurs when all $a_i$ values (in the first half) are multiples of $M$. Indeed, in that case we find that $S^{(1)}(0)$ has size $2^{n/2}$. In general, we do not expect such a bad behavior; more precisely, we have:

Theorem 2. For any real $\varepsilon > 0$ and modulus $M$ close to $2^{(1/4 - \varepsilon)n}$, for a fraction at least $1 - 2^{-4\varepsilon n}$ of knapsacks with density $D < 4$ given by $n$-tuples $(a_1, \ldots, a_n)$ and target value $T$, Algorithm 1 using Algorithm 3 as its 4-way merge routine finds all of the $N_{\mathrm{Sol}}$ solutions of the knapsack in time $\tilde{O}(\max(2^{n/2}, N_{\mathrm{Sol}}))$ using memory $\tilde{O}(\max(2^{(1/4 + \varepsilon)n}, N_{\mathrm{Sol}}))$.

Proof. The time analysis is given above. The bound on the memory use for almost all knapsacks comes from applying Corollary 1 with $\lambda = 1/2$ twice, on the left-hand and right-hand side subknapsacks on $n/2$ elements, using $B = \{0, 1\}^{n/2}$. We need to use the fact that a random knapsack taken uniformly at random with $n/D$-bit numbers is close to a random knapsack modulo $M$, when $D < 4$.
A high bit version. Instead of using modular values to partition the sets $S^{(1)}$ and $S^{(2)}$, another option is to look at the value of the $\lceil n/4 \rceil$ higher bits. Depending on the precise context, this option might be more practical than the modular version. In the implementation presented in Section 5, we make use of both versions.
Early abort with multiple solutions. When the number of solutions $N_{\mathrm{Sol}}$ is large and we wish to find a single solution, we can use an early abort strategy. Heuristically, assuming that the $\sigma_M$ values corresponding to the many solutions are well-distributed modulo $M$, this reduces the heuristic expected running time to $\tilde{O}(\max(2^{n/4}, 2^{n/2}/N_{\mathrm{Sol}}))$.

3.2 Application to unbalanced knapsacks

The basic birthday algorithm, the algorithm of Schroeppel and Shamir and our variant can also, with some care, be applied to $\alpha$-unbalanced knapsacks. In this case, if we let: