Relation collection for the Function Field Sieve

Jérémie Detrey, Pierrick Gaudry and Marion Videau
CARAMEL project-team, LORIA, INRIA / CNRS / Université de Lorraine, Vandœuvre-lès-Nancy, France
Email: {Jeremie.Detrey, Pierrick.Gaudry, Marion.Videau}@loria.fr

Abstract—In this paper, we focus on the relation collection step of the Function Field Sieve (FFS), which is to date the best algorithm known for computing discrete logarithms in small-characteristic finite fields of cryptographic sizes. Denoting such a finite field by F_{p^n}, where p is much smaller than n, the main idea behind this step is to find polynomials of the form a(t) − b(t)x in F_p[t][x] which, when considered as principal ideals in carefully selected function fields, can be factored into products of low-degree prime ideals. Such polynomials are called "relations", and current record-sized discrete-logarithm computations need billions of those. Collecting relations is therefore a crucial and extremely expensive step in FFS, and a practical implementation thereof requires heavy use of cache-aware sieving algorithms, along with efficient arithmetic over F_p[t]. This paper presents the algorithmic and arithmetic techniques which were put together as part of a new public implementation of FFS, aimed at medium- to record-sized computations.

Keywords—function field sieve; polynomial arithmetic; finite-field arithmetic.

I. INTRODUCTION

The computation of discrete logarithms is an old problem that has received a lot of attention after the discovery of public-key cryptography in 1976. Indeed, together with integer factorization, it is the most famous computationally hard problem on which the security of public-key algorithms relies. We are interested here in the case of discrete logarithms in finite fields of small characteristic, for which the best algorithm known is the Function Field Sieve (FFS), first introduced by Adleman [1] in 1994. Since then, and until 2005, there have been several works on this algorithm, both on the theoretical side [2], [3], [4] and on the practical side [5], [6], [7]. This culminated with a record set by Joux and Lercier, who computed discrete logarithms in F_{2^613} and surpassed the previous record set by Thomé [8] in 2001 using Coppersmith's algorithm [9].

A long period with essentially no activity on the topic has followed; it ended recently with a new record [10], [11] set over F_{3^{6×97}}, which took advantage of the fact that the extension was composite; this particular case is of practical interest in pairing-based cryptography.

The FFS algorithm can be decomposed into several steps. Two of them are by far the most time-consuming: the relation collection step and the linear algebra step. While the latter is not specific to FFS—it is a sparse-matrix kernel computation that is essentially the same as for other discrete-logarithm algorithms based on combining relations—the relation collection, on the other hand, is specific to each family of algorithms.

The purpose of this paper is to present a new implementation of this important step for FFS, targeting in our case finite fields of small characteristic with no special Galois structure. We choose two benchmarks in the cases of characteristic 2 and 3, which are the most interesting for cryptographic applications. The extension degrees were chosen so as to match the "kilobit milestone": we consider F_{2^1039} and F_{3^647}. Note that the factorization of 2^1039 − 1 has been completed only recently [12]. This is relevant to our benchmark as, among others, an 80-digit (265-bit) prime factor was exhibited: had 2^1039 − 1 been a product of only moderately large primes, there would have been better algorithms than FFS for computing discrete logarithms in F_{2^1039}.

In the literature, many implementation details about the relation collection are merely sketched. In this article, we wish to explain as many of them as possible. Some of them are "folklore" or easy adaptations of techniques used in the Number Field Sieve (NFS) for integer factorization. For instance, the notion of skewness, or the use of bucket sorting during sieving that we describe below, will come as no surprise to the NFS cognoscenti. However, to the best of our knowledge, some techniques go beyond what was previously known. We mention two of them. The sieving step starts with a phase called norm initialization, where a large array is initialized with the degree of a polynomial, which is different for each cell. In several implementations, only an approximation of the size of the norm is used, in order to save time. We propose several strategies that allow us to get an exact result in this step, at a very reasonable computational cost. This includes using an analog of floating-point arithmetic for polynomials. Another original contribution lies in the heart of the relation collection, during a sieving step which is similar in essence to the sieve of Eratosthenes: in the same large array as above, we have to visit all the positions that verify a specific arithmetic property. In FFS, the set of these positions forms a vector space for which we give the general form of a basis, along with a method to compute it quickly. We also adapt the well-known Gray code walk to our case, where we are only interested in monic linear combinations.

To complement the description of our implementation, we make it freely available as part of the CADO-NFS project [13]. It is used to propose runtime estimates for the relation collection step in the two kilobit-sized finite fields that we target. It reveals that these sizes are easily reachable with moderate computing resources; it is merely a matter of months on a typical 1000-core cluster. The question of the subsequent linear algebra step is however left open for future research.
Still, it is already fair to claim that the cryptosystems whose security relies on the discrete-logarithm problem in a small-characteristic finite field of kilobit size provide only marginal security. Even more so, because the relation collection and the linear algebra steps are to be done just once and for all for a given finite field.

Outline: The paper is organized as follows. A short overview of the function field sieve, along with a focus on its relation collection step, is first given in Section II. Section III then highlights several key aspects of the proposed implementation, while low-level architectural and arithmetic issues are addressed in Sections IV and V, respectively. Timing estimates for two kilobit-sized finite fields are presented in Section VI, before a few concluding remarks and perspectives in Section VII.

II. COLLECTING RELATIONS IN FFS

A. A primer on the FFS algorithm

The principle of the Function Field Sieve algorithm is very similar to that of the Number Field Sieve algorithm for computing discrete logarithms in prime fields, where the ring of integers Z is replaced by the ring of polynomials K[t], with K a finite field of size #K = κ. Assume that we want to compute discrete logarithms in K̄, a degree-n algebraic extension of the base field K, of cardinality #K̄ = κ^n. We consider two polynomials f and g in K[t][x] such that their resultant in the x variable contains an irreducible factor ϕ(t) of degree n. The following diagram, where all maps are ring homomorphisms, is commutative:

             K[t][x]/f(x,t)
            ↗              ↘
    K[t][x]                  K[t]/ϕ(t) ≅ K̄
            ↘              ↗
             K[t][x]/g(x,t)

The key to FFS is to study the behavior of an element of the form a(t) − b(t)x in this diagram. Along the two paths, or sides, the element is tested for smoothness in the appropriate algebraic structures. Pushing the factored versions into the finite field K̄, we get a linear relation between the discrete logarithms of "small" elements. Once enough relations have been collected, they are put together to form a matrix for which a kernel element gives the discrete logarithms of these "small" elements. Finally, the logarithm of a given element is obtained by a procedure called special-q descent that uses again the diagram to produce a relation between the logarithm of the given element and the logarithms of smaller elements, and recursively so until we can relate them to the known logarithms of the "small" elements.

The smoothness notion is as follows. The element a(t) − b(t)x is mapped to an element of K[t][x]/f(x,t) (substitute g for f for the other side), which is viewed as a principal ideal in the ring of integers of the corresponding function field K(t)[x]/f(x,t). Since this is a Dedekind domain, there is a unique factorization in prime ideals. And if these prime ideals have degrees less than a parameter called the smoothness bound, we say that the principal ideal is smooth. Just like with NFS, complications arise from the fact that the ring of integers is not necessarily principal and that there are non-trivial units. The Schirokauer maps and the notion of virtual logarithms [14] are the tools to solve these complications; we do not need to worry about this here, since it does not interfere with the relation collection step.

B. Overview of the relation collection step

In this paper, we focus on this so-called relation collection step: given a smoothness bound—which might be different for each side—the goal is to produce (a, b) pairs, called relations, that simultaneously yield smooth ideals for both sides. Solving the resulting linear system requires finding at least as many relations as there are prime ideals of degree less than the smoothness bound.

The relation collection step is independent of the other steps in the FFS algorithm as long as we allow for any kinds of polynomials f and g. This is the most time-consuming part, both in practice and in theory: optimizing the parameters for this step will optimize the overall algorithm. We remark however that the linear algebra step has a high space complexity that can become a practical issue. The usual approach to circumvent this problem is to let the relation collection run longer than what theory alone would recommend. This produces a larger set of relations, and there exist algorithms for selecting among them a subset that is better suited for linear algebra than the original set of relations obtained without this oversieving strategy.

Ideals and norms: The highbrow description of the smoothness notion deals with ideals. Without loss of generality, we focus here on the side of the diagram corresponding to the polynomial f, the other side being similar. For our purposes, it is enough to see a prime ideal p as a pair of polynomials (p(t), r(t)), where p(t) is a monic irreducible polynomial and r(t) is a root in x of the polynomial f(x, t) modulo p(t). More formally, the notation p = (p(t), r(t)) represents the ideal ⟨p(t), x − r(t)⟩. Restricting to ideals of that form means that, in this article, we ignore the places at infinity and the problems that ramification can cause. Our implementation takes care of these more complicated cases, but we prefer to keep the presentation simple.

An element (a, b) that potentially leads to a relation must be tested for smoothness. It means that we want the corresponding principal ideal on the f side to be the product of "small" prime ideals. Here, we face a terminology issue, since in our setting the good notion of size of a prime ideal is the degree of its p polynomial. Therefore, we are going to talk about the degree of a prime ideal p = (p, r) as the degree of p (even though, formally, the degree of p is one).
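To make the norm-based smoothness test concrete, here is a minimal Python sketch for κ = 2, with F2[t] polynomials encoded as integers (bit i holding the coefficient of t^i). The polynomial f, the pair (a, b), and all helper names are made-up toy choices, not the paper's actual parameters or implementation: the norm is computed as F(a, b) = Σ_k f_k a^k b^{d−k}, and smoothness is checked naively by trial division rather than by sieving.

```python
# Minimal sketch of the smoothness test on one side (kappa = 2).
# Polynomials over F2[t] are encoded as Python ints: bit i is the
# coefficient of t^i.  The polynomial f below is a made-up toy example,
# not one of the paper's actual FFS polynomials.

def pdeg(u):
    # degree in t, with the convention deg 0 = -1
    return u.bit_length() - 1

def pmul(u, v):
    # carry-less ("XOR") product in F2[t]
    r = 0
    while v:
        if v & 1:
            r ^= u
        u <<= 1
        v >>= 1
    return r

def pdivmod(u, v):
    # Euclidean division u = q*v + r in F2[t]
    q = 0
    while pdeg(u) >= pdeg(v):
        s = pdeg(u) - pdeg(v)
        q ^= 1 << s
        u ^= v << s
    return q, u

def norm(f_coeffs, a, b):
    # F(a, b) = sum_k f_k a^k b^(d-k), where f = sum_k f_k(t) x^k
    d = len(f_coeffs) - 1
    n = 0
    for k, fk in enumerate(f_coeffs):
        term = fk
        for _ in range(k):
            term = pmul(term, a)
        for _ in range(d - k):
            term = pmul(term, b)
        n ^= term
    return n

def max_factor_degree(n):
    # largest degree of an irreducible factor, by trial division:
    # as with integers, only irreducible divisors are ever found
    best, d = 0, 2            # d = t, the first candidate of degree 1
    while pdeg(n) > 0:
        if 2 * pdeg(d) > pdeg(n):   # remaining cofactor is irreducible
            best = max(best, pdeg(n))
            break
        q, r = pdivmod(n, d)
        if r == 0:
            best = max(best, pdeg(d))
            n = q
        else:
            d += 1
    return best

# f = x^2 + x + t (toy): coefficients [f_0, f_1, f_2] = [t, 1, 1];
# for a = t + 1 and b = 1 the norm is t^2, which is 1-smooth
F_AB = norm([0b10, 1, 1], 0b11, 1)
```

Here F_AB is t^2 (the integer 4), so max_factor_degree(F_AB) is 1 and this toy pair would survive any smoothness bound of at least 1.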
The factorization of the principal ideal is dictated by the factorization of its norm which, in our case, takes a simple form. Denoting by F(X, Y, t) the polynomial obtained by homogenization of f(x, t) with respect to x, the norm corresponding to a pair (a, b) is F(a, b), a polynomial in t. Then, to each irreducible factor p of F(a, b) corresponds a prime ideal p = (p, r) that divides the principal ideal, and such that r ≡ a/b (mod p). (We have dropped the variable t to keep our notation simple.) As a consequence, a simple restatement of the relation search task is to find (a, b) pairs for which the two norms F(a, b) and G(a, b) have all their irreducible factors in K[t] of degree less than a given bound.

An important remark is that only primitive pairs are interesting, in the sense that if (a, b) gives a valid relation, then multiplying both a and b by the same small factor yields another relation which, however, is essentially the same and is of no use to the next steps of the algorithm. Therefore, we are only interested in the case where gcd(a, b) = 1 and, furthermore, where (say) b is monic, to take care of multiplications by constants.

Sieving and cofactorization: In order to test the smoothness of the norms corresponding to a pair (a, b), we chose a two-pass strategy, as is now usual for NFS. The first pass consists in using a sieving process à la Eratosthenes in order to efficiently remove the contribution of the prime ideals of degree up to a given factor base bound—chosen to be smaller than the actual smoothness bound—to the norm of the principal ideals that they divide. All of those (a, b) pairs such that what then remains of their norm is small enough will have better chances to completely factor as products of prime ideals of degree less than the smoothness bound. As such, the promising pairs which have survived sieving on both sides are selected to undergo a full-blown smoothness test in the second pass of the algorithm, which is sometimes called cofactorization or large-prime separation.

Note that there is no need to keep track of the actual norm of the principal ideal in the first pass. In fact, it is enough to consider only the degree of the norm here, from which we simply subtract deg p for every prime ideal p = (p, r) which divides the corresponding principal ideal. Also note that, because of the sieving techniques used in the first pass, we cannot process each pair (a, b) one after the other: the sieving region has to be considered as a whole and allocated a priori as a large array, delimited by upper bounds on the degrees of a and b, and in which all the (a, b) positions will be considered simultaneously by the sieving process.

III. ORGANIZATION OF THE COMPUTATION

A. Sieving by special-q

Since the set of (a, b) pairs that are tested for smoothness is very large, we have to subdivide this search space into pieces that are handled separately. Following the current trend for NFS and FFS, we split the (a, b) space according to so-called special-q lattices. A special-q is just a prime ideal q = (q, ρ) corresponding to either the f or the g side, and the q-lattice is exactly the set of (a, b) pairs for which the principal ideal on this side is divisible by q. This corresponds to the set of (a, b) pairs such that bρ ≡ a (mod q), which is a K[t]-lattice of dimension 2 and determinant q. The relation search is then seen as a set of independent tasks, each task consisting in finding relations in a given q-lattice.

Basis of the q-lattice: Let q = (q, ρ) be a special-q. A basis of the q-lattice is given by the two vectors (q, 0) and (ρ, 1). It is a very unbalanced basis, and we can improve it using Gaussian lattice reduction to obtain two vectors u0 and u1 whose coordinates have a degree about half of the degree of q. We then look for relations coming from (a, b) pairs of the form i·u0 + j·u1, with i of degree less than I and j of degree less than J. This leads to a and b of degree approximately equal to max(I, J) + (1/2) deg q.

Skewness: There are cases where the polynomials f and g are skewed, in the sense that the degree in t of the x^i coefficients decreases when i increases. In that case, a relevant search space for (a, b) that yields norms of more or less constant degree is a rectangle: the bound on the degree of a must be larger than the bound on the degree of b. The difference of these two bounds on the degrees is called the skewness. In the q-lattice, to obtain the target skewness adapted to f and g, the Gaussian lattice reduction is modified so that both vectors u0 and u1 have a difference of degrees between their coordinates that is close to the skewness. In the implementation, this translates into a minor change in the stopping criterion of the lattice reduction.

From now on, we consider that the sieving space for a given special-q is a κ^I by (κ^J − 1)/(κ − 1) + 1 rectangle in the (i, j) plane, covering all the polynomial pairs of degrees less than I and J, respectively, and such that the j coordinate is either 0 or monic. Indeed, if both i and j are multiplied by the same constant, so are a and b, and we get duplicate relations. The correspondence with (a, b) is of the form a = i·a0 + j·a1 and b = i·b0 + j·b1, where u0 = (a0, b0) and u1 = (a1, b1) is the skew-reduced basis of the q-lattice.

In our current implementation, I and J are fixed and equal. It could make sense to let them vary, for instance giving them a smaller value for larger special-q's, so as to keep more or less always the same bound on the (a, b) pairs.

B. Initializing the norms

Initializing the norm at a given position is a simple task; however, it can become costly if done naively. One could use an estimate of the degree (e.g., an upper bound), but it would impact the number of positions to test for smoothness by either losing or keeping too many positions. To avoid these drawbacks, we decided to work on being precise and even exact on the degree computation, at a reasonable cost.

We present our method with the polynomial f; it works similarly for the polynomial g. Recall that F denotes the homogenization of the polynomial f. For each special-q, we start by computing a linear transformation on f to get f_q such that F_q(i, j) = F(a, b). We then have to compute the degree (in t) of F_q(i, j) and put it into the array to initialize the corresponding position. We assume from now on that the loop is organized by rows, i.e., that j is fixed and i varies.

Saving many degree computations: Recall that, for a pair (i, j), the norm that has to be tested for smoothness is given by F_q(i, j) = Σ_{0≤k≤d} f_{q,k} i^k j^{d−k}, where each term is a polynomial in t of degree deg f_{q,k} + k deg i + (d − k) deg j. When a position (i0, j0) has been initialized, it is possible to deduce the initialization of many following positions (i0 + i′, j0), since for small values of i′ the degree of the norm does not change. To precisely characterize these i′, we define the notion of gap.

Definition 1. With the previous notation, the gap of F_q at (i0, j0), denoted by γ, is defined by γ = −1 if only one term f_{q,k} i0^k j0^{d−k} has maximal degree and, otherwise, by

  γ = max_{0≤k≤d} deg(f_{q,k} i0^k j0^{d−k}) − deg(F_q(i0, j0)).

If there is only one term f_{q,k} i0^k j0^{d−k} of maximal degree, this property will not change as long as (i0 + i′) has the same degree as i0. Thus, if we enumerate the polynomials in an order that respects the degree, we get the degree of the norm for free until the degree of (i0 + i′) increases. If there are several terms of maximal degree, a change in the degree of the norm only depends on the γ + 1 most significant coefficients of (i0 + i′). Therefore, for all i′ such that deg i′ < deg i0 − γ, we also get the degree of the norm for free. (Note that the case γ = 0 can occur.) Since computing the gap comes at almost no additional cost when computing the degree of the norm, this method saves many computations.

Computing the degree of the norm: If there is only one term, let us say f_{q,k0} i0^{k0} j0^{d−k0}, of maximal degree, then deg(F_q(i0, j0)) = deg f_{q,k0} + k0 deg i0 + (d − k0) deg j0, γ = −1, and we do not have to actually compute the norm to obtain its degree.

If there are several terms of maximal degree, we need to take care of the possible cancellations. As a cheaper alternative to computing the actual norm, we resort to an adaptive-precision approximation, using an analog of the floating-point number system for polynomials. We define the precision-N representation of a polynomial p(t) = Σ p_k t^k by the pair (deg p, mant_N(p)), where the mantissa mant_N(p) = p_{deg p} t^{N−1} + ··· + p_{deg p−N+1} is the polynomial of degree N − 1 corresponding to the N most significant coefficients of p.

Starting from an initial precision N = N0, we compute the "floating-point" approximations of the terms f_{q,k} i0^k j0^{d−k}, using truncated high products for the multiplications. Before summing those terms, we first align them to the maximal degree, as is the case for regular floating-point additions. This can be seen as a conversion to a fixed-point representation. Finally, from the degree of the sum of the terms, we deduce the degree of the norm and the gap, unless the result is zero, meaning that the precision N was not high enough to conclude. Therefore, the computation takes the form of a loop where the precision N increases iteratively, until the computation of the norm with N significant coefficients is enough to infer its degree.

A few more improvements can be added to this strategy. When the precision N is small, it might happen that several terms have a degree that is too small to contribute to the final sum, so that their computation can be skipped from the beginning. Also, when summing the terms, it can make sense to do so starting with the terms of highest degrees. If there are not too many cancellations, it can be the case that we can then guess the final degree before having to take the remaining terms into account. We finally remark that the first step, where we consider only the degrees of the terms, can be seen as the particular case N = 0 of the subsequent loop.

C. Bases for the p-lattices

Let p = (p, r) be a prime ideal. The goal of this section is to describe the set of (i, j) positions for which the corresponding ideal is divisible by p (hence whose norm is divisible by p). We recall that, in terms of (a, b) positions, it translates into the condition a − br ≡ 0 (mod p), and therefore the set of such (a, b) positions is a K[t]-lattice for which (p, 0) and (r, 1) is a basis. Using the definition of a and b in terms of i and j, the condition becomes i(a0 − rb0) ≡ −j(a1 − rb1) (mod p). This is a K-linear condition on the polynomials i and j, and therefore the set of solutions is a K-vector space, which is of finite dimension since the degree of i must be less than I and the degree of j less than J. More precisely, the linear system corresponding to the condition has I + J unknowns and deg p equations, so that the dimension of the space of solutions is at least I + J − deg p. We will present a basis for this vector space that takes a different shape, depending on whether deg p is smaller or larger than I.

As a warm-up, we deal with the case where p divides a0 − rb0. The main condition on i and j simplifies to j ≡ 0 (mod p). A basis is then given by the set of vectors µ_k = (t^k, 0) for 0 ≤ k < I, and ν_ℓ = (0, t^ℓ p) for 0 ≤ ℓ < J − deg p. The dimension is then I + J − deg p.

From now on, we will assume that a0 − rb0 is invertible modulo p, and we define

  λ_p = −(a1 − rb1)/(a0 − rb0) mod p,

so that the main condition becomes i ≡ λ_p j (mod p).

1) Case of small primes:

Lemma 2. Let p = (p, r) be a prime ideal of degree L < I. The vector space of (i, j) positions for which the corresponding ideal is divisible by p is of dimension I + J − L and admits the basis given by the vectors µ_k = (t^k p, 0) for 0 ≤ k < I − L, and ν_ℓ = (t^ℓ λ_p mod p, t^ℓ) for 0 ≤ ℓ < J.

[Figure 1. The vectors obtained during the Euclidean algorithm (black dots), completed with some of their multiples (gray dots) to form the full K-basis of vectors of the p-lattice with coordinates of degree at most deg p.]
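As an illustration of Lemma 2, the following Python sketch (κ = 2; F2[t] polynomials as integers, bit i holding the coefficient of t^i) computes λ_p and the echelonized basis (µ_k, ν_ℓ) for a toy prime ideal and a made-up reduced q-lattice basis. None of the values are actual parameters, and the brute-force modular inverse stands in for a real extended-Euclid routine.

```python
# Sketch of the basis from Lemma 2 (small primes, kappa = 2).
# F2[t] polynomials are encoded as ints (bit i <-> coefficient of t^i).
# The prime ideal p = (p, r) and the reduced q-lattice basis
# (a0, b0), (a1, b1) below are made-up toy values, not actual parameters.

def pdeg(u):
    return u.bit_length() - 1

def pmul(u, v):
    r = 0
    while v:
        if v & 1:
            r ^= u
        u <<= 1
        v >>= 1
    return r

def pmod(u, v):
    while pdeg(u) >= pdeg(v):
        u ^= v << (pdeg(u) - pdeg(v))
    return u

def pinv(u, p):
    # inverse of u modulo the irreducible p, by exhaustive search (tiny p)
    for w in range(1, 1 << (pdeg(p) + 1)):
        if pmod(pmul(u, w), p) == 1:
            return w
    raise ZeroDivisionError("u not invertible mod p")

def small_prime_basis(p, r, a0, b0, a1, b1, I, J):
    # lambda_p = -(a1 - r*b1)/(a0 - r*b0) mod p (signs vanish over F2),
    # so that the p-lattice condition reads i = lambda_p * j (mod p)
    lam = pmod(pmul(a1 ^ pmul(r, b1), pinv(a0 ^ pmul(r, b0), p)), p)
    L = pdeg(p)
    mus = [(p << k, 0) for k in range(I - L)]               # (t^k p, 0)
    nus = [(pmod(lam << l, p), 1 << l) for l in range(J)]   # (t^l lam mod p, t^l)
    return lam, mus + nus

# toy data: p = t^2 + t + 1 (0b111), r = t, and a made-up reduced basis
lam, basis = small_prime_basis(0b111, 0b10, 1, 0, 0, 1, 3, 2)
# every basis vector (i, j) satisfies i + lam*j = 0 (mod p)
ok = all(pmod(i ^ pmul(lam, j), 0b111) == 0 for (i, j) in basis)
```

With I = 3, J = 2 and L = 2, the returned basis has I + J − L = 3 vectors with echelonized degrees, as the lemma predicts.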
Proof: The given vectors are indeed in the target set of positions, and they are linearly independent, due to their echelonized degrees. To see that it is a basis, let us consider (i, j) with deg i < I, deg j < J, and i ≡ jλ_p (mod p). Let then i0 be the polynomial of degree less than deg p such that i0 ≡ jλ_p (mod p). Then the vector (i0, j) can be obtained by combining the ν_ℓ's with the coefficients of j. Furthermore, i and i0 differ by a multiple of p of degree less than I, so that (i, j) can be written as a linear combination of the claimed basis which, consequently, is indeed a basis.

2) Case of large primes: We assume now that p is of degree L greater than or equal to I. In fact, we can assume furthermore that its degree is also greater than or equal to J. Indeed, the symmetric role of I and J makes it possible to write a similar basis as before if J is larger than the degree of p. In practice, this would raise memory locality issues when traversing a vector space given by a basis of this shape, but for the moment, we always take I = J in our code.

The K[t]-lattice of the valid (i, j) positions corresponding to p is generated by (p, 0) and (λ_p, 1). To study this lattice, we consider the sequence computed by the Euclidean algorithm starting with these two vectors: let v_ℓ = (σ_ℓ, τ_ℓ), defined by v_0 = (p, 0), v_1 = (λ_p, 1) and, for all ℓ ≥ 1, σ_{ℓ+1} = σ_{ℓ−1} − σ_ℓ q_ℓ and τ_{ℓ+1} = τ_{ℓ−1} − τ_ℓ q_ℓ, where q_ℓ is the quotient in the Euclidean division of σ_{ℓ−1} by σ_ℓ. Since p is prime, λ_p is coprime to p and the final vector of the sequence is (0, p). The finite family of vectors (v_ℓ) is a K-free family of vectors with the degrees of both coordinates less than or equal to L. The following lemma shows how to complete it to form a basis of such vectors.

Lemma 3. Let p = (p, r) be a prime ideal of degree L and (v_ℓ = (σ_ℓ, τ_ℓ)) the corresponding Euclidean sequence. The vector space of (i, j) positions for which the corresponding ideal is divisible by p, with degrees at most L in each coordinate, has dimension L + 2 and admits the basis

  {v_0} ∪ (∪_{ℓ≥1} {t^s v_ℓ | 0 ≤ s < deg σ_{ℓ−1} − deg σ_ℓ}).

Proof: With the convention that the degree of 0 is −1, the claimed basis is such that the degree of the first coordinate takes all values from −1 to L exactly once, and the same for the second coordinate. Indeed, since, by induction, the degrees of the σ_ℓ's are strictly decreasing, and those of the τ_ℓ's are strictly increasing, we have that deg τ_{ℓ+1} − deg τ_ℓ = deg q_ℓ, which is itself, by construction, equal to deg σ_{ℓ−1} − deg σ_ℓ. Therefore, when iterating from v_{ℓ−1} to v_ℓ, a drop δ in the degree of σ_ℓ is directly followed by an identical increase in the degree of τ_ℓ in the next iteration, thus showing that the vectors t^s v_ℓ, for 0 ≤ s < δ, bridging the gaps between v_{ℓ−1}, v_ℓ and v_{ℓ+1}, will all have distinct degrees in both of their coordinates, as illustrated in Figure 1. Consequently, the proposed family of vectors is K-free and contains L + 2 elements.

In order to get the exact dimension of the vector space, we just count its elements. Consider the vector space {(i, j) | i ≡ λ_p j (mod p), with deg i ≤ L, deg j ≤ L}, and split it into κ^{L+1} disjoint subsets S_j, one for each possible value of j with deg j ≤ L. For any j, S_j has the same cardinality as the set of polynomials i of degree at most L such that i ≡ jλ_p (mod p). Since L is precisely the degree of p, this set has the same cardinality κ as the base field K. Adding up the contributions, we obtain a vector space of cardinality κ^{L+2}, thus of dimension L + 2, as claimed.

For our purposes, we are only interested in those elements for which the degrees of their coordinates are (strictly) bounded by I and J, respectively. Due to the echelonized form of the basis, it is enough to keep only those basis elements whose degrees satisfy these constraints in order to generate all the relevant p-lattice elements.

D. Enumerating the elements of a p-lattice

Once the K-basis of a p-lattice is known, the next step is to enumerate all the (i, j) pairs in this lattice and to subtract deg p from the degree of the norm of the corresponding ideals. Obviously, this task can be achieved by considering all the possible K-linear combinations of the basis vectors. However, attention must be paid to a few details.

In order to address these issues for all possible types of p-lattices at once, whether p be a small or a large prime ideal, or even when a0 − rb0 ≡ 0 (mod p), we introduce the following notations, designed to cover all cases without loss of generality: given a prime ideal p, we denote the basis of the corresponding lattice, seen as a K-vector space, as constructed in Section III-C, by the vectors (µ_0, ..., µ_{d0−1}, ν_0, ..., ν_{d1−1}), where the µ_k's and the ν_ℓ's are of the form µ_k = (γ_k, 0) and ν_ℓ = (α_ℓ, β_ℓ), for 0 ≤ k < d0 and 0 ≤ ℓ < d1, respectively, and where the degrees of the γ_k's and of the β_ℓ's are echelonized: 0 ≤ deg γ_0 < deg γ_1 < ··· < deg γ_{d0−1} < I and 0 ≤ deg β_0 < deg β_1 < ··· < deg β_{d1−1} < J. As we impose j to be monic (see Section III-A), we first … any K-free family of monic vectors (ν_ℓ)_{0≤ℓ<d1}
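For κ = 2, the enumeration of all K-linear combinations of such a basis can be organized as a standard binary-reflected Gray code walk, so that each step moves to the next lattice element with a single vector XOR. The sketch below shows this plain (non-monic) variant on a made-up toy basis, whereas the implementation only enumerates monic combinations (Section III-A); in the sieve, each visited position would get deg p subtracted from its norm.

```python
# Sketch: visiting all F2-linear combinations of a p-lattice basis with a
# binary-reflected Gray code, so that each step updates the current (i, j)
# position by a single basis-vector addition (an XOR over F2[t]).
# The basis below is a made-up toy example.

def gray_walk(basis):
    i, j = 0, 0
    out = [(0, 0)]
    for n in range(1, 1 << len(basis)):
        # ruler sequence: index of the lowest set bit of n -- this is the
        # transition sequence (t-adic valuation) of the Gray code
        k = (n & -n).bit_length() - 1
        di, dj = basis[k]
        i ^= di
        j ^= dj
        out.append((i, j))
    return out

# toy basis: mu_0 = (t^2 + t + 1, 0), nu_0 = (t, 1)
positions = gray_walk([(0b111, 0), (0b10, 1)])
```

The walk touches each of the 2^2 combinations exactly once, at the cost of one XOR per visited position.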

(µk)0≤k> 1) ^ t1; Compile-time optimization for a given base field K: Even r[0] = p[0] ^ t2; r[1] = (r[0] & t2) ^ t0; if the library should be able to support different base fields, Enumerating (monic) polynomials in such a way also the chosen field is known at compile time. This knowledge computes the transition sequence ∆ (∆0, respectively) of must be exploited so as to properly inline and optimize all the corresponding (monic) κ-ary Gray code, as it is the the field-specific low-level primitives. sequence of the t-adic valuations of the successive (monic) To meet all these requirements, we have developed our polynomials taken in that order. In the previous example, own library for polynomial arithmetic. Written in C, it the sequence ∆ can be retrieved simply by computing the defines types and related functions for 16-, 32-, and 64- number of trailing zeros of the successive values of t2. coefficient polynomials, along with multiprecision polyno- mials. Most of the provided functions are independent of the D. Addressing the (i, j) array size of K, and only a few field-specific core functions, such When sieving by rows or when applying norm updates as addition or multiplication by an element of K, should be from a bucket, the accesses to the (i, j) array are mostly defined in order to add support for another base field to the random, albeit restricted to a few rows. Converting a given library. For the time being, only F2 and F3 are supported. position in the (i, j) plane to an integer offset in the memory representation of this plane is thus critical to the sieving have more or less similar degrees on both f and g sides. We process. thus take the parameters given in the following table: Over the binary field K = 2, this conversion is trivial, F F21039 F3647 since the bitsliced representation of a polynomial p(t) ∈ Sieving range I = J 15 9 K[t] already corresponds to the integer p˜(2), where p˜(t) ∈ Skewness 3 1 Max. deg. 
of sieved primes (factor base bound) 25 16 Z[t] denotes the corresponding integer polynomial. One can Max. deg. of large primes (smoothness bound) 33 21 then simply address the (i, j) array by offsets of the form Threshold degree for starting cofactorization 99 63 p˜ (2), with p (t) = i(t) + tI j(t). (i,j) (i,j) Timings: We give the running time of our code with these The situation is however not as simple over K = . F3 parameters on one core of an Intel Core i5-2500. The code Given the representation p of a polynomial p(t) ∈ [t] F3 was linked against the gf2x library version 1.1 so that we as two `-bit words, interleaving the bits of p[0] and p[1] can take advantage of the available PCLMULQDQ instruction would correspond to evaluating p˜(t) at 4, which would for polynomial multiplication over . For various degrees then result in only (3/4)` of the 2`-bit integers being F2 of special-q’s, we give the average time to compute one valid representations of polynomials. On the other hand, if relation and the average number of relations per special- evaluating p˜(t) at 3 would provide us with a compact integer q. Since the number of special-q’s of a given degree is representation, it would be far more expensive to compute. close to the number of monic irreducible polynomials of this In the current version of our library, we have settled for degree, this gives enough information to deduce how many an intermediate solution between those two extremes: noting relations can be computed in how much time. The yield that 35 = 243 256 = 28, we can split p(t) into 5- / and the number of relations vary a lot from one special-q coefficient chunks (or pentatrits) and, thanks to a simple to the other, even at the same degree. 
VI. BENCHMARKS AND TIME ESTIMATES

We propose two finite fields K as benchmarks for our implementation: F2^1039 and F3^647. The extension degrees are prime, so that the improvements of [22], [11] based on the Galois action are not available. The sizes of both problems are similar and correspond to the "kilobit" milestone.

We performed a basic polynomial selection, based on an exhaustive search of a polynomial with many small roots.

Parameters: For our choices of polynomials, when taking special-q's on the g side (the so-called rational side), norms have more or less similar degrees on both the f and g sides. We thus take the parameters given in the following table:

                                                    F2^1039   F3^647
  Sieving range I = J                                    15        9
  Skewness                                                3        1
  Max. deg. of sieved primes (factor base bound)         25       16
  Max. deg. of large primes (smoothness bound)           33       21
  Threshold degree for starting cofactorization          99       63

Timings: We give the running time of our code with these parameters on one core of an Intel Core i5-2500. The code was linked against the gf2x library version 1.1, so that we can take advantage of the available PCLMULQDQ instruction for polynomial multiplication over F2. For various degrees of special-q's, we give the average time to compute one relation and the average number of relations per special-q. Since the number of special-q's of a given degree is close to the number of monic irreducible polynomials of this degree, this gives enough information to deduce how many relations can be computed in how much time. The yield and the number of relations vary a lot from one special-q to the other, even at the same degree. The average values given in the following table have been obtained by letting the program run with special-q's of the same degree until we reach a point where the measured yield is statistically within a ±3 % interval with a confidence level of 95.4 %.

  F2^1039   deg q                 26    27    28    29    30    31    32
            Yield (s/rel)       0.59  0.71  0.88  1.07  1.31  1.70  2.05
            Rels per special-q  32.6  26.2  20.6  16.5  13.3  10.2   8.3

  F3^647    deg q                 17    18    19    20
            Yield (s/rel)       2.24  2.88  3.52  4.71
            Rels per special-q  23.5  15.7  11.8   8.0

For F2^1039, since the smoothness bound is set to degree 33, we can give a rough estimate of 2 × 2^34/33 ≈ 1.04·10^9 for the number of ideals in both factor bases. Collecting relations up to special-q's of degree 30 will provide about 1.19·10^9 relations in about 28 600 days. Taking into account the fact that not all ideals are involved but that there are duplicate relations, this should be just enough to get a full set of relations.
The polynomial selection step would benefit from much further research, but this is not the topic of the present paper. We thus give without further details the polynomials used for this benchmark, where hexadecimal numbers encode polynomials in F2[t] (resp. F3[t]) represented by their evaluation at 2 (resp. 4):

  F2^1039   f = x^6 + 0x7 x^5 + 0x6 x + 0x152a
            g = x + t^174 + 0x1ef9a3

  F3^647    f = x^6 + 0x2 x^2 + 0x11211
            g = x + t^109 + 0x1681446166521980

It can be checked that the resultants of f and g are polynomials in t that have irreducible factors of degree 1039 and 647, respectively, and are therefore suitable for discrete-logarithm computations in the target fields.

Collecting relations for all special-q's up to degree 31 will provide about 1.90·10^9 relations in about 51 000 days, which will give a lot of excess and plenty of room to decrease the pressure on the linear algebra. The tuning of this amount of oversieving cannot be done without a linear algebra implementation and is, therefore, out of the scope of this paper. Nevertheless, we expect that in practice the relation collection will be done for a large proportion of the special-q's of degree 31. For F3^647, similar estimates based on the large-prime bound give a theoretical number of ideals of 1.53·10^9. Collecting relations up to special-q's of degree 19 will provide about 1.24·10^9 relations in about 45 300 days, while going to degree 20 will yield about 2.63·10^9 relations in about 121 000 days.

To convert these large numbers into more practical notions, let us assume that we have a cluster of 1000 cores similar to the one we used. Then a full set of relations with enough redundancy can be computed in a bit more than a month for F2^1039, and in about 4 months for F3^647.

Runtime breakdown: For F2^1039 (resp. F3^647), we give details on where the time is spent for special-q's of degree 30 (resp. degree 19), in percentage and in average number of cycles per (a, b) candidate.

                        F2^1039, deg q = 30      F3^647, deg q = 19
  Step                  Cycles/pos  Percentage   Cycles/pos  Percentage
  Initialize norms            1.10      2.04 %        43.65      6.17 %
  Sieve by rows               9.73     18.15 %       180.11     25.45 %
  Fill buckets               31.73     59.21 %       211.23     29.84 %
  Apply buckets               2.74      5.12 %         4.53      0.64 %
  Cofactorization             7.43     13.87 %       266.53     37.66 %
  Total                      53.59    100.00 %       707.77    100.00 %

From this table, it is clear that we have not yet spent as much time on optimizing the case K = F3 as we have for F2, and we expect that there is decent room for improvement in characteristic 3. On the other hand, the arithmetic cost is intrinsically higher over F3 than over F2: the addition of two polynomials requires at least 6 logical instructions [23] instead of just one, and the multiplication in characteristic 2 is implemented in hardware, whereas in characteristic 3 it is a software implementation based on the addition. This difference is due to our choice to implement on general-purpose CPUs, but would decrease or even disappear if we allowed ourselves to resort to specific hardware.
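To illustrate the instruction-count gap, here is a classical bitsliced GF(3) addition; the bit-pair encoding and the names are ours for illustration, and this variant takes 7 logical instructions per word pair, whereas [23] reaches the 6-instruction minimum with a different encoding found by exhaustive search:

```c
#include <stdint.h>

/* Bitsliced addition of 64 GF(3) coefficients at once.  Each
 * coefficient is a bit pair (hi, lo): 0 = (0,0), 1 = (0,1), 2 = (1,0),
 * and (1,1) is unused.  Cost: 5 ORs and 3 XORs minus one shared
 * temporary, i.e., 7 logical instructions per word pair. */
static void gf3_add(uint64_t alo, uint64_t ahi,
                    uint64_t blo, uint64_t bhi,
                    uint64_t *rlo, uint64_t *rhi)
{
    uint64_t t = (alo | bhi) ^ (ahi | blo);
    *rhi = (alo | blo) ^ t;
    *rlo = (ahi | bhi) ^ t;
}
```

For example, a slot holding 1 added to a slot holding 2 yields 0, and 2 + 2 yields 1, as expected in GF(3). Over F2 the same word-parallel addition is a single XOR, which is the gap discussed above.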
VII. CONCLUSION

We have presented in this paper a new implementation of the relation collection step for the function field sieve algorithm, which combines state-of-the-art algorithmic, arithmetic and architectural techniques from previous works on both FFS and NFS (where applicable), along with original contributions on several key steps of the algorithm. The source code of our implementation is released under a public license. To the best of our knowledge, this is the first public FFS code able to handle record sizes. We hope it will serve as a reference point for future developments on this subject, as well as for dimensioning key sizes for DLP-based cryptosystems.

Of course, this project is still under heavy development, especially for base fields K larger than F2. Among other perspectives, we plan to investigate the relevance of delegating some parts of the computation to dedicated hardware accelerators such as GPUs or FPGAs. Finally, we plan to integrate our code for the relation collection step into a full implementation of FFS, in order to complete the ongoing computations of discrete logarithms over F2^1039 and F3^647, thus setting kilobit-sized records as new landmarks.

Acknowledgements. The authors wish to thank the other members of the CARAMEL group for numerous interesting discussions regarding preliminary versions of this work, in particular Razvan Barbulescu, Cyril Bouvier, Emmanuel Thomé and Paul Zimmermann.

REFERENCES

[1] L. M. Adleman, "The function field sieve," in ANTS-I, LNCS 877, pp. 108–121, 1994.
[2] L. M. Adleman and M.-D. A. Huang, "Function field sieve method for discrete logarithms over finite fields," Inf. Comput., 151(1–2):5–16, 1999.
[3] R. Matsumoto, "Using Cab curves in the function field sieve," IEICE Trans. Fund., E82-A(3):551–552, 1999.
[4] R. Granger, "Estimates for discrete logarithm computations in finite fields of small characteristic," in Cryptography and Coding, LNCS 2898, pp. 190–206, 2003.
[5] R. Granger, A. J. Holt, D. Page, N. P. Smart, and F. Vercauteren, "Function field sieve in characteristic three," in ANTS-VI, LNCS 3076, pp. 223–234, 2004.
[6] A. Joux and R. Lercier, "Discrete logarithms in GF(2^607) and GF(2^613)," posting to the list, 2005.
[7] A. Joux and R. Lercier, "The function field sieve is quite special," in ANTS-V, LNCS 2369, pp. 343–356, 2002.
[8] E. Thomé, "Computation of discrete logarithms in F2^607," in ASIACRYPT 2001, LNCS 2248, pp. 107–124, 2001.
[9] D. Coppersmith, "Fast evaluation of logarithms in fields of characteristic two," IEEE Trans. Inf. Theory, 30(4):587–594, 1984.
[10] T. Hayashi, N. Shinohara, L. Wang, S. Matsuo, M. Shirase, and T. Takagi, "Solving a 676-bit discrete logarithm problem in GF(3^6n)," in PKC 2010, LNCS 6056, pp. 351–367, 2010.
[11] T. Hayashi, T. Shimoyama, N. Shinohara, and T. Takagi, "Breaking pairing-based cryptosystems using ηT pairing over GF(3^97)," in ASIACRYPT 2012, LNCS 7658, pp. 43–60, 2012.
[12] K. Aoki, J. Franke, T. Kleinjung, A. Lenstra, and D. A. Osvik, "A kilobit special number field sieve factorization," in ASIACRYPT 2007, LNCS 4833, pp. 1–12, 2007.
[13] S. Bai, A. Filbois, P. Gaudry, A. Kruppa, F. Morain, E. Thomé, and P. Zimmermann, "CADO-NFS: Crible algébrique: Distribution, optimisation – Number field sieve," http://cado-nfs.gforge.inria.fr/.
[14] O. Schirokauer, "Discrete logarithms and local units," Phil. Trans. R. Soc. A, 345(1676):409–423, 1993.
[15] D. M. Gordon and K. S. McCurley, "Massively parallel computation of discrete logarithms," in CRYPTO '92, LNCS 740, pp. 312–323, 1993.
[16] L. Hars, "Applications of fast truncated multiplication in cryptography," EURASIP J. Embed. Syst., 2007:061721, 2007.
[17] J. M. Pollard, "The lattice sieve," in A. K. Lenstra and H. W. Lenstra, Jr., Eds., The Development of the Number Field Sieve, LNM 1554, pp. 43–49, 1993.
[18] K. Aoki and H. Ueda, "Sieving using bucket sort," in ASIACRYPT 2004, LNCS 3329, pp. 92–102, 2004.
[19] V. Shoup, "NTL: A library for doing number theory," http://www.shoup.net/ntl/.
[20] F. Chabaud and R. Lercier, "ZEN: A toolbox for fast computation in finite extensions over finite rings," http://zenfact.sourceforge.net/.
[21] R. P. Brent, P. Gaudry, E. Thomé, and P. Zimmermann, "gf2x: A library for multiplying polynomials over the binary field," http://gf2x.gforge.inria.fr/.
[22] A. Joux and R. Lercier, "The function field sieve in the medium prime case," in EUROCRYPT 2006, LNCS 4004, pp. 254–270, 2006.
[23] Y. Kawahara, K. Aoki, and T. Takagi, "Faster implementation of ηT pairing over GF(3^m) using minimum number of logical instructions for GF(3)-addition," in Pairing 2008, LNCS 5209, pp. 282–296, 2008.