Number-theoretic

Richard P. Brent ANU

Copyright © 2011, R. P. Brent. comp4600, 2011

1. Polynomials and integers

Reference: CLRS, Chapter 30. We first consider algorithms for integer and polynomial arithmetic, particularly multiplication. Let us formally define what we mean by a “polynomial”.

Polynomials over a ring

Let $\mathcal{R}$ be a ring. With a symbol $x \notin \mathcal{R}$ we form the expressions

$$P(x) = \sum_{\nu} p_\nu x^\nu,$$

in which the sum is taken over a finite number of different integers $\nu \ge 0$, and where the “coefficients” $p_\nu$ belong to the ring $\mathcal{R}$. Such expressions are called “polynomials” or more precisely “polynomials in $x$ over $\mathcal{R}$”; the symbol $x$ is called an indeterminate. We regard two polynomials as equal if they differ only by zero coefficients. For example, $7x + 0x^2 = 0 + 7x = 7x$.

The ring $\mathcal{R}[x]$

The set of polynomials in an indeterminate $x$ over a ring $\mathcal{R}$ is written as $\mathcal{R}[x]$. Addition and multiplication in $\mathcal{R}[x]$ are defined in the natural way. With this definition, $\mathcal{R}[x]$ forms a ring.

In applications of interest to us, $\mathcal{R}$ is often a field, e.g. the field $\mathbb{Q}$ of rationals, the field $\mathbb{R}$ of real numbers, the field $\mathbb{C}$ of complex numbers, or a finite (Galois) field $\mathrm{GF}(p) = \mathbb{Z}/p\mathbb{Z}$, where $p$ is a prime. We are sometimes interested in polynomials over rings which are not fields, e.g. the ring $\mathbb{Z}$ of integers, or the ring $\mathbb{Z}/m\mathbb{Z}$ of integers modulo $m$, where $m$ is a composite number. In most cases the rings are commutative (an exception is the ring of $n \times n$ matrices over a field, $n > 1$).

The degree of $P(x) \in \mathcal{R}[x]$ is
$$\deg(P) = \max\left(\{\nu \mid p_\nu \ne 0\} \cup \{-\infty\}\right).$$
Note: $-\infty$ is included in case $P = 0$.

Examples

$P(x) = 5 + 2x^3 - 9x^7$ is a polynomial over the ring of integers, and $\deg(P) = 7$.
$P(x) = 0$ is a polynomial over whatever ring you choose, and $\deg(P) = -\infty$.
$P(x) = \pi x^{99}$ is a polynomial over $\mathbb{R}$.
$P(y) = 1 + \frac{5}{7}y^2$ is a polynomial over $\mathbb{Q}$.
$P(z) = (1 + \sqrt{2}) - (7 - 5\sqrt{2})z$ is a polynomial over $\mathbb{Q}(\sqrt{2})$, a finite extension of $\mathbb{Q}$.

If $P, Q \in \mathcal{R}[x]$ then
$$\deg(P + Q) \le \max(\deg(P), \deg(Q)),$$
and, provided $\mathcal{R}$ has no zero divisors (e.g. $\mathcal{R}$ is a field),
$$\deg(PQ) = \deg(P) + \deg(Q).$$
This relation motivates our definition $\deg(0) = -\infty$ (as in Knuth but not in CLRS).

Polynomials in several variables

If $x, y$ are indeterminates we can consider polynomials in $y$ whose coefficients are polynomials in $x$, e.g.
$$P(x)(y) = P(x, y) = (5 + 3x) + (7 + 2x^2)y^3$$
is a polynomial in $y$ whose coefficients are polynomials in $x$. In this case $P \in \mathcal{R}[x][y]$. We usually write $P(x)(y)$ as
$$P(x, y) = \sum_{\lambda,\nu} p_{\lambda,\nu} x^\lambda y^\nu.$$
There are several possible definitions of the degree of a multivariate polynomial. For example, we could define
$$\deg(P(x, y)) = \max\left(\{\lambda + \nu \mid p_{\lambda,\nu} \ne 0\} \cup \{-\infty\}\right).$$

Interpretation as functions

We often interpret a polynomial $P(x) \in \mathcal{R}[x]$ as a function $f: \mathcal{R} \to \mathcal{R}$. For example, the Chebyshev polynomials $T_n(x)$ are polynomials of degree $n$ over $\mathbb{R}$ or $\mathbb{C}$, defined by $T_0(x) = 1$, $T_1(x) = x$, and
$$T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x) \quad \text{for } n \ge 1.$$
We can also regard $T_n(x)$ as a function which satisfies the equation
$$T_n(\cos\theta) = \cos n\theta \quad \text{for } \theta \in \mathbb{C}.$$
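As a small illustration (a Python sketch; these notes do not prescribe a language), the recurrence can be run on coefficient lists stored lowest degree first. The helper names `chebyshev` and `horner` are our own. The last lines spot-check $T_n(\cos\theta) = \cos n\theta$ numerically for a few real values of $\theta$.

```python
import math

def chebyshev(n):
    """Return [T_0, ..., T_n] as integer coefficient lists, lowest degree first."""
    polys = [[1], [0, 1]]                      # T_0 = 1, T_1 = x
    for k in range(1, n):
        nxt = [0] + [2 * c for c in polys[k]]  # 2x T_k
        for i, c in enumerate(polys[k - 1]):   # ... minus T_{k-1}
            nxt[i] -= c
        polys.append(nxt)
    return polys[: n + 1]

def horner(coeffs, x):
    """Evaluate a polynomial given by its coefficient list at x."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

T = chebyshev(5)
print(T[2], T[3])   # [-1, 0, 2] [0, -3, 0, 4], i.e. 2x^2 - 1 and 4x^3 - 3x
for theta in (0.3, 1.1, 2.5):
    assert abs(horner(T[5], math.cos(theta)) - math.cos(5 * theta)) < 1e-12
```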

Formal power series

If we have an infinite sequence $(a_0, a_1, \dots)$ and an indeterminate $x$ then we can define a formal power series $A(x)$ by
$$A(x) = \sum_{\nu \ge 0} a_\nu x^\nu.$$
The coefficients $a_\nu$ are assumed to lie in a ring $\mathcal{R}$, which may in particular cases be a field such as $\mathbb{R}$ or $\mathbb{C}$. We can define addition and multiplication of power series in the obvious way: if $C(x) = A(x) + B(x)$ then
$$c_\nu = a_\nu + b_\nu,$$
and if $C(x) = A(x)B(x)$ then
$$c_\nu = \sum_{0 \le \lambda \le \nu} a_\lambda b_{\nu - \lambda}.$$
With these definitions the formal power series over $\mathcal{R}$ form a ring.

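Definition of ord

Analogous to the degree of a polynomial, it is useful to define
$$\operatorname{ord}(A) = \min\left(\{\nu \mid a_\nu \ne 0\} \cup \{+\infty\}\right),$$
where $+\infty$ is included in case $A = 0$.

Exercise: Show that
$$\operatorname{ord}(A + B) \ge \min(\operatorname{ord}(A), \operatorname{ord}(B))$$
and, if $\mathcal{R}$ has no zero divisors,
$$\operatorname{ord}(AB) = \operatorname{ord}(A) + \operatorname{ord}(B).$$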

Exercise: If we consider power series over a field $F$, then $A(x)$ has a multiplicative inverse (i.e. $B(x)$ such that $A(x)B(x) = 1$) iff $\operatorname{ord}(A) = 0$.

Remark: If in the definition we allow a finite number of nonzero coefficients $a_\nu$ with $\nu < 0$, then we get Laurent series. The Laurent series over a field $F$ form a field.

Convergence

We have defined power series quite formally, so no question of convergence or divergence arises. For example, the power series
$$A(x) = \sum_{\nu \ge 0} 2^{2^\nu} x^\nu$$
is a perfectly well-defined formal power series. We can think of it as a generating function which “generates” the coefficients $2^{2^\nu}$. However, if we want to regard a power series over a field $F$ as a function then questions of convergence arise (and this does not always make sense, e.g. if $F$ is a finite field). In this course power series will only be used as generating functions, so we can ignore questions of convergence.

Truncated power series

If $A(x)$ and $B(x)$ are two power series over the same ring $\mathcal{R}$, we write
$$A(x) = B(x) \bmod x^n \iff \operatorname{ord}(A(x) - B(x)) \ge n.$$
In other words, iff $a_\nu = b_\nu$ for $0 \le \nu < n$.

If $P(x)$ is a polynomial then we can regard $P(x)$ as a power series with only a finite number of nonzero terms. If $A(x)$ is a power series and $n \ge 0$ then clearly there is a unique polynomial $P(x)$ such that $\deg(P) < n$ and $A(x) = P(x) \bmod x^n$. Proof: define
$$p_\nu = \begin{cases} a_\nu & \text{if } 0 \le \nu < n, \\ 0 & \text{otherwise.} \end{cases}$$
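Combining the product formula $c_\nu = \sum_{0 \le \lambda \le \nu} a_\lambda b_{\nu-\lambda}$ with truncation gives the basic operation "multiply mod $x^n$". The following is a minimal Python sketch, assuming series are stored as coefficient lists $[a_0, a_1, \dots]$ with absent coefficients taken as 0; the name `mul_mod` is hypothetical. It uses the obvious $O(n^2)$ method.

```python
def mul_mod(a, b, n):
    """Return the first n coefficients of A(x)B(x), i.e. A(x)B(x) mod x^n.

    a, b: coefficient lists [a_0, a_1, ...]; missing coefficients are 0.
    Computes c_nu = sum_{0 <= lam <= nu} a_lam * b_{nu - lam} for nu < n only.
    """
    c = [0] * n
    for i, ai in enumerate(a[:n]):
        for j, bj in enumerate(b[: n - i]):   # only terms with i + j < n
            c[i + j] += ai * bj
    return c

# 1/(1 - x) = 1 + x + x^2 + ..., so (1 - x)(1 + x + x^2 + x^3) = 1 mod x^4
print(mul_mod([1, -1], [1, 1, 1, 1], 4))   # [1, 0, 0, 0]
```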

Representation of polynomials

A polynomial $P(x)$ of degree $n$ can be represented as an array $A[0..n]$, provided the base type of the array can represent the coefficients of $P(x)$. In other words, we require $A[\nu]$ to represent $p_\nu$. Similarly, a multivariate polynomial can be represented as a multidimensional array.

Sparse polynomials

A polynomial $A(x)$ is said to be sparse if “most” of the coefficients $a_0, \dots, a_{\deg(A)}$ are zero (and similarly for multivariate polynomials). We shall not attempt to define what “most” means, but it typically means “at least 90%”. In order to save storage (and arithmetic) it may be desirable to store sparse polynomials as linked lists, so that only the nonzero coefficients need to be stored.
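A sketch of the same idea in Python, substituting a dictionary {exponent: nonzero coefficient} for the linked list mentioned above (an assumption made for brevity; a sorted linked list would serve equally well). The name `sparse_mul` is hypothetical. The cost depends on the number of nonzero terms, not on the degree.

```python
from collections import defaultdict

def sparse_mul(p, q):
    """Multiply sparse polynomials stored as {exponent: nonzero coefficient}."""
    result = defaultdict(int)
    for i, a in p.items():
        for j, b in q.items():
            result[i + j] += a * b
    # Drop terms that cancelled, so that only nonzero coefficients are stored.
    return {e: c for e, c in result.items() if c != 0}

# (1 + x^1000)(1 - x^1000) = 1 - x^2000 using 4 coefficient products.
print(sparse_mul({0: 1, 1000: 1}, {0: 1, 1000: -1}))   # {0: 1, 2000: -1}
```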

Multiple-precision integers

A (large) nonnegative integer $N < \beta^t$ can be represented as
$$N = \sum_{\nu=0}^{t-1} a_\nu \beta^\nu,$$
where $\beta > 1$ is the base or radix, the $a_\nu$ are “base $\beta$ digits” and (usually) satisfy $0 \le a_\nu < \beta$, and $t$ is the number of digits. (For signed integers we can use a “sign and magnitude” representation, i.e. $N = s|N|$, where $s = \pm 1$.)

Clearly there is a close correspondence between the integer $N$ represented as above and the polynomial
$$P(x) = \sum_{\nu=0}^{t-1} a_\nu x^\nu.$$
Note that $N = P(\beta)$. Because of this correspondence, many algorithms for operating on large integers are closely related to algorithms for operating on polynomials.
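The correspondence $N = P(\beta)$ can be checked directly. In this Python sketch (helper names `to_digits` and `horner` are ours), the digits are extracted least significant first and the digit polynomial is evaluated at $x = \beta$ with Horner’s rule.

```python
def to_digits(N, beta):
    """Base-beta digits a_0, ..., a_{t-1} of N >= 0, least significant first."""
    digits = []
    while N > 0:
        N, r = divmod(N, beta)
        digits.append(r)
    return digits or [0]

def horner(coeffs, x):
    """Evaluate P(x) = sum a_nu x^nu by Horner's rule (exact for integers)."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result

N = 123456789
d = to_digits(N, 1000)
print(d)                      # [789, 456, 123]
assert horner(d, 1000) == N   # N = P(beta)
```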

Other operations on polynomials and power series

There are some operations on polynomials which have no analogue for integers, e.g. differentiation, composition, reversion. The formal derivative $P'(x)$ of a polynomial or power series $P(x) = \sum_\nu p_\nu x^\nu$ is defined by
$$P'(x) = \sum_{\nu > 0} \nu p_\nu x^{\nu - 1}.$$
For a polynomial $P(x)$ over a field of characteristic zero (e.g. $\mathbb{Q}$, $\mathbb{R}$ or $\mathbb{C}$), we can define a formal integral by
$$\int_0^x P(t)\,dt = \sum_{\nu \ge 0} p_\nu \frac{x^{\nu+1}}{\nu + 1}.$$
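Both definitions are purely formal operations on coefficient lists. The Python sketch below uses exact rationals (`fractions.Fraction`) for the integral, because the division by $\nu + 1$ needs characteristic zero; the function names are hypothetical.

```python
from fractions import Fraction

def derivative(p):
    """Formal derivative: the coefficient of x^(nu-1) is nu * p_nu."""
    return [nu * p[nu] for nu in range(1, len(p))]

def integral(p):
    """Formal integral from 0 to x: coefficient p_nu / (nu + 1) moves to x^(nu+1)."""
    return [Fraction(0)] + [Fraction(c) / (nu + 1) for nu, c in enumerate(p)]

P = [5, 0, 0, 2]                  # 5 + 2x^3
print(derivative(P))              # [0, 0, 6], i.e. 6x^2
assert derivative(integral(P)) == P
```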

Composition and reversion

The composition of two power series $P(x)$ and $Q(x)$, where $\operatorname{ord}(Q(x)) > 0$, is defined to be the power series $C(x)$, where
$$C(x) = P(Q(x)) = \sum_{\nu \ge 0} p_\nu Q(x)^\nu.$$
Note that $\operatorname{ord}(Q(x)^n) \ge n$, so each coefficient $c_n$ of $C(x)$ is defined by a finite sum involving $p_0, \dots, p_n$ and $q_1, \dots, q_n$; thus no questions of convergence arise.

If $P(x)$ and $Q(x)$ are power series, $\operatorname{ord}(P(x)) = 1$, $\operatorname{ord}(Q(x)) = 1$, and
$$P(Q(x)) = Q(P(x)) = x,$$
then we say that $Q(x)$ is the reversion of $P(x)$, and we write $Q(x) = P(x)^{(-1)}$. For example, if $P(x) = x/(1-x) = x + x^2 + x^3 + \cdots$, and $Q(x) = x/(1+x) = x - x^2 + x^3 - \cdots$, then it is easy to verify that $Q(x) = P(x)^{(-1)}$.
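The reversion example can be checked mechanically by composing truncated series. This Python sketch (hypothetical names `mul_mod` and `compose_mod`) evaluates $P(Q(x)) \bmod x^n$ by Horner’s rule in $Q$. Because $\operatorname{ord}(Q^\nu) \ge \nu$, only $p_0, \dots, p_{n-1}$ matter. It uses $O(n^3)$ operations and is meant as a check, not as an efficient composition algorithm.

```python
def mul_mod(a, b, n):
    """A(x)B(x) mod x^n for coefficient lists (lowest degree first)."""
    c = [0] * n
    for i, ai in enumerate(a[:n]):
        for j, bj in enumerate(b[: n - i]):
            c[i + j] += ai * bj
    return c

def compose_mod(p, q, n):
    """P(Q(x)) mod x^n for n >= 1, assuming q[0] == 0, i.e. ord(Q) > 0.

    Horner's rule: P(Q) = p_0 + Q (p_1 + Q (p_2 + ...)).
    """
    assert n >= 1 and (not q or q[0] == 0)
    result = [0] * n
    for coeff in reversed(p[:n]):
        result = mul_mod(result, q, n)
        result[0] += coeff
    return result

n = 8
P = [0] + [1] * (n - 1)                        # x/(1-x) = x + x^2 + ...
Q = [0] + [(-1) ** k for k in range(n - 1)]    # x/(1+x) = x - x^2 + ...
print(compose_mod(P, Q, n))   # [0, 1, 0, 0, 0, 0, 0, 0], i.e. x mod x^8
```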

Arithmetic on polynomials

Suppose we are given two polynomials $A(x)$ and $B(x)$ over $\mathcal{R}$, of degree (at most) $n$, and want to compute the product $C(x) = A(x)B(x)$. From the definition,
$$c_k = \sum_{i+j=k} a_i b_j$$
for $0 \le k \le 2n$, where we assume $0 \le i \le n$, $0 \le j \le n$ (sometimes it is convenient to define $a_i = 0$ if $i > n$, etc.).

The number of terms in the sum for $c_k$ is $k + 1$ if $0 \le k \le n$, and $2n + 1 - k$ if $n < k \le 2n$. Thus the total number of multiplications is $(n+1)^2$, and the obvious algorithm takes time $O(n^2)$.
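The obvious method is a direct transcription of the formula for $c_k$. Here is a Python sketch (the name `schoolbook_mul` is ours) that does one multiplication per pair $(i, j)$, hence $(n+1)^2$ multiplications for two polynomials of degree $n$.

```python
def schoolbook_mul(a, b):
    """Full product of polynomials a, b (coefficient lists, lowest degree first).

    Computes c_k = sum_{i+j=k} a_i b_j using len(a) * len(b) multiplications.
    """
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

print(schoolbook_mul([1, 2], [3, 4]))   # [3, 10, 8], i.e. (1+2x)(3+4x) = 3+10x+8x^2
```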

Karatsuba’s algorithm

It turns out that the obvious $O(n^2)$ result is not the best possible. Karatsuba discovered the following (not quite, but his idea was similar). Suppose we can multiply polynomials of degree $n-1$ in time $M(n)$, and we are given two polynomials $A(x)$ and $B(x)$ of degree $2n-1$. We write
$$A(x) = A_0(x) + A_1(x)x^n,$$
$$B(x) = B_0(x) + B_1(x)x^n,$$
where $\deg(A_j(x)) < n$, $\deg(B_j(x)) < n$ for $j = 0, 1$. Now
$$A(x)B(x) = A_0(x)B_0(x) + \left(A_0(x)B_1(x) + A_1(x)B_0(x)\right)x^n + A_1(x)B_1(x)x^{2n}.$$

Karatsuba continued

Suppose we compute
$$P_1(x) = A_0(x)B_0(x),$$
$$P_2(x) = A_1(x)B_1(x), \text{ and}$$
$$P_3(x) = (A_0(x) + A_1(x))(B_0(x) + B_1(x)).$$
Then we easily see that
$$A_0(x)B_1(x) + A_1(x)B_0(x) = P_3(x) - P_1(x) - P_2(x),$$
so
$$A(x)B(x) = P_1(x) + \left(P_3(x) - P_1(x) - P_2(x)\right)x^n + P_2(x)x^{2n}.$$
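A recursive Python sketch of the three-product scheme above, assuming both inputs have the same length. Odd lengths are handled by zero-padding the high halves, and the schoolbook cutoff of 4 coefficients is an arbitrary choice of ours. The function name `karatsuba` is hypothetical.

```python
def karatsuba(a, b):
    """Product of two equal-length coefficient lists (lowest degree first)."""
    size = len(a)
    assert len(b) == size
    if size <= 4:                              # base case: schoolbook
        c = [0] * max(2 * size - 1, 0)
        for i, x in enumerate(a):
            for j, y in enumerate(b):
                c[i + j] += x * y
        return c
    n = (size + 1) // 2
    pad = [0] * (2 * n - size)                 # make A1, B1 length n as well
    a0, a1 = a[:n], a[n:] + pad                # A = A0 + A1 x^n
    b0, b1 = b[:n], b[n:] + pad                # B = B0 + B1 x^n
    p1 = karatsuba(a0, b0)                     # P1 = A0 B0
    p2 = karatsuba(a1, b1)                     # P2 = A1 B1
    p3 = karatsuba([x + y for x, y in zip(a0, a1)],
                   [x + y for x, y in zip(b0, b1)])   # P3 = (A0+A1)(B0+B1)
    c = [0] * (4 * n - 1)
    for i in range(2 * n - 1):
        c[i] += p1[i]                          # P1
        c[i + n] += p3[i] - p1[i] - p2[i]      # (P3 - P1 - P2) x^n
        c[i + 2 * n] += p2[i]                  # P2 x^{2n}
    return c[: 2 * size - 1]                   # entries beyond are zero

print(karatsuba([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))
```

Only three recursive calls are made per level, which is the source of the bound $M(2n) \le 3M(n) + O(n)$; the additions and shifts in the final loop are the $O(n)$ term.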

Complexity analysis

Using Karatsuba’s “divide and conquer” idea, we have reduced the problem to three multiplications of polynomials of degree $n-1$, plus some additions/subtractions (which take time $O(n)$) and multiplications by powers of $x$ (which do not require any arithmetic, just shifting array elements). Thus
$$M(2n) \le 3M(n) + O(n).$$
We can easily deduce that
$$M(2^k) = O(3^k).$$
When multiplying polynomials we can always “round up” the degree to the next integer of the form $2^k - 1$. Thus
$$M(n) = O(n^\alpha), \quad \text{where } \alpha = \log_2 3 < 1.6.$$

Generalisation

Karatsuba’s idea can be generalised to give
$$M((r+1)n) \le (2r+1)M(n) + O(n)$$
for any fixed integer $r \ge 1$ (see Knuth §4.3.3). Thus $M(n) = O(n^\alpha)$, where
$$\alpha = \log_{r+1}(2r+1) = \frac{\log(2r+1)}{\log(r+1)}.$$
By choosing $r$ sufficiently large, we can make $\alpha$ arbitrarily close to 1. Thus
$$M(n) = O(n^{1+\varepsilon})$$
for any $\varepsilon > 0$. (The constant hidden in the “O” notation depends on $\varepsilon$.) We omit the details because methods based on the fast Fourier transform (FFT) are better. If the FFT is applicable we can obtain a sharper bound
$$M(n) = O(n \log n).$$

Fast integer multiplication

Karatsuba’s idea also applies to the problem of multiplying integers represented in base $\beta$ with $n$ digits (in fact it was originally presented for this problem). After applying the divide and conquer step we may have to normalise the digits (i.e. reduce them to the range $[0, \beta - 1]$), which involves “carry propagation”. Algebraically, we use
$$a_{\nu+1}\beta^{\nu+1} + a_\nu\beta^\nu = (a_{\nu+1} + 1)\beta^{\nu+1} + (a_\nu - \beta)\beta^\nu.$$
As for the multiplication of power series, we can use a generalisation of Karatsuba’s algorithm to show that $n$-digit numbers can be multiplied in time $O(n^{1+\varepsilon})$ for any $\varepsilon > 0$.
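A Python sketch of the two phases for integers: multiply the digit vectors as polynomials, which yields “digits” that may exceed $\beta - 1$, and then normalise them by carry propagation. To keep it short, the digit product uses the schoolbook convolution. Karatsuba’s splitting could be substituted without changing the normalisation step. All names are hypothetical.

```python
def to_digits(N, beta):
    """Base-beta digits of N >= 0, least significant first."""
    digits = []
    while N > 0:
        N, r = divmod(N, beta)
        digits.append(r)
    return digits or [0]

def normalise(c, beta):
    """Carry propagation: rewrite nonnegative 'digits' c into the range [0, beta-1].

    Each step moves q = (c_nu + carry) div beta units from position nu to
    position nu + 1, i.e. the identity above applied q times at once.
    """
    digits, carry = [], 0
    for a in c:
        carry, d = divmod(a + carry, beta)
        digits.append(d)
    while carry > 0:
        carry, d = divmod(carry, beta)
        digits.append(d)
    while len(digits) > 1 and digits[-1] == 0:   # drop leading zero digits
        digits.pop()
    return digits

def multiply_digits(x, y, beta=10):
    """Digits of x*y: polynomial product of the digit vectors, then carries."""
    a, b = to_digits(x, beta), to_digits(y, beta)
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj          # unnormalised digits
    return normalise(c, beta)

print(multiply_digits(12, 34))   # [8, 0, 4], i.e. 408
```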

2. Polynomial and integer division

We now consider algorithms for finding reciprocals, performing division, and related operations for polynomials, power series and large integers. It turns out that Newton’s method is helpful for reducing these operations to multiplication. We shall show that in a certain sense the operations of multiplication, squaring, and forming reciprocals all have the same complexity (similar to what we already saw for matrix operations).

Newton’s method

Newton’s method is a well-known method for approximating zeros of a function by successive linear approximation; the iteration is
$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)},$$
and under suitable conditions this converges to a zero $\zeta$ of $f$. Newton’s method can be used to approximate the square root or reciprocal of a real number. For example, to approximate $\sqrt{a}$ for $a > 0$ we take $f(x) = x^2 - a$, so $f'(x) = 2x$ and the Newton iteration is
$$x_{k+1} = x_k - \frac{x_k^2 - a}{2x_k},$$
which can be written as
$$x_{k+1} = \frac{1}{2}\left(x_k + \frac{a}{x_k}\right).$$

Newton’s method for reciprocal

Similarly, to approximate the reciprocal of a real number $a \ne 0$ we take
$$f(x) = a - \frac{1}{x},$$
so $f'(x) = 1/x^2$ and the Newton iteration is
$$x_{k+1} = x_k - x_k^2\left(a - \frac{1}{x_k}\right)$$
or
$$x_{k+1} = x_k(2 - ax_k).$$

Note that this iteration only requires multiplication and subtraction. Thus, it can be used to approximate a reciprocal without doing any divisions!

Rate of convergence

Let us consider the error $\varepsilon_k$ defined by
$$ax_k = 1 - \varepsilon_k.$$
Then
$$1 - \varepsilon_{k+1} = ax_{k+1} = ax_k(2 - ax_k) = (1 - \varepsilon_k)(1 + \varepsilon_k),$$
so
$$\varepsilon_{k+1} = \varepsilon_k^2.$$
We see that the convergence is quadratic provided $|\varepsilon_0| < 1$. In general, Newton’s method converges quadratically to a simple zero of a $C^{(2)}$ function $f$ provided the initial approximation is sufficiently good; this is the Newton–Kantorovich theorem.
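The relation $\varepsilon_{k+1} = \varepsilon_k^2$ can be observed exactly by running the iteration in rational arithmetic. In this Python sketch, $a = 3$ and $x_0 = 3/10$ are arbitrary choices of ours, giving $\varepsilon_0 = 1/10$. The loop performs no division, as noted above.

```python
from fractions import Fraction

def newton_reciprocal(a, x0, steps):
    """Iterates x_{k+1} = x_k (2 - a x_k); only multiplication and subtraction."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x * (2 - a * x))
    return xs

a = Fraction(3)
xs = newton_reciprocal(a, Fraction(3, 10), 4)
errors = [1 - a * x for x in xs]               # eps_k = 1 - a x_k
print([float(e) for e in errors])              # [0.1, 0.01, 0.0001, 1e-08, 1e-16]
assert all(errors[k + 1] == errors[k] ** 2 for k in range(4))
```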

Application to power series

The reason for considering Newton’s method here is that it also applies to power series. In fact, using $k$ steps of Newton’s method, we can compute (exactly!) the first $2^k$ coefficients in the reciprocal of a power series.

Let $A(x) = a_0 + a_1x + \cdots$ be a power series over a field $F$, with $a_0 \ne 0$ (so $\operatorname{ord}(A) = 0$). We take $b_0 = 1/a_0$ and take $B_0(x) = b_0$ to start the Newton iteration. Then, by analogy with the iteration for reciprocal considered previously, we define
$$B_{k+1}(x) = B_k(x)\left(2 - A(x)B_k(x)\right) \bmod x^{2^{k+1}}.$$
Why the mod $x^{2^{k+1}}$? Because terms past this point are “garbage” anyway (as we shall see when we consider the error in $B_k(x)$), so there is no point in wasting time computing them.

The error term

If $E_k(x) = 1 - A(x)B_k(x)$ then we have $E_0(x) = 0 \bmod x$ and, since $1 - AB_k(2 - AB_k) = (1 - AB_k)^2$,
$$E_{k+1}(x) = E_k(x)^2 \bmod x^{2^{k+1}},$$
so by induction on $k$,
$$E_k(x) = 0 \bmod x^{2^k}.$$
Expressed another way, $\operatorname{ord}(E_k(x)) \ge 2^k$. Thus, after $k$ iterations, we have $1/A(x)$ with “error” $O(x^{2^k})$.

The time bound

The time required to compute $n = 2^k$ terms in the reciprocal of $A(x)$ is
$$O\left(M(2^k) + M(2^{k-1}) + \cdots\right) + O(n),$$
and under plausible assumptions about the function $M(n)$ this is $O(M(n))$ overall.
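A Python sketch of the Newton iteration for $1/A(x) \bmod x^n$ over $\mathbb{Q}$, using `fractions.Fraction` so that the coefficients are exact. The precision doubles at each step and is capped at $n$ on the last step. For brevity, `mul_mod` is the $O(n^2)$ schoolbook product, so this sketch shows the structure of the method but not the $O(M(n))$ bound, which needs a fast multiplication. Both function names are hypothetical.

```python
from fractions import Fraction

def mul_mod(a, b, n):
    """A(x)B(x) mod x^n (schoolbook; Karatsuba or an FFT could be substituted)."""
    c = [Fraction(0)] * n
    for i, ai in enumerate(a[:n]):
        for j, bj in enumerate(b[: n - i]):
            c[i + j] += ai * bj
    return c

def series_reciprocal(a, n):
    """First n coefficients of 1/A(x), assuming a[0] != 0 (ord(A) = 0)."""
    b = [Fraction(1) / a[0]]                  # B_0 = 1/a_0
    prec = 1
    while prec < n:
        prec = min(2 * prec, n)               # double the precision
        t = [-c for c in mul_mod(a, b, prec)]
        t[0] += 2                             # 2 - A B_k  (mod x^prec)
        b = mul_mod(b, t, prec)               # B_{k+1} = B_k (2 - A B_k)
    return b[:n]

print(series_reciprocal([1, -1], 6))   # 1/(1 - x): all six coefficients equal 1
```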

Division of power series

If $A(x)$ and $B(x)$ are two power series over a field, and $\operatorname{ord}(B(x)) = 0$, then we can compute $A(x)/B(x) \bmod x^n$ by first computing
$$C(x) = 1/B(x) \bmod x^n$$
and then $A(x)C(x) \bmod x^n$.

Division of polynomials

If $A(x)$ and $B(x)$ are two polynomials of degree (at most) $n$ over a field, and $B(x) \ne 0$, then we can find polynomials $Q(x)$ and $R(x)$ such that
$$A(x) = Q(x)B(x) + R(x),$$
satisfying the condition
$$\deg(R(x)) < \deg(B(x)).$$
It follows that, if $\deg(A(x)) \ge \deg(B(x))$,
$$\deg(Q(x)) = \deg(A(x)) - \deg(B(x)).$$
There is a straightforward algorithm (Knuth §4.6.1, Algorithm D) which takes time $O(n^2)$. In fact, this is very similar to the “long division” process which you (may have) learned at school for dividing one integer by another.
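A Python sketch of schoolbook long division over $\mathbb{Q}$. It follows the usual long-division pattern and is not claimed to be a transcription of Knuth’s Algorithm D. Coefficient lists are lowest degree first, and `b` must have a nonzero leading coefficient. The name `poly_divmod` is hypothetical.

```python
from fractions import Fraction

def poly_divmod(a, b):
    """Return (Q, R) with A = Q B + R and deg R < deg B, over the rationals.

    Uses O(deg(Q) * deg(B)) = O(n^2) field operations. An empty list
    represents the zero polynomial Q = 0.
    """
    r = [Fraction(c) for c in a]
    m, n = len(b) - 1, len(a) - 1
    if n < m:
        return [], r                        # deg(A) < deg(B): Q = 0, R = A
    q = [Fraction(0)] * (n - m + 1)
    for k in range(n - m, -1, -1):
        q[k] = r[m + k] / b[m]              # next quotient coefficient q_k
        for j in range(m + 1):              # subtract q_k x^k B(x)
            r[j + k] -= q[k] * b[j]
    return q, r[:m]                         # coefficients of x^m..x^n are now 0

q, r = poly_divmod([-1, 0, 0, 1], [-1, 1])   # (x^3 - 1) / (x - 1)
print(q, r)   # q = x^2 + x + 1, r = 0 (printed as lists of Fractions)
```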

Reduction to triangular system

We are given $A(x)$, $B(x)$ and want to compute $Q(x)$ and $R(x)$ such that $\deg(R) < \deg(B)$ and
$$A(x) = Q(x)B(x) + R(x).$$
If $\deg(B) > \deg(A)$ we can take $Q = 0$ and $R = A$, so suppose that
$$\deg(B) = m \le \deg(A) = n.$$
Let $k = n - m = \deg(Q)$. Equating coefficients of $x^n, \dots, x^m$ (which do not involve $R$, since $\deg(R) < m$) we get a triangular system of equations for the coefficients of $Q(x)$:
$$\begin{pmatrix}
b_m & & & \\
b_{m-1} & b_m & & \\
\vdots & & \ddots & \\
b_{m-k} & \cdots & b_{m-1} & b_m
\end{pmatrix}
\begin{pmatrix} q_k \\ q_{k-1} \\ \vdots \\ q_0 \end{pmatrix}
=
\begin{pmatrix} a_n \\ a_{n-1} \\ \vdots \\ a_m \end{pmatrix}.$$
(Here we interpret $b_j = 0$ if $j < 0$.) Solving the triangular system to compute $Q(x)$ takes $O(k^2)$ operations, and then we can compute $R(x)$ from $R(x) = A(x) - Q(x)B(x)$.

Using power series

We can do better by reducing division of polynomials to division of power series, and using the fact that this can be done quickly by Newton’s method (for reciprocals) and Karatsuba’s method (or the FFT) for multiplication. Let $y = 1/x$ so (with $k, m, n$ as before) we have
$$y^nA(1/y) = a_n + a_{n-1}y + \cdots + a_0y^n,$$
$$y^mB(1/y) = b_m + b_{m-1}y + \cdots + b_0y^m,$$
$$y^kQ(1/y) = q_k + q_{k-1}y + \cdots + q_0y^k.$$
Multiplying $A(1/y) = Q(1/y)B(1/y) + R(1/y)$ by $y^n$ gives $y^nA(1/y) = y^kQ(1/y)\,y^mB(1/y) + y^nR(1/y)$, and $y^nR(1/y) = O(y^{k+1})$ because $\deg(R) < m$. Thus, if we regard $y^nA(1/y)$ and $y^mB(1/y)$ as power series in $y$ we can compute $y^kQ(1/y)$ by power series division:
$$y^kQ(1/y) = \frac{y^nA(1/y)}{y^mB(1/y)} \bmod y^{k+1},$$
and then find $R$ by subtraction as before.
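A Python sketch of this reduction, assuming exact rational coefficients. Reversing a coefficient list computes $y^nA(1/y)$, the quotient comes from a Newton reciprocal of the reversed divisor, and $R$ is obtained by subtraction. All products here are schoolbook, so the sketch illustrates the reduction only. It is not fast unless `mul_mod` and the final product are replaced by Karatsuba or FFT multiplication. The names `mul_mod`, `series_reciprocal` and `fast_divmod` are hypothetical.

```python
from fractions import Fraction

def mul_mod(a, b, n):
    """A(y)B(y) mod y^n (schoolbook)."""
    c = [Fraction(0)] * n
    for i, ai in enumerate(a[:n]):
        for j, bj in enumerate(b[: n - i]):
            c[i + j] += ai * bj
    return c

def series_reciprocal(a, n):
    """First n coefficients of 1/A(y) by Newton's method; needs a[0] != 0."""
    b, prec = [Fraction(1) / a[0]], 1
    while prec < n:
        prec = min(2 * prec, n)
        t = [-c for c in mul_mod(a, b, prec)]
        t[0] += 2
        b = mul_mod(b, t, prec)
    return b[:n]

def fast_divmod(a, b):
    """Q, R with A = QB + R and deg R < deg B, via power series in y = 1/x."""
    n, m = len(a) - 1, len(b) - 1
    if n < m:
        return [], [Fraction(c) for c in a]
    k = n - m
    rev_a, rev_b = a[::-1], b[::-1]                  # y^n A(1/y), y^m B(1/y)
    rev_q = mul_mod(rev_a, series_reciprocal(rev_b, k + 1), k + 1)
    q = rev_q[::-1]                                  # undo the reversal: Q(x)
    qb = [Fraction(0)] * (n + 1)                     # Q(x)B(x), degree n
    for i, qi in enumerate(q):
        for j, bj in enumerate(b):
            qb[i + j] += qi * bj
    r = [Fraction(ai) - ci for ai, ci in zip(a, qb)][:m]   # R = A - QB
    return q, r
```

For example, `fast_divmod([-1, 0, 0, 1], [-1, 1])` divides $x^3 - 1$ by $x - 1$ and should give $Q = x^2 + x + 1$, $R = 0$.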

Reducibility and equivalence

We say that problem $\mathcal{A}$ is reducible to problem $\mathcal{B}$ if an algorithm for the solution of $\mathcal{B}$ can be used to solve $\mathcal{A}$. Some conditions have to be imposed on the size of the problems and the overhead in making the reduction, but I shall not be specific here. If $\mathcal{A}$ is reducible to $\mathcal{B}$ and $\mathcal{B}$ is reducible to $\mathcal{A}$ then we say that $\mathcal{A}$ and $\mathcal{B}$ are equivalent.

Suppose problem $\mathcal{A}$ can be solved in time $A(n)$ (for inputs of size $n$), and problem $\mathcal{B}$ can be solved in time $B(n)$. We say that problems $\mathcal{A}$ and $\mathcal{B}$ are computationally equivalent if
$$A(n) = \Theta(B(n)).$$

Recall that this means $A(n) = O(B(n))$ and $B(n) = O(A(n))$, so we can say that, apart from constant factors, the time required to solve problems $\mathcal{A}$ and $\mathcal{B}$ is the same.

Notation

As a shorthand notation which makes clear that we are talking about an equivalence relation, we write
$$A(n) \approx B(n)$$
if $A(n) = \Theta(B(n))$. Warning: this is not a standard notation. Other symbols which we might use include
$$\sim,\quad \simeq,\quad \asymp,\quad \cong,\quad \Leftrightarrow.$$
Remark: we already considered equivalence of various matrix algorithms.

Definition of M, S, R and D

Let $F$ be a field (with characteristic $\ne 2$). We consider power series $P(x), Q(x) \in F[[x]]$ (the ring of formal power series over $F$). We are interested in the time required to perform operations such as multiplication, squaring, and division mod $x^n$ on such power series.

Let $M(n)$ be the time required to form $P(x)Q(x) \bmod x^n$.
Let $S(n)$ be the time required to form $P(x)^2 \bmod x^n$ (this is the case $P = Q$ of the above).
Let $R(n)$ be the time required to form $1/Q(x) \bmod x^n$ for $\operatorname{ord}(Q) = 0$.
Let $D(n)$ be the time required to form $P(x)/Q(x) \bmod x^n$ for $\operatorname{ord}(Q) = 0$.

A regularity assumption

Property B: We say that $f(n)$ satisfies Property B if $f(n)$ is positive and monotonic non-decreasing, i.e.
$$m \ge n \ge 1 \Rightarrow f(m) \ge f(n) > 0,$$
and there exist constants $\alpha, \beta \in (0, 1)$ such that
$$f(\lceil \alpha n \rceil) \le \beta f(n) \qquad (1)$$
for all sufficiently large $n$. Condition (1) holds if $f(n) \sim n^a (\log n)^b (\log\log n)^c$ for some constants $a > 0$, $b$ and $c$, so it is not very restrictive. We shall assume that $M(n)$ satisfies Property B.

Plausibility argument

The assumption that $M(n)$ is positive (for $n \ge 1$) and non-decreasing is extremely plausible.

Suppose $P_0, P_1, Q_0$ and $Q_1$ are polynomials of degree $n - 1$. We can compute
$$(P_0 + x^{2n}P_1)(Q_0 + x^{2n}Q_1)$$
in time $M(6n)$, but from the result we can “pick out” $P_0Q_0$ and $P_1Q_1$. Thus, it is plausible that
$$2M(n) \le M(6n).$$
(Why isn’t this a rigorous proof?) Replacing $n$ by $n/6$, we get the condition of Property B with $\alpha = 1/6$, $\beta = 1/2$.

A simpler assumption

Property W: We say that $f(n)$ satisfies Property W if $f(n)$ is positive, monotonic non-decreasing, and
$$f(2n) \ge 2f(n)$$
for all sufficiently large $n$. We could assume that $M(n)$ satisfies Property W. This is simpler (though perhaps slightly less plausible) than assuming Property B.

Exercise: Property W $\Rightarrow$ Property B.

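A useful lemma

Lemma: If $f(n)$ satisfies Property B and $0 < c < 1$, then
$$\sum_{k \ge 0,\ c^k n \ge 1} f(\lceil c^k n \rceil) = O(f(n)).$$

Proof: Let $\alpha$ and $\beta$ be as in the definition of Property B. Since $0 < c < 1$, there is a positive integer $K$ such that $c^K \le \alpha$. Grouping the terms into blocks of $K$ consecutive values of $k$, and using monotonicity and then (1) repeatedly, we obtain
$$\sum_{k \ge 0,\ c^k n \ge 1} f(\lceil c^k n \rceil) \le K \sum_j f(\lceil \alpha^j n \rceil) + O(1) \le K \sum_j \beta^j f(n) + O(1) \le \frac{K}{1 - \beta} f(n) + O(1) = O(f(n)).$$

□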

Properties of M

By the Lemma and our assumption that $M(n)$ satisfies Property B, we have
$$\sum_{k=0}^{\lfloor \lg n \rfloor} M(\lceil n/2^k \rceil) = O(M(n)).$$
By definition of $M(n)$, it is clear that
$$M(n) = \Omega(n).$$
Using the identity
$$(P_0 + x^nP_1 + x^{2n}P_2 + x^{3n}P_3) \times (Q_0 + x^nQ_1 + x^{2n}Q_2 + x^{3n}Q_3) = P_0Q_0 + x^n(P_0Q_1 + P_1Q_0) + \cdots \bmod x^{4n}$$
we have
$$M(4n) \le 16M(2n) + O(n) = O(M(2n)).$$
Thus, for any fixed $c > 0$,
$$M(cn) \approx M(n).$$

Equivalence of some operations

Theorem: Under the assumption that $M(n)$ satisfies Property B,
$$M(n) \approx S(n) \approx R(n) \approx D(n).$$

Proof: The proof has three main steps.

1. We have
$$\frac{1}{1 - x^nP(x)} - 1 - x^nP(x) = x^{2n}P(x)^2 + O(x^{3n}),$$
so
$$S(n) \le R(3n) + O(n).$$

2. Using the Newton iteration
$$Q_k(x) = Q_{k-1}(x)\left(2 - P(x)Q_{k-1}(x)\right) \bmod x^{2^k}$$
we can compute $1/P(x) \bmod x^{2^k}$ in time
$$2M(2^k) + 2M(2^{k-1}) + \cdots + O(n),$$
and by the Lemma above this is $O(M(2^k))$. We can choose $k$ such that $2^{k-1} < n \le 2^k$, and
$$M(2^k) \le M(2n) = O(M(n)).$$

Thus, we have
$$R(n) = O(M(n)).$$
This implies that $R(3n) = O(M(3n))$, but $M(3n) \approx M(n)$, so
$$R(3n) = O(M(n)).$$

3. Since $4PQ = (P + Q)^2 - (P - Q)^2$, we have
$$M(n) \le 2S(n) + O(n).$$
(Note: the assumption that $\operatorname{char}(F) \ne 2$ is essential here.)

From parts 1–3 we have $S(n) = O(R(3n))$, $R(3n) = O(M(n))$, and $M(n) = O(S(n))$. Thus
$$S(n) \approx R(3n) \approx M(n).$$
Also, because $M(n) \approx M(3n)$, we have $R(3n) \approx M(3n)$, so $R(n) \approx M(n)$. To complete the proof, note that
$$R(n) \le D(n) \le R(n) + M(n),$$
so $D(n) \approx R(n)$. □

Other equivalences

Our results have been expressed in terms of operations on power series mod $x^n$; they can also be expressed in terms of operations on polynomials of degree $n$.

Multiple-precision operations

Similar results hold for arithmetic operations on $n$-digit numbers: the operations of multiplication, squaring, finding reciprocals, and division are all computationally equivalent (in the sense defined above, i.e. ignoring constant factors). The proofs are similar to those for power series.

Exercise: Consider other operations such as computing square roots, logarithms, exponentials, and powers (where they are well-defined), for both power series and $n$-digit numbers.

3. Finite fields & modular arithmetic

We now consider some basic properties of finite fields, the use of modular arithmetic, the Chinese remainder theorem, and some applications. Reference: CLRS, Chapter 31.

Fields

Recall that a field is a set $\mathcal{F}$ with two operations, conventionally written as “+” (addition) and “×” (multiplication), satisfying the following properties:

1. $(\mathcal{F}, +)$ is an Abelian (i.e. commutative) group (called the “additive group” of the field). We write the zero element of this group as 0.

2. $(\mathcal{F} \setminus \{0\}, \times)$ is an Abelian group (called the “multiplicative group” of the field).

3. The distributive law holds, i.e. $(a + b) \times c = (a \times c) + (b \times c)$.

Comments

Properties 1–3 imply that $(\mathcal{F}, +, \times)$ is a ring. However, in general a ring does not satisfy property 2. For definitions of Abelian groups etc., see any good book on modern algebra, e.g. van der Waerden, Modern Algebra (first published in 1931; the most recent edition is called simply Algebra).

Familiar examples

You are probably familiar with three fields: the fields $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$ of rational, real and complex numbers (respectively). These are infinite fields: $\mathbb{Q}$ is countable, while $\mathbb{R}$ and $\mathbb{C}$ are uncountable. Other examples are finite extension fields such as $\mathbb{Q}(\sqrt{2})$, and the field of algebraic numbers (roots of polynomials over $\mathbb{Q}$).

Notation (nothing surprising)

We may omit the symbol “×” and write $ab$ for $a \times b$. Also, we assume that “×” has higher precedence than “+”, so we can write $ac + bc$ instead of $(a \times c) + (b \times c)$.

The identity element of the additive group is written as 0, so $a + 0 = 0 + a = a$ for all $a \in \mathcal{F}$. The additive inverse of $a$ is written as $-a$.

The identity element of the multiplicative group is written as 1, so $a \times 1 = 1 \times a = a$ for all $a \in \mathcal{F}$. The multiplicative inverse of $a \ne 0$ is written as $a^{-1}$ or $1/a$, and $b \times a^{-1}$ is written as $b/a$, etc.

If $n$ is a positive integer, we write $na$ for $\underbrace{a + a + \cdots + a}_{n}$, $0a$ for 0, and $(-n)a$ for $-(na)$, etc. Similarly, we write $a^n$ for $\underbrace{a \times a \times \cdots \times a}_{n}$, and if $a \ne 0$ we write $a^{-n}$ for $(a^{-1})^n = (a^n)^{-1}$.

Finite fields

We are interested in finite fields, i.e. fields $\mathcal{F}$ such that $|\mathcal{F}|$ is finite. One reason is that it is easy to represent elements of such fields in (finite) computer words, without any approximation or truncation error. $|\mathcal{F}|$ is called the order of $\mathcal{F}$. (Another notation is $\#\mathcal{F}$.) There is a well-known theorem (which is not too difficult to prove, but it would take us too far afield) which characterises the possible orders of finite fields.

Theorem. Let $\mathcal{F}$ be a field with finite order $q$. Then $q$ is a prime power, i.e. $q = p^n$ for some prime $p$ and positive integer $n$. Moreover, for any prime power $q$, there exists a finite field $\mathcal{F}$ with order $q$. □

There is essentially only one field of any given prime power order $q$: any two such fields are the same up to isomorphism. The field is called the Galois field of order $q$ and denoted by $\mathrm{GF}(q)$.

GF(p)

For simplicity, we are only going to consider Galois fields of prime order, i.e. the case $n = 1$, $q = p$. In this case $\mathrm{GF}(p)$ is essentially the same as the set of residues $\{0, 1, \dots, p-1\}$ with the natural operations of addition and multiplication mod $p$, usually written as $\mathbb{Z}/p\mathbb{Z}$.

Warning. $\mathrm{GF}(p^n)$ is not isomorphic to $\mathbb{Z}/p^n\mathbb{Z}$ if $n > 1$. In fact, it is easy to see that $\mathbb{Z}/p^n\mathbb{Z}$ is not a field in this case; it is only a ring.

Examples

The smallest finite field is $\mathrm{GF}(2)$, which consists of $\{0, 1\}$ with addition and multiplication mod 2 (so addition is “exclusive or” and multiplication is “and” if we regard 0 as “false” and 1 as “true”). Although this field is almost trivial, there are many applications of polynomials over $\mathrm{GF}(2)$. A less trivial example is $\mathrm{GF}(5)$.

Exercise: Write out the addition and multiplication tables for $\mathrm{GF}(5)$.
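If you want to check your answer to the exercise, the tables for $\mathrm{GF}(p) = \mathbb{Z}/p\mathbb{Z}$ can be generated mechanically. Here is a short Python sketch with $p = 5$ (any prime can be substituted). Each nonzero row of the multiplication table contains a 1, reflecting the existence of inverses.

```python
p = 5
add_table = [[(a + b) % p for b in range(p)] for a in range(p)]
mul_table = [[(a * b) % p for b in range(p)] for a in range(p)]

for name, table in (("+", add_table), ("x", mul_table)):
    print(name, *range(p))            # header row
    for a, row in zip(range(p), table):
        print(a, *row)

# Every nonzero element has a multiplicative inverse.
assert all(1 in mul_table[a] for a in range(1, p))
```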

The characteristic of a field

If there is a positive integer $p$ such that $pa = 0$ for all $a \in \mathcal{F}$, then the least such $p$ is called the characteristic of $\mathcal{F}$, denoted $\operatorname{char}(\mathcal{F})$. Otherwise, we say that the characteristic is zero. The Galois field $\mathrm{GF}(p^n)$ has characteristic $p$. Infinite fields can have characteristic zero or nonzero, but the familiar examples of $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$ all have characteristic zero.

Exercise: If $m$ is not a multiple of $\operatorname{char}(\mathcal{F})$, and $y \in \mathcal{F}$, then there exists a unique $x \in \mathcal{F}$ such that $mx = y$. Naturally we write $x = y/m$ in this case. Hint: Consider $m \times 1$ and its inverse.

Fermat’s (little) theorem

Fermat has many “theorems”. These include his “big” theorem (proved by Wiles et al.) that $a^n + b^n = c^n$ has no solutions in positive integers $a, b, c$ for $n > 2$. (Fermat claimed to have a proof, but it was too large to fit in the margin of the book he was annotating.) Of more interest to us is Fermat’s “little” theorem:

Theorem. If $p$ is a prime and $0 < a < p$, then
$$a^{p-1} = 1 \bmod p.$$

Structure of GF(p)

The multiplicative group of $\mathrm{GF}(p)$ has order $p - 1$, and there may be several Abelian groups with this order. For example, if $p = 5$, there are two non-isomorphic groups of order $p - 1 = 4$, namely $\mathbb{Z}_4$ and $\mathbb{Z}_2 \times \mathbb{Z}_2$. (Here $\mathbb{Z}_4 = \mathbb{Z}/4\mathbb{Z}$ etc.)

Gauss proved that the multiplicative group of $\mathrm{GF}(p)$ is cyclic, i.e. it is generated by a single element. (The same is true for $\mathrm{GF}(q)$.) Considering arithmetic modulo $p$, this means that there is a primitive root $a$ such that
$$a^m \ne 1 \bmod p \quad \text{for } 1 \le m < p - 1,$$
but (as we know from Fermat’s little theorem)
$$a^{p-1} = 1 \bmod p.$$

The existence proof is not constructive: it does not give an algorithm to find a primitive root. We can do this by trial and error, or by systematically checking $a = 2, 3, \dots$, but it is not clear how long this will take. (How large is the smallest primitive root mod $p$?)

Testing a primitive root

To test if $a$ is a primitive root modulo $p$, it is not necessary to check $a^m \bmod p$ for all $m \in [1, p-2]$.

Exercise: Show that $a$ is a primitive root modulo $p$ iff $a^{p-1} = 1 \bmod p$ and $a^{(p-1)/r} \ne 1 \bmod p$ for all prime factors $r$ of $p - 1$.

A simple primality test

Suppose we want to test $p$ for primality, but it is too large to test by trial division. If we can find an integer $a$ satisfying the above conditions, then $p$ must be prime. This is not always a practical test because it requires that we factorise $p - 1$, but it does allow us to give “succinct certificates” of primality (i.e. short proofs which are easily checked).
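A Python sketch of the test in the exercise, together with the systematic search $a = 2, 3, \dots$ from the previous slide. Here $p - 1$ is factorised by trial division, which is an assumption that only works when $p - 1$ has no large prime factors. The built-in three-argument `pow` does the modular powering. Function names are hypothetical.

```python
def prime_factors(n):
    """Distinct prime factors of n >= 1 by trial division (small n only)."""
    factors, d = [], 2
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def is_primitive_root(a, p):
    """True iff a^(p-1) = 1 and a^((p-1)/r) != 1 (mod p) for every prime r | p-1.

    For p >= 3, a True result certifies that p is prime (the test above).
    """
    if pow(a, p - 1, p) != 1:
        return False
    return all(pow(a, (p - 1) // r, p) != 1 for r in prime_factors(p - 1))

def smallest_primitive_root(p):
    """Systematic search a = 2, 3, ... for an odd prime p."""
    a = 2
    while not is_primitive_root(a, p):
        a += 1
    return a

print(smallest_primitive_root(7), smallest_primitive_root(23), smallest_primitive_root(41))  # 3 5 6
```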

Computing powers mod p

A problem which often arises is: given $p$, $a \in [0, p-1]$, and $n > 0$, compute $b = a^n \bmod p$. In applications both $p$ and $n$ may be large (e.g. 1024-bit numbers), so we need an efficient algorithm.

A good general rule, when computing mod $p$, is to perform a reduction to the range $[0, p-1]$ after each multiplication. Otherwise the intermediate results may grow extremely large. For example, when computing $5^{p-1} \bmod p$ where $p$ is a large prime, we would not compute $5^{p-1}$ and then find the remainder on division by $p$.

When computing $b = a^n \bmod p$ for large $n$ and $p$, we would not perform $n - 1$ multiplications. It is possible to obtain the result with only $O(\log n)$ multiplications, by making use of the binary representation of $n$.

Binary powering algorithm

    power(a, n, p)
        u ← a;
        b ← 1;
        while n > 0 do
        begin
            if odd(n) then b ← b × u mod p;
            u ← u × u mod p;
            n ← n div 2;
        end;
        return b.

Exercise: Prove that power(a, n, p) returns $a^n \bmod p$ (check exactly what happens if $a = 0$ or $n = 0$). If the numbers $a, n, p$ are at most $2^t$, show that the time required is $O(t^3)$, or less if a fast $t$-bit multiplication algorithm is used.
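A direct Python transcription of power(a, n, p), given as a sketch. One small deviation: it initialises u ← a mod p and b ← 1 mod p, so that inputs outside $[0, p-1]$ and the degenerate modulus $p = 1$ are also reduced. Python’s built-in three-argument `pow` computes the same function and serves as a check.

```python
def power(a, n, p):
    """Right-to-left binary powering: a^n mod p with O(log n) multiplications."""
    u, b = a % p, 1 % p
    while n > 0:
        if n & 1:              # odd(n)
            b = b * u % p
        u = u * u % p          # u = a^(2^i) mod p after i iterations
        n //= 2
    return b

print(power(2, 10, 1000))                    # 24, since 2^10 = 1024
assert power(5, 10**18, 10**9 + 7) == pow(5, 10**18, 10**9 + 7)
```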

Computing inverses mod p

Given a nonzero element $a \in \mathrm{GF}(p)$, we know that the inverse $a^{-1}$ exists and is unique, but how can we compute it efficiently?

Solution 1. From Fermat’s little theorem, $a^{-1} = a^{p-2} \bmod p$, and we have seen that the right hand side can be computed efficiently.

Solution 2. Apply the extended Euclidean algorithm to $(a, p)$. This gives $\lambda, \mu$ such that
$$\lambda a + \mu p = \mathrm{GCD}(a, p),$$
but (because $p$ is prime) $\mathrm{GCD}(a, p) = 1$, so $\lambda a = 1 - \mu p = 1 \bmod p$, and we see that $\lambda = a^{-1} \bmod p$.

Exercise: Compare the work required by the two methods if $p$ is an $n$-bit number and $n$ is large. (The more or less “obvious” algorithms require $O(n^3)$ and $O(n^2)$ operations respectively, so we expect solution 2 to be faster than solution 1.)

Exercise: Show that method 2 also works for composite $p$, provided $\mathrm{GCD}(a, p) = 1$.
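Both solutions are sketched below in Python (the function names are ours). `extended_gcd` maintains the invariant $\lambda a + \mu b = r$ for each remainder $r$ of Euclid’s algorithm. In Python 3.8 and later, the built-in `pow(a, -1, m)` also returns a modular inverse and can serve as a cross-check.

```python
def inverse_fermat(a, p):
    """Solution 1: a^(-1) = a^(p-2) mod p (valid only for p prime, a != 0 mod p)."""
    return pow(a, p - 2, p)

def extended_gcd(a, b):
    """Return (g, lam, mu) with lam*a + mu*b = g = GCD(a, b)."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

def inverse_euclid(a, m):
    """Solution 2: works for composite m too, provided GCD(a, m) = 1."""
    g, lam, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError("a is not invertible modulo m")
    return lam % m

print(inverse_fermat(3, 7), inverse_euclid(3, 7))   # 5 5, since 3 * 5 = 15 = 1 mod 7
```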

Chinese remainder theorem (CRT)

To avoid working with large numbers, we can often choose a set of moduli $\{n_1, n_2, \dots, n_k\}$ and perform the computation modulo each of the moduli separately. Provided the moduli are pairwise relatively prime, i.e. $\mathrm{GCD}(n_i, n_j) = 1$ for $1 \le i < j \le k$, the results can then be combined to give the result modulo $n = n_1 n_2 \cdots n_k$.

Theorem. Let $n_i$ be as above, $m_i = n/n_i$, $d_i = m_i^{-1} \bmod n_i$, and $c_i = m_i d_i$. Suppose residues $a_i$ are given. Then the set of equations
$$x = a_i \bmod n_i, \quad i = 1, 2, \dots, k \qquad (2)$$
has a unique solution modulo $n$. The solution is given by
$$x = a_1c_1 + a_2c_2 + \cdots + a_kc_k \bmod n. \qquad (3)$$

Example of CRT

Before proving the CRT, it may be helpful to try a small but not quite trivial example. Let $n = 2 \cdot 3 \cdot 5 \cdot 7 = 210$ and $x = 123$. Thus
$$\begin{aligned}
(n_1, n_2, n_3, n_4) &= (2, 3, 5, 7),\\
(a_1, a_2, a_3, a_4) &= (1, 0, 3, 4),\\
(m_1, m_2, m_3, m_4) &= (105, 70, 42, 30),\\
(d_1, d_2, d_3, d_4) &= (1, 1, 3, 4),\\
(c_1, c_2, c_3, c_4) &= (105, 70, 126, 120),
\end{aligned}$$
and the Theorem gives
$$x = 1 \times 105 + 0 \times 70 + 3 \times 126 + 4 \times 120 = 963 = 123 \bmod 210,$$
which is correct! Exercise: Check my arithmetic and/or construct your own examples.
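A Python sketch of equation (3) that reproduces the worked example. It assumes Python 3.8+ for `math.prod` and for the modular inverse `pow(m, -1, n)`. Keeping the constants $c_i$ in a separate function (`crt_constants`, a hypothetical name) means they can be computed once and reused for many residue vectors.

```python
from math import prod

def crt_constants(moduli):
    """Return n = prod(n_i) and c_i = m_i * (m_i^{-1} mod n_i), with m_i = n / n_i."""
    n = prod(moduli)
    cs = []
    for ni in moduli:
        mi = n // ni
        di = pow(mi, -1, ni)    # raises ValueError if GCD(m_i, n_i) != 1
        cs.append(mi * di)
    return n, cs

def crt(residues, moduli):
    """The unique x mod n with x = a_i mod n_i, computed from equation (3)."""
    n, cs = crt_constants(moduli)
    return sum(a * c for a, c in zip(residues, cs)) % n

n, cs = crt_constants([2, 3, 5, 7])
print(n, cs)                              # 210 [105, 70, 126, 120]
print(crt([1, 0, 3, 4], [2, 3, 5, 7]))    # 123
```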

Outline proof of the CRT

A proof is given in CLRS (Thm. 31.27), so we only give an outline here. First, observe that $m_j = 0 \bmod n_i$ if $i \ne j$; it follows that $c_j = 0 \bmod n_i$ if $i \ne j$, so if $x$ satisfies (3) then
$$x \bmod n_i = a_ic_i \bmod n_i.$$
However,
$$c_i \bmod n_i = m_i(m_i^{-1}) \bmod n_i = 1 \bmod n_i,$$
so $x = a_i \bmod n_i$, i.e. (2) holds. This proves the existence of a solution satisfying (2). Note the analogy with Lagrange interpolation.

To prove uniqueness modulo $n$ of the solution, suppose there are two solutions, $x'$ and $x''$. Considering $x = x' - x''$, we can restrict attention to the case $a_i = 0$. However, in this case $n_i \mid x$ for each $i$, so $\mathrm{LCM}(n_1, \dots, n_k) \mid x$, i.e. $n \mid x$ (because the $n_i$ are pairwise relatively prime), so $x' = x'' \bmod n$. □

Preconditioning

Suppose we want to apply the CRT many times with the same set of moduli $(n_1, \dots, n_k)$. We only have to compute the constants $(c_1, \dots, c_k)$ once. Thus, we save work every time we solve a system of the form (2), except for the first time. The computation (and saving for future use) of $(c_1, \dots, c_k)$ is an example of preconditioning.

Generalisation to polynomials

The CRT can be generalised to handle moduli which are relatively prime polynomials rather than integers (which can be regarded as polynomials of degree 0). See Aho, Hopcroft and Ullman, Theorem 8.13.

FFT over a finite field

To avoid problems with rounding errors when working with the “classical” FFT, which involves $n$-th roots of unity and irrational numbers (except in the trivial cases $n = 1, 2, 4$), we may choose to work over one or more finite fields $\mathrm{GF}(p)$. (If more than one, we may be able to use the CRT to obtain the final answer.)

In order to apply the FFT over $n$ points, we need $n$-th roots of unity. In the field $\mathrm{GF}(p)$, the multiplicative group $G$ is cyclic with order $p - 1$. Suppose $w$ is a primitive $n$-th root of unity, i.e. $w^n = 1$ and $n$ is the least positive integer for which this holds. The subgroup $H = \langle w \rangle$ generated by $w$ consists of the $n$ elements $\{w, w^2, w^3, \dots, w^{n-1}, w^n\}$. A well-known theorem in finite group theory (Lagrange’s theorem) implies that $|H|$ is a divisor of $|G|$, i.e. $n \mid (p-1)$. Thus, a primitive $n$-th root of unity can exist only if $n$ is a divisor of $p - 1$. Since $G$ is cyclic, every divisor of $p - 1$ does in fact occur: if $g$ is a primitive root, then $g^{(p-1)/n}$ is a primitive $n$-th root of unity.

Primes in arithmetic progressions

Suppose we want to apply a radix-2 FFT algorithm such as the Cooley–Tukey algorithm. Then $n = 2^k$ for some $k$ (the depth of recursion) and we need a prime $p$ of the form $\lambda 2^k + 1$. Fortunately, by a theorem of Dirichlet (extending Euclid’s theorem that there are infinitely many primes), there are an infinite number of primes in each arithmetic progression $\{\alpha m + \beta \mid m \ge 0\}$, provided $\mathrm{GCD}(\alpha, \beta) = 1$. (This condition is obviously necessary.) Applying Dirichlet’s theorem with $\alpha = 2^k$, $\beta = 1$, we see that there are an infinite number of primes we can use. (The question of how large the smallest such prime can be is interesting, but in practice it is not a problem.)
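A Python sketch of a radix-2 FFT over $\mathrm{GF}(p)$ (often called a number-theoretic transform) used to multiply polynomials exactly. It assumes the commonly used prime $p = 998244353 = 119 \cdot 2^{23} + 1$, which has the form $\lambda 2^k + 1$, and the primitive root $g = 3$. This is a widely used choice, and the accompanying checks confirm it rather than take it on trust. Integer coefficients are recovered exactly only if every true product coefficient lies in $[0, p)$. Names are hypothetical.

```python
P = 998244353          # 119 * 2^23 + 1: admits 2^k-point transforms for k <= 23
G = 3                  # assumed primitive root mod P

def ntt(a, w, p=P):
    """Evaluate the polynomial a at w^0, ..., w^(n-1) mod p.

    n = len(a) must be a power of 2 and w a primitive n-th root of unity.
    Recursive Cooley-Tukey: A(x) = E(x^2) + x O(x^2), and w^(n/2) = -1.
    """
    n = len(a)
    if n == 1:
        return a[:]
    w2 = w * w % p
    even, odd = ntt(a[0::2], w2, p), ntt(a[1::2], w2, p)
    out, wk = [0] * n, 1
    for k in range(n // 2):
        t = wk * odd[k] % p
        out[k] = (even[k] + t) % p
        out[k + n // 2] = (even[k] - t) % p
        wk = wk * w % p
    return out

def ntt_multiply(a, b, p=P, g=G):
    """Exact product of nonnegative integer polynomials with coefficients < p."""
    size = len(a) + len(b) - 1
    n = 1
    while n < size:
        n *= 2
    assert (p - 1) % n == 0, "need n | p - 1"
    w = pow(g, (p - 1) // n, p)                 # primitive n-th root of unity
    fa = ntt(a + [0] * (n - len(a)), w, p)
    fb = ntt(b + [0] * (n - len(b)), w, p)
    fc = [x * y % p for x, y in zip(fa, fb)]    # pointwise products
    c = ntt(fc, pow(w, p - 2, p), p)            # inverse: transform with w^(-1) ...
    n_inv = pow(n, p - 2, p)                    # ... then divide by n
    return [x * n_inv % p for x in c[:size]]

print(ntt_multiply([1, 2, 3], [4, 5]))   # [4, 13, 22, 15]
```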

4. Number-theoretic algorithms

In this lecture (maybe two) we consider algorithms for testing primality, integer factorisation, and some applications such as public-key cryptography. Many of the best algorithms are Monte Carlo or Las Vegas algorithms (CLRS §5.3).

Recommended reading: CLRS Ch. 31; Knuth, Vol. 2, §4.5.4; Motwani and Raghavan (1995), Ch. 14; Riesel, parts of Chs. 4–7.

Some number theory

Recall Fermat’s little theorem:

Theorem. If $p$ is a prime and $0 < a < p$, then

$$a^{p-1} = 1 \bmod p.$$

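Suppose $p$ is an odd prime. From Fermat’s little theorem, $\left(a^{(p-1)/2}\right)^2 = a^{p-1} = 1 \bmod p$, and the only square roots of 1 modulo a prime are $\pm 1$, so

$$a^{(p-1)/2} = \pm 1 \bmod p.$$

To determine the sign, we need to consider quadratic residues. We say that $a$ is a quadratic residue mod $p$ if there is an $x$ such that

$$a = x^2 \bmod p;$$

otherwise $a$ is a quadratic nonresidue mod $p$. If $p$ is an odd prime then, because $(+x)^2 = (-x)^2$, the map $x \mapsto x^2$ is exactly two-to-one on $\{1, 2, \ldots, p-1\}$ (since $x^2 = y^2$ implies $x = \pm y$, and $x \ne -x$ for odd $p$). Hence exactly half of the numbers $1, 2, \ldots, p-1$ are quadratic residues mod $p$.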

Euler’s criterion

The following result is useful for determining if a number is a quadratic residue.

Theorem (Euler’s criterion). If $p$ is an odd prime and $0 < a < p$, then $a$ is a quadratic residue mod $p$ if and only if

$$a^{(p-1)/2} = 1 \bmod p.$$

Proof. If $a$ is a quadratic residue, say $a = x^2 \bmod p$, then $a^{(p-1)/2} = x^{p-1} = 1 \bmod p$ by Fermat’s little theorem. Conversely, suppose $a$ is not a quadratic residue. Let $g$ be a primitive root mod $p$. Thus $a = g^k \bmod p$ for some $k$. If $k$ is even then $x = g^{k/2} \bmod p$ satisfies $x^2 = a \bmod p$, contradicting our assumption. Thus $k$ is odd, say $k = 2m + 1$. Then

$$a^{(p-1)/2} = g^{m(p-1) + (p-1)/2} = g^{(p-1)/2} \bmod p.$$

Now $g^{(p-1)/2} \ne 1 \bmod p$ because $g$ has order $p - 1$. □
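Both facts above (exactly half of $1, \ldots, p-1$ are residues, and Euler’s criterion) are easy to confirm by brute force for small odd primes. The following Python sketch does this; the function names are hypothetical, and the check is an illustration, not a proof.

```python
def squares_mod(p):
    """The set of quadratic residues mod p among 1, ..., p-1, by brute force."""
    return {x * x % p for x in range(1, p)}


def euler_says_residue(a, p):
    """Euler's criterion for an odd prime p and 0 < a < p."""
    return pow(a, (p - 1) // 2, p) == 1


for p in [3, 5, 7, 11, 13, 101]:
    residues = squares_mod(p)
    assert len(residues) == (p - 1) // 2                      # exactly half
    for a in range(1, p):
        assert pow(a, (p - 1) // 2, p) in (1, p - 1)          # always +1 or -1
        assert euler_says_residue(a, p) == (a in residues)    # the criterion

print(sorted(squares_mod(11)))   # [1, 3, 4, 5, 9]
```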

Finding a quadratic nonresidue

Using Euler’s criterion, we obtain a very simple Las Vegas algorithm for finding a quadratic nonresidue mod $p$ (where, as usual, $p$ is an odd prime):

    repeat
        choose random a ∈ {1, 2, ..., p − 1}
    until a^((p−1)/2) = −1 mod p;
    return a.

The probability of success at each step is exactly 1/2, so the expected number of times the loop is repeated is 2. Note that we cannot prove that the algorithm terminates: it is conceivable that it will keep choosing quadratic residues and never find a nonresidue. However, the probability of choosing $k$ quadratic residues in a row is $2^{-k}$, so the probability that the algorithm never terminates is zero. Thus, for practical purposes, we can have confidence that the algorithm will terminate.
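A direct Python transcription of the loop above, as a sketch; the name `find_nonresidue` and the extra trial counter are illustrative additions. Averaging the trial count over many runs should give a value near the expected 2.

```python
import random


def find_nonresidue(p, rng=random):
    """Las Vegas search for a quadratic nonresidue mod an odd prime p.

    Returns (a, trials); each trial succeeds with probability exactly 1/2."""
    trials = 0
    while True:
        trials += 1
        a = rng.randrange(1, p)                 # random a in {1, ..., p-1}
        if pow(a, (p - 1) // 2, p) == p - 1:    # Euler: a^((p-1)/2) = -1 mod p
            return a, trials


rng = random.Random(2011)
counts = [find_nonresidue(10007, rng)[1] for _ in range(10000)]
print(sum(counts) / len(counts))   # close to the expected value 2
```

The answer is always correct when the function returns; only the running time is random, which is what makes this a Las Vegas rather than a Monte Carlo algorithm.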

Testing primality – Algorithm RM

Suppose we want an algorithm to determine if a given odd positive integer $n$ is prime. The following Monte Carlo algorithm is due to Rabin, with improvements by Miller. Its expected run time is polynomial in $\log n$.

1. Write $n$ as $2^k q + 1$, where $q$ is odd and $k > 0$.
2. Choose a random integer $x \in \{2, \ldots, n-1\}$.
3. Compute $y = x^q \bmod n$. This can be done with $O(\log q)$ operations mod $n$, using the binary representation of $q$.
4. If $y = 1$ then return “yes”.
5. For $j = 1, 2, \ldots, k$ do
   if $y = n - 1$ then return “yes”
   else if $y = 1$ then return “no”
   else $y \leftarrow y^2 \bmod n$.
6. Return “no”.
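The six steps translate almost line by line into Python. This is a sketch: `algorithm_rm` and `probably_prime` are hypothetical names, and the repetition wrapper (with small and even inputs handled separately) is an addition for convenience. Step 3 uses Python’s built-in three-argument `pow`, which performs binary powering mod $n$.

```python
import random


def algorithm_rm(n, rng=random):
    """One run of Algorithm RM for odd n > 3: "yes" (probably prime) or "no" (composite)."""
    q, k = n - 1, 0                     # step 1: n = 2^k q + 1 with q odd
    while q % 2 == 0:
        q //= 2
        k += 1
    x = rng.randrange(2, n)             # step 2: x in {2, ..., n-1}
    y = pow(x, q, n)                    # step 3: binary powering mod n
    if y == 1:                          # step 4
        return "yes"
    for _ in range(k):                  # step 5
        if y == n - 1:
            return "yes"
        if y == 1:
            return "no"
        y = y * y % n
    return "no"                         # step 6


def probably_prime(n, trials=20, rng=random):
    """Repeat Algorithm RM; a composite n passes each run with probability < 1/4."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    return all(algorithm_rm(n, rng) == "yes" for _ in range(trials))


print(probably_prime(2**61 - 1))   # True: a prime is never rejected
print(probably_prime(561))         # False, except with probability below 4**-20
```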

Explanation of Algorithm RM

A slight extension of Fermat’s little theorem is useful, because its converse is usually true. If $n = 2^k q + 1$ is an odd prime, then either $x^q = 1 \bmod n$, or the sequence

$$\left(x^{2^j q} \bmod n\right)_{j = 0, 1, \ldots, k}$$

ends with 1 (because $x^{2^k q} = x^{n-1} = 1 \bmod n$ by Fermat’s little theorem), and the value just preceding the first appearance of 1 must be $n - 1$.

Proof: If $y^2 = 1 \bmod n$ then $n \mid (y - 1)(y + 1)$. Since $n$ is prime, $n \mid (y - 1)$ or $n \mid (y + 1)$. Thus $y = \pm 1 \bmod n$. □

The extension gives a necessary (but not sufficient) condition for primality of $n$. Algorithm RM just checks if this condition is satisfied for a random choice of $x$, and returns “yes” if it is. If the answer is “no” then we say that $x$ is a witness to the compositeness of $n$. Fortunately, witnesses are common (if $n$ is an odd composite, at least $3(n-1)/4$ of the numbers $1, \ldots, n-1$ are witnesses) so they are easy to find.

Reliability of Algorithm RM

Algorithm RM cannot give false negatives (unless we make an arithmetic mistake), but it can give false positives (i.e. “yes” when $n$ is composite). However, the probability of a false positive is less than 1/4. Usually it is much less; see Knuth, ex. 4.5.4.22. A weaker result, with 1/4 replaced by 1/2, is proved in CLRS §31.8, Theorem 31.38. If we repeat the algorithm 10 times there is less than 1 chance in $10^6$ of a false positive (since $4^{-10} < 10^{-6}$), and if we repeat it 100 times the results should satisfy anyone but a pure mathematician. Algorithm RM works well even if the input is a Carmichael number.
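For small composites the 1/4 bound can be checked exhaustively. The Python sketch below runs Algorithm RM with the random choice replaced by each fixed $x \in \{2, \ldots, n-1\}$ and reports the fraction of choices that give a false “yes”, alongside the fraction that fool the plain Fermat test $x^{n-1} = 1 \bmod n$. The function name is hypothetical; 561, 1105 and 1729 are Carmichael numbers, which fool the Fermat test for every $x$ relatively prime to $n$.

```python
def rm_says_yes(x, n):
    """Algorithm RM with the random choice replaced by a fixed x (odd n > 3)."""
    q, k = n - 1, 0
    while q % 2 == 0:
        q //= 2
        k += 1
    y = pow(x, q, n)
    if y == 1:
        return True
    for _ in range(k):
        if y == n - 1:
            return True
        if y == 1:
            return False
        y = y * y % n
    return False


for n in [9, 15, 21, 91, 561, 1105, 1729]:
    xs = range(2, n)                                   # the choices x in {2, ..., n-1}
    rm_fooled = sum(rm_says_yes(x, n) for x in xs)
    fermat_fooled = sum(pow(x, n - 1, n) == 1 for x in xs)
    print(f"n={n}: Fermat fooled {fermat_fooled / (n - 2):.3f}, "
          f"RM fooled {rm_fooled / (n - 2):.3f}")
    assert rm_fooled / (n - 2) < 0.25
```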

Use of randomness

Note that in our examples randomness was introduced into the algorithms. We did not make any assumption about randomness of the inputs.

Summary of Algorithm RM

Given any $\varepsilon > 0$, we can check primality of a number $n$ in

$$O\left((\log n)^3 \log(1/\varepsilon)\right)$$

bit-operations, provided we are willing to accept a probability of error of at most $\varepsilon$. By way of comparison, the best known deterministic algorithm takes

$$O\left((\log n)^6\right)$$

bit-operations, and is much more complicated. This is the AKS algorithm¹, not mentioned in CLRS, with an improvement by Lenstra to reduce the exponent.

¹ Agrawal, Kayal and Saxena, Annals of Mathematics 160 (2004), 781–793.

Factorisation algorithms

The Rabin-Miller primality test is a Monte Carlo algorithm, because it can occasionally give the wrong answer (claiming that a composite number is prime). There are several randomised algorithms for factoring integers. Because it is easy to check if a factorisation is correct (by multiplying the supposed factors), we can easily convert these algorithms into Las Vegas algorithms: they will never return the wrong answer, but the time taken to determine a correct answer is random. Examples are Pollard’s “rho” (ρ) method, Lenstra’s elliptic curve method (ECM), the multiple-polynomial quadratic sieve (MPQS), and the number field sieve (NFS). We shall look at simplified versions of Pollard’s rho method, Pollard’s “p − 1” method, ECM (very briefly) and MPQS.

Pollard rho

Suppose we want to find a prime factor $q > 3$ of an odd composite integer $N$. Take a random $x_0$, and a polynomial $P \in \mathbb{Z}[x]$ of the form $P(x) = x^2 + a$, where $a \ne 0, -2$. Define a sequence $(x_j)$ by

$$x_j = P(x_{j-1}) \bmod N, \quad j \ge 1,$$

and the “doubled” sequence $(y_j)$ by

$$y_j = x_{2j}.$$

Note that we can compute $(y_j)$ using the recurrence

$$y_j = P(P(y_{j-1})) \bmod N.$$

Simple version of rho

The simplest form of Pollard’s “rho” algorithm is

    j ← 0; x ← x_0; y ← x_0;
    repeat
        j ← j + 1;
        x ← P(x) mod N;
        y ← P(P(y) mod N) mod N;
        f ← GCD(x − y, N)
    until f > 1;
    return f.

The algorithm may fail, in the sense that it returns $f = N$. However, this is unlikely. Usually, if $q$ is the smallest prime factor of $N$, the algorithm returns $f = q$ in $O(\sqrt{q})$ steps. To understand why, consider the sequence $(x'_j = x_j \bmod q)$. Note that

$$x'_j = P(x'_{j-1}) \bmod q$$

because $q \mid N$ and $P(x)$ is a polynomial with integer coefficients.
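The loop above in Python, as a sketch: the name `pollard_rho_simple`, the defaults $x_0 = 2$, $a = 1$, and returning the step count $j$ alongside $f$ are illustrative choices. It uses `math.gcd`, which accepts negative arguments and returns a non-negative result.

```python
from math import gcd


def pollard_rho_simple(N, x0=2, a=1):
    """Simple Pollard rho with P(x) = x^2 + a; returns (f, j) where f > 1 divides N."""
    def P(v):
        return (v * v + a) % N

    j, x, y = 0, x0, x0
    while True:
        j += 1
        x = P(x)          # x_j
        y = P(P(y))       # y_j = x_{2j}
        f = gcd(x - y, N)
        if f > 1:
            return f, j   # f == N means this choice of x0 and a failed


print(pollard_rho_simple(8051))   # (97, 3), since 8051 = 83 * 97
```

If the result is $f = N$, a usual remedy is to retry with a different $a$ or $x_0$.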

Pollard rho continued

The sequence $(x'_j)$ takes at most $q$ distinct values, so it must repeat after at most $q$ steps, i.e. there is some period $p \le q$ such that

$$x'_{n+p} = x'_n \quad \text{for all } n \ge t.$$

The integer $t$ is the length of the nonperiodic part of the sequence (possibly $t = 0$). By a “tail chasing” argument, there is an $n = O(q)$ such that

$$x'_{2n} = x'_n,$$

but this means that $p \mid n$ and $q \mid (y_n - x_n)$, so $q$ divides $\mathrm{GCD}(x_n - y_n, N)$ and the loop stops at step $n$ or earlier. Although the worst case is $\Omega(q)$, the “birthday paradox” argument shows that we can expect $n = O(\sqrt{q})$ if $P$ behaves like a random function. In practice, this seems to be the case. For further details (and a more efficient version of the algorithm), see CLRS §31.9.
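The tail-chasing claim can be checked numerically. The Python sketch below (hypothetical function names) computes the tail length $t$ and period $p$ of $(x'_j)$ by storing every value, finds the first $n \ge 1$ with $x'_{2n} = x'_n$ using the doubled sequence, and confirms that this $n$ is the smallest multiple of $p$ with $n \ge \max(t, 1)$.

```python
def tail_and_period(x0, a, q):
    """Tail length t and period p of x'_j = (x'_{j-1}^2 + a) mod q, found by storing values."""
    first_seen = {}
    x, j = x0 % q, 0
    while x not in first_seen:
        first_seen[x] = j
        x = (x * x + a) % q
        j += 1
    t = first_seen[x]
    return t, j - t


def first_match(x0, a, q):
    """Smallest n >= 1 with x'_{2n} = x'_n: the step at which the rho loop sees q."""
    def P(v):
        return (v * v + a) % q

    x = y = x0 % q
    n = 0
    while True:
        n += 1
        x, y = P(x), P(P(y))
        if x == y:
            return n


for q in [101, 1009, 10007]:
    t, p = tail_and_period(2, 1, q)
    n = first_match(2, 1, q)
    # n is the smallest multiple of p with n >= max(t, 1)
    assert n % p == 0 and n >= t and n - p < max(t, 1)
    print(f"q={q}: t={t}, p={p}, n={n}")
```

Unlike `tail_and_period`, the doubled-sequence search needs only constant memory, which is why the rho loop uses it.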

Example

Using a slightly modified version, Pollard and I found the factor

$$q = 1238926361552897$$

of the Fermat number

$$F_8 = 2^{2^8} + 1.$$

You can remember this factor by the epigram (the number of letters in each word gives the successive digits of $q$)

I am now entirely persuaded to employ the method, a handy trick, on gigantic composite numbers
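Both claims are quick to check in Python: the word lengths of the epigram spell out $q$, and $q$ divides $F_8$. The last line is an equivalent check that avoids forming the 78-digit number $F_8$, since $q \mid 2^{256} + 1$ exactly when $2^{256} = -1 \bmod q$.

```python
EPIGRAM = ("I am now entirely persuaded to employ the method, "
           "a handy trick, on gigantic composite numbers")

# Each word's length (punctuation ignored) gives one decimal digit of q.
digits = "".join(str(len(word.strip(",."))) for word in EPIGRAM.split())
q = int(digits)
print(q)                          # 1238926361552897

F8 = 2**(2**8) + 1
print(F8 % q == 0)                # True
print(pow(2, 2**8, q) == q - 1)   # True
```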

Pollard’s “p − 1” method

The Pollard “p − 1” method is interesting because it is the basis of the elliptic curve method, which we consider later. The $p - 1$ method depends on Fermat’s little theorem. We want to find a prime factor $p$ of a number $N$. Suppose that (somehow) we know or guess a number $E$ such that $p - 1$ is a divisor of $E$. Choose some $a$, $1 < a < N$, and compute

$$g = \mathrm{GCD}(a^E - 1, N).$$

By Fermat’s little theorem, $a^{p-1} = 1 \bmod p$ (assuming $p \nmid a$; otherwise $\mathrm{GCD}(a, N)$ already gives a factor), so $a^E = 1 \bmod p$, i.e. $p$ is a divisor of $a^E - 1$, so $p$ is also a divisor of $g$. Thus, unless we are unlucky and find the “trivial” result $g = N$, we get a nontrivial divisor of $N$. How do we find $E$? We can take $E$ to be a product of all prime powers $q$ less than some bound $B$. If $B$ is sufficiently large, the method will work. In fact, $B$ has to be at least as large as the largest prime factor of $p - 1$.
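A minimal Python sketch of this “stage 1” computation. It assumes one common reading of “product of all prime powers less than $B$”: $E = \mathrm{lcm}(1, 2, \ldots, B)$, i.e. for each prime $r \le B$ the largest power of $r$ not exceeding $B$. With this choice the method succeeds when every prime power dividing $p - 1$ is at most $B$. The function names are hypothetical, and $a^E$ is accumulated mod $N$ one prime power at a time.

```python
from math import gcd


def primes_up_to(B):
    """Primes <= B by the sieve of Eratosthenes (B >= 2)."""
    is_p = bytearray([1]) * (B + 1)
    is_p[0] = is_p[1] = 0
    for i in range(2, int(B ** 0.5) + 1):
        if is_p[i]:
            is_p[i * i::i] = bytearray(len(range(i * i, B + 1, i)))
    return [i for i in range(B + 1) if is_p[i]]


def pollard_p_minus_1(N, B, a=2):
    """Stage-1 p - 1 method: returns GCD(a^E - 1, N) with E = lcm(1, 2, ..., B)."""
    x = a % N
    for r in primes_up_to(B):
        rk = r
        while rk * r <= B:          # largest power of r not exceeding B
            rk *= r
        x = pow(x, rk, N)           # after the loop, x = a^E mod N
    return gcd(x - 1, N)


print(pollard_p_minus_1(299, 5))    # 13: 299 = 13 * 23, and 13 - 1 = 12 divides E = 60
print(pollard_p_minus_1(1403, 5))   # 61: 1403 = 23 * 61, and 61 - 1 = 60 divides E = 60
```

In both examples the other prime factor $p'$ has $p' - 1 = 22$, which does not divide $E = 60$, so the result is not the trivial $g = N$.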

The elliptic curve method (ECM)

The Pollard rho algorithm is an example of a factorisation algorithm whose expected running time depends mainly on the size of the factor $f$ which is found, and only secondarily on the size of the number $N$ which is being factored (only because arithmetic operations are performed mod $N$). The expected running time is $\Theta(\sqrt{f})$. Pollard rho is not the best such algorithm. A more sophisticated algorithm is Lenstra’s elliptic curve algorithm/method (usually abbreviated ECM). This is a randomised algorithm which finds a factor $f$ in expected time

$$O\left(\exp\left((1 + \varepsilon)\sqrt{2 \ln f \ln\ln f}\right)\right),$$

where $\varepsilon \to 0$ as $f \to \infty$.

ECM continued

Because $\sqrt{f} = \exp((\ln f)/2)$ and

$$\frac{\ln f}{2} \gg \sqrt{2 \ln f \ln\ln f}$$

for large $f$, ECM is faster than Pollard rho for large factors $f$. The crossover point is for factors of about 12 decimal digits (so Pollard rho is a useful method for finding “small” factors). ECM is useful for finding factors of up to about 40 decimal digits, and if you are lucky you might find larger ones (in 2011 the record was 73 decimal digits).

ECM is a randomised algorithm. It uses a number of “trials”, where each trial depends on choosing a random group (defined by an elliptic curve) and then trying to find a factor by an analogue of the Pollard “p − 1” algorithm. The details of ECM are outside the scope of this course. If you are interested, look in the book by Riesel or in some of the papers accessible from my home page.
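To get a feel for the two growth rates, the Python sketch below evaluates $\log_{10}$ of the rho cost $\sqrt{f}$ and of the ECM cost $\exp(\sqrt{2 \ln f \ln\ln f})$ (taking $\varepsilon = 0$). It ignores all constant factors, so it shows only the trend, not real running times; the function names are hypothetical.

```python
import math


def log10_rho_cost(f):
    """log10 of sqrt(f): Pollard rho cost, constant factors ignored."""
    return math.log(f) / 2 / math.log(10)


def log10_ecm_cost(f):
    """log10 of exp(sqrt(2 ln f lnln f)): ECM cost with eps = 0, constants ignored."""
    ln_f = math.log(f)
    return math.sqrt(2 * ln_f * math.log(ln_f)) / math.log(10)


for digits in (8, 10, 11, 12, 20, 40):
    f = 10 ** digits
    print(f"{digits:2d} digits: rho ~ 10^{log10_rho_cost(f):.1f}, "
          f"ECM ~ 10^{log10_ecm_cost(f):.1f}")
```

With these crude formulas the two curves cross between 11 and 12 decimal digits; a real crossover also depends on the constant factors of each implementation, which this sketch leaves out.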

Pseudo-deterministic algorithms

Some randomised algorithms use many independent random numbers, and because of the “law of large numbers” their performance is very predictable. One example is the multiple-polynomial quadratic sieve (MPQS) algorithm for integer factorisation. Suppose we want to factor a large composite number $N$ (not a perfect power). The key idea of MPQS is to generate a sufficiently large number of relations of the form

$$y^2 = p_1^{\alpha_1} \cdots p_k^{\alpha_k} \bmod N,$$

where $p_1, \ldots, p_k$ are small primes in a precomputed “factor base”, and $y$ is close to $\sqrt{N}$. Many $y$ are tried, and the “successful” ones are found efficiently by a sieving process. Finding a relation is a fairly rare event, but we have to find many (slightly more than the number of primes in the factor base, so that a linear dependency is guaranteed), so by the law of large numbers it is relatively easy to predict how long it will take.

Combining relations

After a sufficient number of relations have been found, we solve a system of linear equations (or, more accurately, find a linear dependency) over $GF(2)$ among the exponent vectors $(\alpha_1, \ldots, \alpha_k) \bmod 2$. Multiplying together the relations in the dependency gives a relation where all the exponents are even, i.e. a relation of the form $y^2 = z^2 \bmod N$, and we check $\mathrm{GCD}(y - z, N)$. With probability at least 0.5, this gives a nontrivial factor of $N$. (If it fails, we find another linear dependency and try again.) Making some plausible assumptions, the expected run time of MPQS is

$$T = O\left(\exp\left(c\sqrt{\log N \log\log N}\right)\right),$$

where $c \simeq 1$. In practice, this estimate is good and the variance is small.
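To make the combining step concrete, here is a toy Python sketch. It is closer to Dixon’s random-squares method than to MPQS: relations $y^2 = v \bmod N$ are found by trial-dividing $v$ over a small factor base instead of by sieving, only the single polynomial $y^2 - N$ is used, and the linear algebra over $GF(2)$ is plain Gaussian elimination on bitmasks (Python integers). The function names, the factor base and the choice $N = 84923 = 163 \cdot 521$ are illustrative assumptions.

```python
from math import gcd, isqrt


def factor_over(v, base):
    """Exponent vector of v over the factor base, or None if v is not smooth."""
    if v == 0:
        return None
    exps = []
    for p in base:
        e = 0
        while v % p == 0:
            v //= p
            e += 1
        exps.append(e)
    return exps if v == 1 else None


def dependencies(masks):
    """Gaussian elimination over GF(2); yields index lists whose rows sum to 0 mod 2."""
    pivots = {}                              # leading bit -> (row, combination of rows)
    for i, row in enumerate(masks):
        combo = 1 << i
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = (row, combo)
                break
            prow, pcombo = pivots[top]
            row ^= prow
            combo ^= pcombo
        if row == 0:
            yield [j for j in range(len(masks)) if (combo >> j) & 1]


def toy_sieve(N, base, extra=10):
    """Toy factoring via relations y^2 = (product of base primes) mod N."""
    relations = []
    y = isqrt(N) + 1
    while len(relations) < len(base) + extra:
        exps = factor_over(y * y % N, base)      # trial division, not a real sieve
        if exps is not None:
            relations.append((y, exps))
        y += 1
    masks = [sum((e & 1) << k for k, e in enumerate(exps)) for _, exps in relations]
    for dep in dependencies(masks):
        x, total = 1, [0] * len(base)
        for i in dep:
            x = x * relations[i][0] % N
            total = [s + e for s, e in zip(total, relations[i][1])]
        z = 1
        for p, e in zip(base, total):            # every e is even here
            z = z * pow(p, e // 2, N) % N
        f = gcd(x - z, N)                        # x^2 = z^2 mod N
        if 1 < f < N:
            return f
    return None


base = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
print(toy_sieve(84923, base))
```

Each dependency gives $x^2 = z^2 \bmod N$, and roughly half of them should yield a proper factor (here 163 or 521), which is why the sketch collects a few more relations than there are primes in the factor base.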

MPQS example

MPQS is currently the best general-purpose algorithm for factoring moderately large numbers $N$ whose factors are in the range $N^{1/3}$ to $N^{1/2}$. For example, Lenstra and Manasse found

$$3^{329} + 1 = 2^2 \cdot 547 \cdot 16921 \cdot 256057 \cdot 36913801 \cdot 177140839 \cdot 1534179947851 \cdot p_{50} \cdot p_{67},$$

where the penultimate factor $p_{50}$ is a 50-digit prime

$$24677078822840014266652779036768062918372697435241,$$

and the largest factor $p_{67}$ is a 67-digit prime. The computation used a network of workstations for “sieving”, then a supercomputer for the solution of a very large linear system.

MPQS and NFS

MPQS has been used to factor numbers of up to 129 decimal digits, although a more sophisticated method (the number field sieve, NFS) is faster for numbers of more than about 110 decimal digits. It is now feasible to factor 232-digit (768-bit) numbers by NFS using a network of workstations.

Public key cryptography

Integer factorisation is of interest in cryptography because the security of the popular RSA (for Rivest, Shamir and Adleman) algorithm for public-key cryptography depends on the difficulty of factoring large integers (say products of two 150-digit primes). We give a brief outline, omitting some of the practical details. See CLRS §31.7 (or a recent book on cryptography, such as those mentioned below) for a more detailed description of RSA.

Setting up RSA

To set up an RSA scheme, Alice chooses two large primes $p$ and $q$. She computes $n = pq$ and $\varphi = (p - 1)(q - 1)$. She chooses a random integer $e$, $1 < e < \varphi$, with $\mathrm{GCD}(e, \varphi) = 1$, and computes $d$ such that

$$de = 1 \bmod \varphi.$$

$n$ is the modulus, $e$ is the encryption exponent, and $d$ is the decryption exponent. Alice’s public key is $(n, e)$ and her private key is $d$. The numbers $p$, $q$ and $\varphi$ should be kept secret (they may be discarded as they are no longer needed, although sometimes $p$ and $q$ are kept for efficiency reasons).

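RSA encryption/decryption

For Bob to send a message $m$ to Alice, he obtains her public key $(n, e)$. We suppose that $m$ is an integer in the range $n^{1/3} < m < n$. Bob computes

$$c = m^e \bmod n.$$

The ciphertext is $c$. To decrypt the ciphertext and retrieve the plaintext $m$, Alice computes

$$m = c^d \bmod n.$$

This works because $de = 1 + k\varphi$ for some integer $k \ge 0$, and $m^{k\varphi + 1} = m \bmod n$ for every integer $m$ (apply Fermat’s little theorem modulo $p$ and modulo $q$, then combine with the CRT), so

$$c^d = m^{ed} = m \bmod n.$$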
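A toy Python run of the scheme, as a sketch only: this is “textbook” RSA with tiny primes and no padding, so it illustrates the arithmetic but is not secure. The name `rsa_keygen` is hypothetical, and the modular inverse uses Python’s built-in `pow(e, -1, phi)` (Python 3.8+).

```python
from math import gcd


def rsa_keygen(p, q, e):
    """Textbook RSA key generation from given primes p, q and exponent e."""
    n, phi = p * q, (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be relatively prime to phi")
    d = pow(e, -1, phi)              # de = 1 mod phi
    return (n, e), d


(n, e), d = rsa_keygen(61, 53, 17)   # toy primes; real keys use primes of hundreds of digits
m = 65                               # plaintext, n^(1/3) < m < n
c = pow(m, e, n)                     # Bob: c = m^e mod n
print(n, d, c)                       # 3233 2753 2790
print(pow(c, d, n))                  # 65
print(all(pow(pow(x, e, n), d, n) == x for x in range(n)))   # True
```

The last line checks every $m$ with $0 \le m < n$, including multiples of $p$ or $q$, which is why the argument above uses $m^{k\varphi+1} = m$ rather than requiring $\mathrm{GCD}(m, n) = 1$.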

Breaking RSA

If an eavesdropper (Eve) can factor $n$, then she can compute $\varphi = (p - 1)(q - 1)$ and hence $d$, exactly as Alice did, and break the system. There is no obvious way to do this without factoring $n$ (although it has not been proved that the problems are equivalent).

Discrete logarithms

The discrete logarithm problem is to find an integer $x$ such that

$$g^x = b \bmod n,$$

where $g$, $n$ and $b$ are given integers. We write $x = \log_g b$. The solution (if it exists) is not unique (solutions differ by multiples of the order of $g$ mod $n$), so we assume that $\log_g b$ is the smallest non-negative solution. If $n$ is a large prime and $n - 1$ has at least one large prime factor, then the discrete logarithm problem seems to be difficult (at least as difficult as factoring integers of about the same size as $n$). The concept of discrete logarithm can be generalised to other algebraic structures. We are just considering the simplest case here.

Diffie-Hellman key exchange

Suppose Bob and Alice want to send messages to each other using ordinary (not public-key) cryptography. They need to agree on a key $K$ to use for encrypting/decrypting their messages. This may be difficult if they are communicating by phone or email and someone (Eve) is eavesdropping. Diffie and Hellman suggested a nice solution.

First, Bob and Alice agree on a large prime $p$ and an element $g$ which is a primitive root mod $p$. Preferably $q = (p - 1)/2$ should be prime. Bob/Alice can find suitable $p$ and $q$ using Algorithm RM, and then find $g$ by a randomised algorithm. Testing that $g$ is a primitive root is made easy because the factorisation of $p - 1$ is known: since $p - 1 = 2q$, $g$ is a primitive root if and only if $g^2 \ne 1 \bmod p$ and $g^q \ne 1 \bmod p$. Bob and Alice can make $p$ and $g$ public. It does not matter if Eve knows them.
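A sketch of this set-up phase in Python, under stated assumptions: Algorithm RM is implemented inline (repeated 20 times) so that the block is self-contained, the bit length is kept tiny for illustration (real systems use thousands of bits), and the function names are hypothetical.

```python
import random


def is_probable_prime(n, trials=20, rng=random):
    """Algorithm RM repeated `trials` times; never rejects a prime."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    q, k = n - 1, 0
    while q % 2 == 0:
        q //= 2
        k += 1
    for _ in range(trials):
        y = pow(rng.randrange(2, n), q, n)
        if y == 1 or y == n - 1:
            continue
        for _ in range(k - 1):
            y = y * y % n
            if y == n - 1:
                break
        else:
            return False          # a witness was found: n is certainly composite
    return True


def safe_prime_and_generator(bits, rng=random):
    """Random p = 2q + 1 (p of the given bit length, q prime) and a primitive root g."""
    while True:
        q = rng.getrandbits(bits - 1) | (1 << (bits - 2)) | 1
        p = 2 * q + 1
        if is_probable_prime(q, rng=rng) and is_probable_prime(p, rng=rng):
            break
    while True:
        g = rng.randrange(2, p - 1)
        # p - 1 = 2q, so the order of g is 1, 2, q or 2q; we want 2q.
        if pow(g, 2, p) != 1 and pow(g, q, p) != 1:
            return p, q, g


p, q, g = safe_prime_and_generator(64)
print(p, q, g)
```

About half of the candidates $g$ are primitive roots when $q$ is prime, so the second loop is another Las Vegas search with a small expected number of trials.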

Diffie-Hellman continued

The Diffie-Hellman algorithm for generating a key $K$ known to Bob and Alice, but not to Eve, is as follows.

1. Alice chooses a random $x \in \{2, \ldots, p-2\}$, computes $X = g^x \bmod p$, and sends $X$ to Bob.
2. Bob chooses a random $y \in \{2, \ldots, p-2\}$, computes $Y = g^y \bmod p$, and sends $Y$ to Alice.
3. Alice computes $K = Y^x \bmod p$.
4. Bob computes $K = X^y \bmod p$.

Now both Alice and Bob know $K = g^{xy} \bmod p$, so it can be used as a key or transformed into a key in some agreed manner. Eve may know $p$, $g$, $X$ and $Y$. However, she does not know $x$ or $y$. Although it has not been proved, it seems that she cannot compute $K = X^y \bmod p = Y^x \bmod p$ without effectively finding $x$ or $y$, and this requires solving a discrete logarithm problem.
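The four steps in Python, as a toy sketch. The parameters are an illustrative assumption: $p = 1019 = 2 \cdot 509 + 1$ with 509 prime, and $g = 2$, which is a primitive root mod 1019 (here $2^2 \ne 1$ and $2^{509} = -1 \bmod 1019$). The function name is hypothetical.

```python
import random


def diffie_hellman(p, g, rng=random):
    """One toy run of the key exchange; returns both secrets and both computed keys."""
    x = rng.randrange(2, p - 1)    # Alice: x in {2, ..., p-2}
    y = rng.randrange(2, p - 1)    # Bob:   y in {2, ..., p-2}
    X = pow(g, x, p)               # step 1: Alice sends X
    Y = pow(g, y, p)               # step 2: Bob sends Y
    key_alice = pow(Y, x, p)       # step 3
    key_bob = pow(X, y, p)         # step 4
    return x, y, key_alice, key_bob


p, g = 1019, 2
x, y, ka, kb = diffie_hellman(p, g)
print(ka == kb == pow(g, x * y, p))   # True
```

Only $X$ and $Y$ cross the channel; the secrets $x$ and $y$ never leave the function that chose them.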

Encryption using discrete logarithms

There is a public-key encryption scheme, the El Gamal scheme, which depends on the difficulty of computing discrete logarithms, and is closely related to the Diffie-Hellman key exchange algorithm. For a complete description of the El Gamal algorithm, see

• B. Schneier, Applied Cryptography, 2nd edition, §19.6, or
• A. Menezes, P. van Oorschot and S. Vanstone, Handbook of Applied Cryptography, §8.4 (we refer to this book as MOV).

El Gamal encryption/decryption

For public key encryption/decryption, Alice chooses a large random prime $p$ and a primitive root $g$. She also chooses a random exponent $x$, $2 \le x \le p - 2$, and computes $X = g^x \bmod p$. Alice’s public key is $(p, g, X)$ and her private key is $x$.

For Bob to send a message $m$ to Alice, he obtains Alice’s public key $(p, g, X)$. We assume that $m$ is an integer in the range $0 \le m < p$ (otherwise, split $m$ into pieces, encode them as integers, and send each piece separately). Bob chooses a random integer $y$, $2 \le y \le p - 2$, and computes $Y = g^y \bmod p$ and $Z = m X^y \bmod p$. The ciphertext is $(Y, Z)$.

To decode the ciphertext $(Y, Z)$, Alice uses the fact that (with all computations in $GF(p)$)

$$X^y = g^{xy} = g^{yx} = Y^x,$$

so $m = Z Y^{-x}$.
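A toy Python version of the scheme, as a sketch. The parameters $p = 1019$, $g = 2$ are an illustrative assumption (2 is a primitive root mod the prime 1019), the function names are hypothetical, and $Y^{-x}$ is computed as $Y^{p-1-x}$, which is valid because $Y^{p-1} = 1$ in $GF(p)$.

```python
import random


def elgamal_keygen(p, g, rng=random):
    """Returns public key (p, g, X) and private key x, with 2 <= x <= p - 2."""
    x = rng.randrange(2, p - 1)
    return (p, g, pow(g, x, p)), x


def elgamal_encrypt(public, m, rng=random):
    """Bob's side: ciphertext (Y, Z) = (g^y, m X^y) mod p with a fresh random y."""
    p, g, X = public
    if not 0 <= m < p:
        raise ValueError("message must satisfy 0 <= m < p")
    y = rng.randrange(2, p - 1)
    return pow(g, y, p), m * pow(X, y, p) % p


def elgamal_decrypt(public, x, ciphertext):
    """Alice's side: m = Z * Y^(-x), computed as Z * Y^(p-1-x) mod p."""
    p = public[0]
    Y, Z = ciphertext
    return Z * pow(Y, p - 1 - x, p) % p


public, private = elgamal_keygen(1019, 2)
c = elgamal_encrypt(public, 123)
print(elgamal_decrypt(public, private, c))    # 123
```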

Notes on El Gamal

1. Alice and Bob have essentially used the Diffie-Hellman scheme to agree on a key $g^{xy}$. The only difference is that Alice publishes $X$ as part of her public key, and it does not change.

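2. It is very important for Bob to choose a different random $y$ each time: if the same $y$ is used for two messages $m_1$ and $m_2$, then $Z_1 / Z_2 = m_1 / m_2$ in $GF(p)$, so anyone who learns $m_1$ can recover $m_2$.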

3. There is an El Gamal scheme for signatures, but it is a little more complicated than the scheme for encryption/decryption.

4. El Gamal is a good alternative to RSA, especially if you have doubts about the difficulty of integer factorisation.

5. For added security (or to permit shorter keys) El Gamal can be generalised to use discrete logarithms over elliptic curves or other groups. This is the basis for elliptic curve cryptography (ECC).
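The danger of reusing $y$ (note 2) is easy to demonstrate. In this Python sketch all numbers are illustrative assumptions ($p = 1019$, $g = 2$, private key $x = 345$, reused $y = 77$); Eve recovers the second message from the first without ever learning $x$ or $y$.

```python
p, g = 1019, 2          # toy parameters: 2 is a primitive root mod the prime 1019
x = 345                 # Alice's private key (illustrative value)
X = pow(g, x, p)        # part of Alice's public key

y = 77                  # Bob's mistake: the same y for two messages
m1, m2 = 100, 555
c1 = (pow(g, y, p), m1 * pow(X, y, p) % p)
c2 = (pow(g, y, p), m2 * pow(X, y, p) % p)

# Eve notices that c1[0] == c2[0]. If she learns m1 (e.g. a predictable header),
# she recovers the mask X^y and hence m2.
mask = c1[1] * pow(m1, -1, p) % p
print(mask == pow(X, y, p))              # True
print(c2[1] * pow(mask, -1, p) % p)      # 555
```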

Related topics

There is no time to cover the following topics in detail, but if you are interested you can find a discussion of them and further references in §3.6 of the book MOV mentioned above.

• Baby step, giant step method of Shanks for discrete logs.

• Pollard rho method for discrete logs. This is an alternative which uses less space than the method of Shanks.

• Pohlig-Hellman reduction. This is a way of reducing a discrete log problem mod a prime $n$ to a sequence of discrete log problems in subgroups of prime order $p$, where $p$ ranges over the prime factors of $n - 1$.

• Implication of these algorithms for the El Gamal cryptosystem: $n - 1$ (or, in generalisations, the group order) should have at least one large prime factor.
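A minimal Python sketch of Shanks’ baby step, giant step method, assuming $\mathrm{GCD}(g, n) = 1$. With $m = \lfloor\sqrt{n}\rfloor + 1$, any solution below the order of $g$ can be written $x = im + j$ with $0 \le i, j < m$; the sketch stores $g^j$ for all $j$ (baby steps), then compares $b g^{-mi}$ for $i = 0, 1, \ldots$ (giant steps). It uses $O(\sqrt{n})$ time and space and, because it keeps the smallest $j$ for each value and tries $i$ in increasing order, it returns the smallest non-negative solution, matching the convention for $\log_g b$ above. The function name is hypothetical.

```python
from math import isqrt


def discrete_log_bsgs(g, b, n):
    """Smallest x >= 0 with g^x = b (mod n), or None; assumes GCD(g, n) = 1."""
    m = isqrt(n) + 1
    table = {}
    e = 1
    for j in range(m):                  # baby steps: g^j for 0 <= j < m
        table.setdefault(e, j)          # keep the smallest j for each value
        e = e * g % n
    step = pow(g, -m, n)                # g^(-m) mod n (Python 3.8+)
    gamma = b % n
    for i in range(m):                  # giant steps: b * g^(-m i)
        if gamma in table:
            return i * m + table[gamma]
        gamma = gamma * step % n
    return None


print(discrete_log_bsgs(2, 3, 1019))   # x with 2^x = 3 mod 1019
print(discrete_log_bsgs(2, 3, 7))      # None: powers of 2 mod 7 are 1, 2, 4
```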

Secret sharing

Many years ago, the University of Oxford’s cash was stored in a large chest which had several locks, and no one person could open all the locks. In more modern times, there might be eight directors of a company, and at least three of them might be required to sign cheques above a certain amount.

More generally, suppose that $1 \le t \le n$ and there are $n$ people who want to “distribute” a secret so that at least $t$ of them need to cooperate to reconstruct the secret. A way of implementing this is called a “$(t, n)$ threshold scheme”. If $t = n$, an obvious way to do this is to split the secret into $t$ pieces and give one piece to each person. However, this is not very good, because if $t - 1$ people get together, they might be able to guess the remaining piece. Also, it is not clear how to generalise to the case $t < n$.

Shamir’s threshold scheme

Shamir proposed a $(t, n)$ threshold scheme based on polynomial interpolation. Suppose the secret is an integer $S \ge 0$. We choose a prime $p > \max(S, n)$ and define $a_0 = S$. We choose random, independent values $a_1, a_2, \ldots, a_{t-1} \in GF(p)$. The coefficients $a_0, \ldots, a_{t-1}$ define a polynomial

$$f(x) = \sum_{j=0}^{t-1} a_j x^j.$$

We compute $S_i = f(i) \bmod p$ for $i = 1, \ldots, n$. Then person $i$ is given the “share” $(i, S_i)$.

Reconstructing the secret

If any $t$ people pool their shares, their values of $f(x)$ at $t$ distinct points define $f(x)$ uniquely (a polynomial of degree at most $t - 1$ is determined by its values at $t$ distinct points), so they can find $S_0 = f(0) = S$. (All this is in $GF(p)$, or, if you prefer, just consider values mod $p$.) Recall the Lagrange interpolation formula for a polynomial interpolating $t$ points $(x_i, y_i)$:

$$f(x) = \sum_{i=1}^{t} y_i \prod_{j \ne i} \frac{x - x_j}{x_i - x_j},$$

where the product is over $j$ in the range $1 \le j \le t$, $j \ne i$. Because the computation is done in $GF(p)$, there is no problem with rounding errors.
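A minimal Python sketch of both halves of the scheme: share generation as described above, and reconstruction by evaluating the Lagrange formula at $x = 0$ in $GF(p)$. The function names, the prime $p = 2^{127} - 1$ (a Mersenne prime) and the secret value are illustrative assumptions; the $(3, 8)$ setting matches the “three of eight directors” example.

```python
import random


def make_shares(S, t, n, p, rng=random):
    """Shamir (t, n) shares (i, f(i) mod p) of the secret S; p prime, p > max(S, n)."""
    coeffs = [S] + [rng.randrange(p) for _ in range(t - 1)]   # a_0 = S, a_1, ..., a_{t-1}

    def f(x):
        acc = 0
        for a in reversed(coeffs):                            # Horner's rule mod p
            acc = (acc * x + a) % p
        return acc

    return [(i, f(i)) for i in range(1, n + 1)]


def reconstruct(shares, p):
    """Lagrange interpolation in GF(p), evaluated at x = 0, from distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = num * (0 - xj) % p
                den = den * (xi - xj) % p
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret


P = 2**127 - 1
shares = make_shares(123456789, t=3, n=8, p=P)
print(reconstruct(shares[:3], P))                         # 123456789
print(reconstruct([shares[7], shares[1], shares[4]], P))  # 123456789
```

The inverse of each denominator exists because the $x_i$ are distinct and smaller than $p$, so every $x_i - x_j$ is nonzero in $GF(p)$.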

Properties of Shamir’s scheme

Shamir’s scheme has some desirable properties:

• Given knowledge of any $t - 1$ or a smaller number of shares, all values $0 \le S < p$ of the shared secret remain equally probable.

• The size of one share is close to the size of the secret.

• New shares may be computed without changing existing shares.

• A single person can be given more than one share if desired.

• There are no unproven assumptions about the difficulty of solving some number-theoretic problem.

5. Annotated list of References

A. V. Aho, J. E. Hopcroft and J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, 1974. Rather old (as Computer Science books go) but still relevant. See Ch. 6 for matrix algorithms, Ch. 7 for the FFT and its applications, and Ch. 8 for integer and polynomial arithmetic.

A. V. Aho, J. E. Hopcroft and J. D. Ullman, Data Structures and Algorithms, Addison-Wesley, 1983. A revised and more elementary version of the first six chapters of the 1974 book by the same authors.

[CLRS] T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein, Introduction to Algorithms, third edition, MIT Press, 2009. The main textbook for this course. In earlier editions some of the chapter numbers differ.


D. E. Knuth, The Art of Computer Programming, Addison-Wesley.
Volume 1: Fundamental Algorithms (third edition, 1997). Good for mathematics relevant to analysis of algorithms.
Volume 2: Seminumerical Algorithms (third edition, 1997). Good for random number generators, integer and polynomial arithmetic, etc.
Volume 3: Sorting and Searching (second edition, 1998). Probably contains more than you want to know about sorting and searching algorithms.
Volume 4A: Combinatorial Algorithms, Part 1 (first edition, 2011). We are still waiting for Volumes 4B–7.
In all 3½ volumes the exercises (and solutions) are a mine of information. Try to find the latest editions as they are significantly different from the earlier editions.


A. Menezes, P. van Oorschot and S. Vanstone, Handbook of Applied Cryptography, CRC Press, 2001. A great reference. See http://www.cacr.math.uwaterloo.ca/hac/.

Rajeev Motwani and Prabhakar Raghavan, Randomized Algorithms, Cambridge University Press, 1995. A good introduction.

Hans Riesel, Prime Numbers and Computer Methods for Factorization, second edition, Birkhäuser, Boston, 1994. An introduction to algorithms for primality testing, integer factorisation, and some of their applications.

B. Schneier, Applied Cryptography, 2nd edition, Wiley, 1996. Describes many cryptographic algorithms.

B. L. van der Waerden, Algebra (Volumes 1–2), Springer, 2003. A classic. The first edition (1931) had the title Modern Algebra.
