<<

Volume pp

A Survey of Mo dern

Peter L Montgomery

Las Colindas Road

San Rafael CA USA

email pmontgomcwinl

Every p ositive integer is expressible as a pro duct of prime numb ers in a

unique way Although it is easy to prove that this factorization exists it is

b elieved very hard to factoranarbitrary integer We survey the b est known

algorithms for this problem and give some found at CWI

Introduction

An integer n is said to b e a or simply prime if the only

of n are and n There are innitely many prime numb ers the

rst four b eing and If n andn is not prime then n is said to b e

comp osite The integer is neither prime nor comp osite

tal Theorem of Arithmetic states that every p ositiveinteger The Fundamen

can b e expressed as a nite p erhaps empty pro duct of prime numb ers and

that this factorization is unique except for the ordering of the factors Table

has some sample factorizations

Table Sample factorizations

The existence of this factorization is an easy consequence of the denition

of prime number and the wellordering principle The uniqueness pro of is

only slightly harder However this existence pro of gives no clue ab out how

to eciently nd the factors of a large given integer No p olynomialtime

for solving this problem is known

Factoring large has fascinated mathematicians for centuries Gauss

wrote

The problem of distinguishing prime numb ers from comp osites

and of resolving comp osite numb ers into their prime factors is one

of the most imp ortant and useful in all of arithmetic The dignity

of science seems to demand that every aid to the solution of such

an elegant and celebrated problem b e zealously cultivated

Some b o oks are devoted to tabulating the factors of numb ers of sp ecial form

The most referenced such table is the Cunningham table which lists known

n

factors of b for bases b and small n Brent et al extend

Some of these factorizations app ear in which these tables through b

n n

also has some factors of a b Brillhart et al give factors of the

Fib onacci numb ers and related Lucas numbers The RSA Challenge is

building a table of factorizations of partition numb ers

Factorization was once primarily of academic interest It gained in prac

tical imp ortance after the intro duction of the RSA publickey cryptosystem

see x The cryptographic strength of RSA dep ends on the dicultyof

factoring large numb ers

Let N b e a large comp osite integer Until the s the b est algorithms for

factoring N to ok time O N for some One such algorithm is trial divi

p

N This changed when Mor sion which tries to divide N by all primes up to

rison and Brillhart intro duced the continued fraction metho d whose

time is

n o

p

p

o log log N log N

exp o log N log log N O N

Mo dern algorithms for factoring N fall into two ma jor categories Algo

rithms in the rst category nd small prime factors quickly These include

Pollard Rho P and the metho d Algorithms in

the second category factor a numb er regardless of the sizes of its prime factors

but cost much more when applied to larger integers These algorithms include

continued fraction and numb er eld sieve

In practice algorithms in b oth categories are imp ortant Given a large in

teger with no clue ab out the sizes of its prime factors one typically tries algo

rithms in the rst category until the cofactor ie the quotient after dividing

byknown prime factors of the original numb er is suciently small Then one

tries an algorithm in the second category if the cofactor is not itself prime If

An algorithm is said to b e p olynomialtime if its worst case execution time is b ounded

by a p olynomial function of the length of the input If one wants to factor N whose length is

 

O log N then an O log N algorithm would b e p olynomialtime whereas an O N

algorithm is not

one is unable to nd a suciently small cofactor or a prime cofactor using

metho ds in the rst category the factorization remains incomplete

The interpretation of suciently small has changed considerably as tech

nology has progressed In the s John Brillhart and John Selfridge

p predicted that factoring numbers over digits would be hard In

the s Richard Guy p predicted that few numb ers over digits

would b e factored In the cuto is around digits

We illustrate some algorithms using N This number

was selected using CWIs street address

Williams and Shallit give a computational history of factoring and

primality testing from to ie b efore the era of electronic computers

Richard Guy gives a go o d survey of factorization metho ds known in

Brillhart et al giveachronology of developments in factorization b oth

by the Cunningham pro ject hardware and software esp the metho ds used

Robert Silverman gives a more recent exp osition Several textb o oks

cover factorization

We review some elementary We review some fundamental

algorithms which will b e needed later

Notations

The symbols Q R and Z denote the sets of rational numbers real numb ers

and integers resp ectively

If x and y are integers then x j y read x divides y means that y is a

if and only if there exists k Z such that y kx multiple of x That is x j y

The greatest common GCD of two integers x and y is denoted

gcdx y The GCD is always p ositive unless x y If gcdx y then

x and y are said to b e coprime they have no common divisors except

Review of Elementary Number Theory

This section reviews some elementary and analytic numb er theory Pro ofs can

b e found in manynumb er theory textb o oks

Congruence classes and

Fix n Twointegers x and y are said to b e congruentmodulon if x y

is divisible by n This is written

x y mo d n

For xed n the relation is an equivalence relation reexive symmetric

transitive The equivalence classes are called congruence classes A with

exactly one representative from each congruence class is called a complete

residue system There are exactly n congruence classes The canonical

complete residue system is f n g

We omit the mo dulus n writing simply x y when the mo dulus is clear

from the context

The relation is preserved under addition subtraction and multiplication

If x x y y and n are integers such that

x y mo d n and x y mo d n

then

x x y y mo d n

x x y y mo d n

x x y y mo d n

These are easily proved using the denition of For example x x y y

ultiples of n x y x y x y isasumoftwom

Equation says that it is meaningful to add subtract or multiply two

congruence classes since the equivalence class of the result do es not dep end

up on the selections of the representatives The congruence classes mo dulo n

form a commutative ring under these op erations This ring denoted by ZnZ

is called integers mo dulo n When n is prime division by nonzero elements

is p ossible and this ring is a eld often written GF n

A corollary to is that if f is a p olynomial in k variables with integer

co ecients and if x y mo d nfor i k then

i i

f x x x f y y y mo d n

k k

Occasionally we write r r where r and r are rational numb ers rather

than integers The notation a b a b mo d n means that the numera

tor of a b a b is divisible by n and that its denominator is coprime to n

That is a b a b mo d n and gcdb b n

Properties of prime numbers

Let p b e a prime number The following prop erties are stated without pro of

and p j xy then p j x or p j y Equivalently if xy If x y Z

mo d p then x mo d pory mo d p

Unique factorization As previously mentioned every p ositiveinteger can

b e written as a p erhaps empty pro duct of primes and this representa

tion is unique except for ordering See Table for some examples

p p p

The p olynomials X Y and X Y are congruentmodulop For

example every term in

X Y X Y X Y X Y X Y XY

is divisible by This can b e proved using the binomial theorem

p

Fermats little theorem If a Zthen a a mo d p In particular

p

if p do es not divide athen a mo d p One pro of uses the last

prop erty and induction on a

Chinese remainder theorem

Let n and n b e coprime p ositiveintegers If r and r are arbitrary integers

then the two congruences

x r mo d n and x r mo d n

may b e replaced by a single congruence

x r mo d n n

where r is chosen to satisfy r r mo d n andr r mo d n This is

known as the Chinese Remainder Theorem

For example the two congruences

x mo d and x mo d

are equivalent to the single congruence x mo d To conrm this

note rst that such an equation exists since

and are coprime cf If holds then there exist integers k and

suchthat x k k Hence k

x x x k k

k k k k

whichshows that x mo d Converselyif x mo d say

x k then x k k implying

A generalization allows more than two mo duli If gcdn n for

i j

i j k ie if the integers n n are pairwise coprime and if r Z

k i

i then the system of congruences for all

x r mo d n i k

i i

is equivalent to a single congruence

x r mo d n n n

k

when r is suitably chosen There are ecientways to nd this r given the fn g

i

and fr g

i

When we know an upp er b ound on an integer x then we can determine

x uniquely if we knowit mo dulo enough primes More precisely if we know

that jxjB and if wecho ose n n n B then determines x

k

uniquely

Smooth numbers

An integer n is said to be smo oth with resp ect to a bound B if no prime

factor of n exceeds B In Table the numb ers and

are smo oth with resp ect to

For xed B and x B the number of p ositive integers less than x and

u

smo oth with resp ect to B is approximately xu where u ln x ln B

p

Density of prime numbers

The Prime Number Theorem states the asymptotic density of primes Let

x denote the numb er of primes not exceeding the real number x Then

xlnx

lim

x

x

In other words x is approximately x lnx for large x

Anumerical example is x Then whereas ln

A more accurate approximation is xln x This predicts primes

below

Riesel Chapter gives some history ab out accurate computation of x

for large x

Some Essential Algorithms

We will assume that the reader is familiar with the following imp ortant algo

rithms

Multipleprecision arithmetic x Twointegers of magnitude at

most N can be added subtracted or compared in time O log N by

op erating on one digit at a time The classical multiplication and divi

sion algorithms taketime O log N A corollary is that addition and

log N if the op erands and subtraction mo dulo N can b e done in time O

result are required to b e in the interval N Mo dular multiplication

can b e done in time O log N

Mo dular exp onentiation pp If e is a nonnegativeinteger

e

and a N then the remainder a mo d N can be computed with

O log emultiplications mo dulo N and hence in time O log N log e

using the binary metho d of exp onentiation

Greatest common divisor pp If n and n are p ositivein

tegers then their can b e found in O log minn

n op erations on integers at most maxn n The extended GCD

algorithm also nds twointegers m m such that

gcdn n m n m n jm jjn j jm j jn j

RSA publickey cryptosystem

The RSA cryptosystem named after its inventors Rivest Shamir and Adle

man is the rst publickey system intro duced and remains the most used

publickey cryptosystem to day The strength of this cryptosystem dep ends

up on the diculty of factoring large integers

A publickey cryptosystem requires each user to have his own encryption

eryones E is known to pro cedure E and private decryption pro cedure D Ev

all other users like a city telephone directory Only the user knows his D

These pro cedures must b e bijections and inverses of each other D E M

E D M M for all messages M Both D and E must b e cheap to execute

but it must b e computationally infeasible to nd D given only E

RSA achieves these ob jectives by letting eachuserpicktwo large primes p

and q with p q Let N pq Cho ose two exp onents e d with de

mo d p q If M is an integer then Fermats little theorem implies

de de de

M mo d p and M M mo d q hence M M mo d N M

for all M Let the message space be the interval N a long message

can b e split into chunks Dene the encryption and decryption pro cedures by

e d

E M M mo d N and D M M mo d N The values of e and N

are public but d p q are private

These satisfy all of the publickey requirements except p ossibly the require

ment thatitbehardtond D given E If factoring is easythenanintruder

can nd p and q given N pq from this the intruder can nd p q

and hence d sinceheknows e That is it is easy to nd D given E if factoring

is easy The converse is unknown nobody has proven that factoring is easy

if one can easily nd D given E However the problems are believed to be

equivalent

A rep ort p recommends that any RSA publickey mo dulus ie

the pro duct N pq be at least bits ab out decimal digits if the

data must remain secure for years but do es not makea recommendation

for longer p erio ds

Algorithms for finding small prime factors of large numbers

T rial division

p

If N is comp osite then at least one prime divisor of N is at most N To

factor N the trial successively divides N by primes

p

upto b N c

If p is the second largest prime factor of an integer N then trial division

takes O p steps or O p ln p steps if one do es trial division only byprimes

seex

Almost all factoring programs attempt trial division by the smallest primes

Even if N is decimal digits long it takes only a few seconds to divide N

by all primes up to

Sometimes the prime divisors of N are known to have a sp ecial form For

n k

example if p j a but p a for any k where k n and k j n then

p mo d n This information facilitates trial division since it restricts

the range of p ossible divisors For suchnumb ers one might try trial division

by all qualifying primes b elow or even higher Unless N has a sp ecial form

trial division is impractical for nding prime divisors ab ove

Pol lardRho

In and John Pollard announced two new algorithms for nding

small factors of large integers Each algorithm do es a sequence of p olynomial

op erations additions subtractions multiplications in suchawaythatinter

mediate results are highly comp osite but nonzero Supp ose N is a number to

b e factored and p j N If r is one of the intermediate results and p j r then

p j gcdrN we hop e that this GCD do es not equal N itself

The Pollard Rho algorithm iterates a function Let f ZX be a

univariate p olynomial with integer co ecients Select x arbitrarily and dene

x f x n

n n

y If p is prime then the sequence fx mo d pg must eventually rep eat sa

n n

x x mo d p where n n Since f is a p olynomial function

n n



this and imply x x mo d p for all k That is the

n k n k



sequence fx mo d pg is eventually p erio dic with p erio d dividing n n

n n

If n n and n n j nthen x x mo d p This is the idea b ehind

n n

x N for n until it is one variation of Pollard Rho test gcdx

n n

nontrivial success or until n is to o large failure

mo d

Figure Pollard Rho cycle mo dulo using f X X and x

Figure shows the pattern mo dulo The rst rep etition is x x

mo d Since f is a p olynomial with integer co ecients the sequence has

p erio d except for its rst few terms x x mo d whenever n

n n

In particular x x mo d

When attempting to factor N this way the actual computations

are mo dulo N but we visualize them as being done separately

mo dulo the unknown prime factors of N This abstraction is justied bythe

Chinese Remainder Theorem Figure outlines the sequence of computations

all congruences are mo dulo

n x x x gcdx x N

n n n n n

x x x

x x x

x x x

x x x

x x x

Figure Factorization of via Pollard Rho

The cycle length is considerably longer mo dulo for which the rst

duplicate is x x mo d The factor would be found

while testing gcdx x N but is found rst while testing gcdx

x N

Complexity of Pol lardRho

By the p olynomial op erations required byPollard Rho can all b e done

mo dulo N after which one checks gcdr mo d N N rather than gcdrN

This pro cedure ensures that the intermediate results do not grow to o big and

is reected in the data shown in Figure

p

p iterations If p is a prime dividing N thenPollard Rho app ears to take O

to nd p Given x and x the nth iteration computes

n n

x f x mo d N and x f f x mo d N

n n n n

and tests gcdx x N If f X X then this is three mo du

n n

lar multiplications four mo dular additions subtractions are countedasaddi

p

tions and one GCD op eration per iteration If p O N then the time

p

is O plog N O N log N bit op erations This is asymptotically

b etter than the O pO N divisions needed by trial division

It is p ossible to trade most of the GCDs with N for multiplications mo dulo N

As stated the algorithm tests each gcdx x N Weanticipate that most

n n

GCDs will be trivial otherwise weve found a factor We can replace two

GCD tests

gcdrN and gcds N

by a single test

gcdrs mo d N N

The GCD in will b e nontrivial if and only if at least one GCD in

r and s must b e coprime to N is nontrivial If rs is coprime to N then b oth

we trade the two GCDs in for one mo dular multiplication and the one

GCD in Converselyif rs shares a factor with N then at least one GCD

in must be nontrivial the latter event is suciently rare that we can

aord to test b oth GCDs in case they yield separate factors of N In practice

one takes the pro duct of several x x b efore doing a GCD with N

n n

Brent prop oses testing gcdx x N whenever n is a power of

m n

and n m n instead of testing values of gcdx x N Brents

n n

variation would nd the prime while testing gcdx x N rather than

gcdx x N and would nd the prime while testing gcdx x N

rather than gcdx x N Brents variation takes ab out twice as many

iterations but it is ab out faster o verall b ecause each iteration applies the

p olynomial f once rather than three times Brent and Pollard used

this to nd the digit prime factor of the digit Fermat

number

P

So on after discovering Pollard Rho John Pollard found another metho d which

nds small prime factors His new metho d called the P metho d nds

a prime factor p of N if p is smo oth In the worst case when p is

prime the P metho d can require O p arithmetic op erations to nd pas

in trial division But some prime divisors are found very quickly

The P metho d is based on Fermats little theorem x Supp ose M

M

is such that p j M If gcda N then a mo d p implying

M

p j gcda N For example if M then this will nd

is among p if p

Eleven of the primes below are in this list If we replace by

say then we will also nd and many larger

primes

The value of M is typically the pro duct of small primes or prime p owers If

M is the pro duct of all prime p owers b elowabound B then M expB so

the binary metho d of exp onentiation needs O log M O B multiplications

M

mo dulo N to compute a mo d N After one GCD op eration we hop e to

disco ver p

For example supp ose N and B Figure summarizes the

computations if we b egin with x a and let x denote our intermediate

q

result after pro cessing a p ower of the prime q In this example wecho ose to

checkeach gcdx N rather than only the nal gcdx N We can

q

stop early when we nd the factor of x

x

x x gcdx

x x gcdx

x x gcdx

x x gcdx

x x gcdx

x x gcdx

Figure Factoring via P withx and B

The factor is found since is a pro duct of prime

powers dividing On the other hand is

not of this form Observe that the P metho d nds rst whereas trial

division and Pollard Rho nd rst The smallest prime factor of a number

is not always the easiest factor to nd

Step

A mo dication to the P metho d called Step allows p to have

one prime divisor exceeding B if that prime divisor is not to o big

Sp ecically supp ose p is a prime divisor of N and

p m q where m j M

M

If gcda N and A a mo d N then Fermats theorem implies

q

Mm

Mm

Mm Mq mq p q M

p mo d a a a A a

q

That is the prime p will divide gcdA N

If q is not to o large then we can nd p using the output A from Step The

n n



N where n n this will reveal p A idea is to test several gcdA

 

if q j n n If wewant to test all primes qB for some B thenwe can

p

n



use two tables of size O B one containing all values of A mo d N and

n



another all values of A mo d N Eachentry in one table is compared to each

entry in the other Variations work for the P and elliptic curve metho ds

whicharecovered in the next two sections

Example factors found by P at CWI with B million are the factor

p of and the factor p of where

p

p

p

p

These factors app ear in the up date to

P

The P metho d nds a factor p of N if p is suciently smo oth The

metho d has found many factors but fails miserably if p hasavery large

factor suchasif p q for some prime q

In Hugh Williams published a metho d whichworks when p

rather than p is smo oth Williamss metho d called the P metho d

op erates in the nite eld GF p having p elemen ts and characteristic p

Let P be an integer and assume that P isis a quadratic nonresidue

ie not a mo dulo p Denote f X X PX This quadratic

has two ro ots and over GF p which satisfy P Because

p p

P P mo d ponerootoff is

p p p p p

f P P f in GFp

p p p

Since f has only two ro ots either or If then

GFp and

P

This is imp ossible since GF p and P is assumed to be a

p p

quadratic nonresidue mo dulo p Therefore implying



The P metho d selects a in the multiplicative GFp of

p and nds p if this order is smo oth The P metho d is structurally

similar to P but takes p owers of GF p and succeeds if the order of

is smo oth One wayto do the arithmetic observes that and isa basis

for GFp over GFp and uses arithmetic mo dulo N in place of arithmetic

mo dulo the unknown prime p This can b e improved considerably by using

e e e

Lucas functions to manipulate values of rather than

There is no known waytocheck b eforehand whether P is a nonresidue

without knowing p If one runs the metho d three times using three values for P

then there is an c hance that at least one of the values for P will b e

a quadratic nonresidue When P is a then GFp

and the P metho d b ecomes an exp ensivevariantofthe P metho d

One factor found by P at CWI with B million is the factor p of

where

p

p

El liptic Curve Method

The P metho ds nd a factor p of N if either p is suciently smo oth

However they fail if both p and p have large prime factors This

happ ens frequently for example b oth p and p are primes for

p In Hendrik Lenstra Jr overcame

this diculty when he announced a similar metho d called the Elliptic Curve

Metho d abbreviated ECM

An elliptic curve over a eld K of characteristic not is the set of solutions

x y K K to a cubic equation

Y X AX BX C

together with a sp ecial p oint conceptually called the point at in

nity There is one restriction to the co ecients in the discriminantof

the cubic p olynomial must b e nonzero ie

A C A B AB C B C

The p oints on an elliptic curve form an ab elian group E K when the group

op erations are suitably dened as illustrated in Figure The negation of

the p oint at innity is itself the negation of any other p oint P x y is

dened to b e P x y For addition supp ose that P and P are two

points on the elliptic curve If either P or P is the point at innity then

dene P P to be the other p oint Otherwise supp ose P x y and

P x y If x x then dene P P P where P is the p oint

where the straight line through P and P reintersects A calculation

gives

y y

m

x x

P x y where x m A x x

y y mx x

When instead x x then y y by If y y then dene

y P P to be the point at innity If y y so that y

then dene P P P with P dened as in except that we use

m x Ax B y slop e of tangentlineatP P

It is amazing that this denes an asso ciative op eration All group op er

ations are dened in terms of ordinary addition subtraction multiplication

division and comparison no square ro ots and are meaningful over arbitrary

elds where In particular they are meaningful if K GFp where p

is an o dd prime The resulting elliptic curve group E GFp is nite Hasse

p

p By changing p showed that its order is p where j j

the constants in we get another curve whose order is usually dierent

If N is comp osite say N pq where p and q are distinct o dd primes then

the ring ZN Z is not a eld but we can use to dene a curve and attempt

to use the algebraic rules to do group op erations mo dulo N Wewillfailie

b e unable to execute the algebraic op erations only if we attempt to divide by

a nonzero noninvertible numb er mo dulo N Such a denominator called a

zero divisor will b e divisible by p or q but not b oth and will give us a factor

of N

For example supp ose N Wemightcho ose

E Y X X and P

It turns out that jE GF j and jE GF j

If we attempt to compute P or anymultiple thereof then we

will strike the identity element of the group mo dulo but probably not

mo dulo This will causeusto attempt to divide bya nonzero multiple

of and the factor will b e found Indeed if wework mo dulo

then

P

P

P

P

P

P

P

P P P

P

P

P

Next we attempt to compute

r

P

Q

r

P

r r r

P Q P Q

A

A

A

A

A

A

r r r

P Q P Q P

A

A

A

r

A

Q

A

A

A

A

A

A

A

A

A

A

A

A

A

A

A r

P

P

P Q

P Q

P P Q

Figure Group lawon y x x

P P P

The xco ordinates and are distinct but their dierence

is not invertible mo dulo N A GCD nds the factor

One early numb er done by ECM is the rd Fib onacci number F Its

algebraic cofactor has ve prime factors namely

F

p p p p p

F

where

p p

p

p p

p

p p

p

p p

p

and p is a digit prime The P metho d with B and a Step

b ound of missed all four small factors although the metho d nds many

others with these sizes A twostep P algorithm found p however it

missed p b ecause the chosen value of P was a quadratic residue mo d

ulo p ECM found all four small prime factors easily

One ECM factor found by CWI is the digit factor

of The job was run on the CRAY C at SARA using curves

The nd curve had very smo oth order

The largest factor found by ECM is the digit factor

of the partition number p It was found byFranzDieter Berger Uni

versity of Saarland Germany in

Algorithms for factoring arbitrary integers

The next several algorithms try to factor an odd integer N by nding two

squares X and Y suchthat

X Y mo d N and gcdXY N

Then they test gcdX Y N hoping for a nontrivial factor of N Whereas

the algorithms in x require time dep ending primarily on p to nd the smallest

prime factor p of N the times for the up coming algorithms dep end primarily

on the size of N itself

If N has two distinct o dd prime factors p and p and if X and Y are randomly

selected sub ject to then gcdX Y N will b e nontrivial ie neither

nor N exactly of the time Indeed cho ose Z such that Z mo d p

and Z mo d p this Z exists by the Chinese remainder theorem

Then given X the solutions Y of X Y mo d N are Y X Y X

Y XZ The corresp onding values of gcdX Y N are Y XZ and

N p and p resp ectively Twoofthese four are nontrivial If N has k

distinct o dd prime factors then the probability of success with a single X Y

k

pair is

These metho ds dont work if N is a prime p ower ie if N do esnt havetwo

k

distinct prime factors but this condition is easily checked If N p where

k

p is prime and k then N p is divisible by p By Fermats

little theorem if gcda N then

N p

N p N p

mo d p a a

N

Consequently gcda N will b e divisible by p If instead wendana

N

such that gcda a N then n cannot b e a prime p ower

Finding squares through products

The b est metho ds for constructing congruences of the form start by ac

cumulating several congruences

A B mo d N

i i

where each A and each B is either a square or a square times a smo oth

i i

number These congruences are also called relations For N a

sample relation is

mo d N

in which all prime divisors of and of are

under

The algorithms vary in how they nd the relations Once suciently

many relations are found each algorithm attempts to nd a nonemptysetS

of indices such that b oth

Y Y

B A and

i i

iS iS

are squares The pro duct of the corresp onding congruences is a congruence of

the form

We illustrate with a small example using N The left side of

Figure lists some congruences mo dulo in which the only prime factors

of each side are We delib erately suppress the congruences

and where b oth sides are squares since either of these factors

immediately

p

p

p

p

p

p

p

Figure Some congruences mo d involving p owersof

We now multiply some of these congruences so as to generate squares on

b oth sides For example b ecomes Rewrite

this as This congruence factors since gcd

gcd

Wemight instead havechosen to multiply by This gives

which is the same as Since gcd

gcd this congruence do es not yield a factorization

The decision to multiply by or by is

made by lo oking at the exp onents of the primes in the resulting pro duct All

exp onents in and in are even There are eight

congruences in Figure let the exp onents e to e b e one or zero dep ending

up on whether the corresp onding congruence is included in or excluded from

the pro duct The pro duct congruence

e e e e e e e e

      

e e e e e e e e

      

factors into

e e e e e e e e e e e e e e e

            

e e e e e e e e e e e e e

          

Both sides will b e squares precisely when all exp onents are even This is equiv

alent to requiring that all elements of the matrixvector pro duct

e

e

e

e

e

e

e

e

be even

Equation has the form Be mo d where e is the exp onent

vector and B is the exp onent matrix hidden in the right of Figure

but reduced mo dulo The eight column vectors must b e linearly dep endent

since all are in a space of dimension at most This is equivalent to saying

that there exists a nonzero e GF such that Be

In this example the fth sixth and seventh columns of B are all

T

Anytwo of these sum to zero mo dulo This corresp onds to multiply

ing two of the three congruences and as we did

earlier

T

Another vector in the null space of B is The congru

ence

mo d

b ecomes mo d which again gives the factorization

Traditionally one solved the system Be by a variation of Gaussian

elimination Recently some iterative metho ds have b een found The

iterative metho ds are sup erior when the matrix is large since they require less

teger factorization problems are very sparse storage matrices arising from in

For these large sparse matrices the iterative metho ds are also faster if B is

an n n matrix then Gaussian elimination uses O n bit op erations but the

iterative metho ds take O n applications of the matrix Bwhich is time O n

if the numb er of nonzero entries p er column remains b ounded as n grows

Factor base

The set of primes app earing in the factorizations in Figure is called the

factor base Often it is convenient to also include in the factor base If we

allow primes b elow B to app ear then the size of the factor base is ab out B

see x for estimates of B

Freerelations

In the last example while factoring with a factor base of f gwe

could have used the four trivial congruences

in the pro duct For example although and are not

squares the congruence

mo d

yields mo d and a factorization The congruences in are

called free relations b ecause the eort required to nd the relations do es not

dep end on the size of N

In this example which uses the factor base f g on b oth sides we

could disp ense with the free relations and use the factorizations including neg

ative exp onents of the quotients directly Each rational

quotient is congruent to and the resulting matrix will b e smaller since each

prime app ears only once not once p er side The Numb er Field Sievex

uses free relations which are more complicated than those shown here and in

so this simplication do es not work which the two factor bases are dierent

there

Continuedfraction method

The continued fraction metho d abbreviated CFRAC is no longer in contention

as a mo dern factoring metho d but we include it b ecause it is similar to some

mo dern metho ds and easier to understand It was used to factor the seventh

Fermat numb er digits in p

CFRAC lo oks for congruences X r mo d N with small r sp ecically

p

N For each congruence it nds it attempts to factor r using the r O

factor base Where r is smo oth the congruence is saved so it can b e multiplied

by other such congruences to form squares on b oth sides

p

If N is a p erfect square then it is easy to factor N Otherwise N is

p

irrational There exist innitely many rational approximations PQ of N

suchthat

p

P

N

Q Q

p

N Q If PQ is anysuch approximation and wecho ose suchthat PQ

then

p p

P NQ Q N Q N Q NQ

p

Since jj this shows that jP NQ j N Q Hence all suchvalues

p

of P NQ are O N as desired We knowa square ro ot of P NQ

mo dulo N namely P

As an example with N the rst continued fraction approxi

p

mations to N app ear in Table Three values of P NQ are

Convergent PQ P N Q

p

Table Approximations to

N

N

N

The pro duct of the three right sides in is a square namely

Multiply the three left sides and suppress the multiples of N to reveal

mo d N

Reduce each parenthesized argumentmoduloN to get mo d N

Unfortunately this trivial congruence do es not yield a factorization Wecould

try more but will instead illustrate other algorithms

Table giv es values of P and Q to full precision The recurrences for P

and Q allow one to work with P mo d N and Q mo d N instead of P and Q

themselves and can be evaluated quickly For example the numerator and

denominator of the last entryinTable are the sums of those parts of the

two previous entries

Some small primes eg are missing in Table This is b ecause

they are not quadratic residues mo dulo N If pjP NQ where gcdP Q

then N must b e a quadratic residue mo dulo p unless p j N Unless N is a

p erfect square only half of the primes asymptotically have N as a quadratic

residue If B is the upp er limit on the factor base then the factor base size is

rather than B ab out B

Sieving

Much of the time in CFRAC is sp ent factoring the residues P NQ to

test whether they are smo oth This work is done primarily by trial division

although one may employ the other metho ds in this survey to o

Quadratic Sieve see x eliminates this burden If f ZX isa univariate

p olynomial with integer co ecients and p is a prime then the values of x for

which p j f x lie in a few arithmetic progressions By if k is an integer

f x kp f x mo d p Therefore f x kp will b e divisible by p if and

only if f x is divisible by p

Supp ose we want to evaluate a p olynomial f at several consecutive values

of x and check eachvalue for smo othness Start by building a table of values

of f x For each prime p in our factor base nd the ro ots of f mo dulo pby

factoring f X over GF p x Then for each x such that f x

mo d p replace our tabulated value of f x by f xp After pro cessing all

primes in our factor base if any table entry is then the corresp onding f x

was smo oth

This pro cedure can b e improved considerably One improvement tabulates

log jf xj rather than f x and subtracts log p rather than dividing by p The

logarithms can b e approximate p erhaps to base At the end lo ok for small

values in the table not just for a value of log This pro cedure will also

nd some values of x for which f x is smo oth but not squarefree ie for

which a prime p ower divides f x

Quadratic sieve

Using the ideas in the last section Quadratic Sieve lo oks at the values

of a quadratic p olynomial at successivepoints We illustrate it by a detailed

example Dene f X X N where N After sieving f x for

j k

p

values of x near N we accumulate data similar to that in Figure

The third sixth and seventh columns in the lower table of Figure sum to

zero mo dulo Hence the pro duct

f f f

gives a square on the right Take square ro ots and recall the denition of f

to derive

mo d N

A calculation gives mo d N and

mo d N Unfortunately the congruence

mo d N do es not help

f

f

f

f

f

f

f

p

p

p

p

p

p

p

p

Figure Smo oth values of f X X and asso ciated binary

matrix

Multiple Polynomials

p

M then the largest residue is N x If wesieve the M values of f x for

p p

ab out M N assuming M N Montgomery found a waytostunt

this growth as M grows His variation is called the Multiple Polynomial

Quadratic SieveorMPQS

Let k ifN mo d and k ifN mo d Find a quadratic

p olynomial g X a X bX c such that b a c kN For example

when N and k we might pick

g X X X

We discuss howtocho ose g b elo w Once g is selected we sievetondvalues

of x for which g x is smo oth In this case b oth

g and g

are smo oth Because

b b a c b

aX mo d N aX g X

a a a

the square ro ots of and mo dulo are

and resp ectively These can b e merged with other data

in Figure to pro duce squares on b oth sides One such pro duct

f f g mo d N

yields mo d N which factors N

The p olynomial g X in can b e found by rst selecting an o dd prime

value for a here We require that kN be a quadratic residue mo dulo a

Solve b kN mo d a for b Then solveb a kN mo d a for

a whichever has the same parityaskN Set b b a or b b a

Dene c b kN a By construction c is an integer and b a c kN

When sieving over jxjM the values of a b c should b e picked so that

the values jg M j jg j and jg M j are approximately equal That is

the parab ola should cross the xaxis twice in the interval M M and the

three extrema should have comparable magnitudes The solution sub ject to

b a c kN is

p

p

kN

a b c M kN

M

p

kN which is at most The largest p olynomial value is ab out jcj or M

p

M dierent p olynomials N TosieveM values of x one can use MM

p

N rather sieving M values p er p olynomial The largest residual is O M

p

than O M N Details are in

Since values from dierent p olynomials can b e combined the sieving p ortion

of the MPQS algorithm is easily parallelized Each pro cessor sieves dierent

p olynomials and all smo oth residues go to acentral site This was used for

the RSA factorization mentioned in x

As in CFRAC the only primes in the factor base are those for which kN is

a quadratic residue

Large Prime Variations

The sieving pro cedure in x lo oks for values of x suchthatf x is smo oth with

resp ect to the factor base The algorithm is easily mo died to also nd values

of x for which f x is a smo oth numb er times a prime not much larger than the

factor base b ound by adjusting the threshold used when insp ecting logarithms

after sieving The extra prime in the factorization of f x is called a large

h f x has the same large prime prime If one nds twovalues of x for whic

then the corresp onding congruences can b e multiplied together and treated as

a pair for the rest of the algorithm

This pro cedure called the large prime variation is compatible with the

use of multiple p olynomials describ ed in the last section For example both

f and g have as the only prime exceeding

After doing the linear algebra phase we decide to combine these with three

entries in Figure to get the pro duct

g f f f f

This gives the congruence

which simplies to and factors N

Another variation of MPQS uses two large primes instead of one this version

is known as PPMPQSSee

Number Field Sieve

The Numb er Field Sieve NFS uses ideas from

It made newspap er headlines in when it was used to factor the digit

cofactor of the ninth

Supp ose N is a comp osite integer to b e factored NFS has four main phases

Polynomial selection Select two irreducible univariate p olynomials

ts for which there exists an f X and g X with small integer co ecien

integer m such that

f m g m mo d N

The p olynomials f and g should not have a common factor over Q Often

one p olynomial is X m and the other has the co ecients of the basem

expansion of N for suitable m

There is no known go o d way to pick these p olynomials unless our

original numb er has a sp ecial algebraic form such as the

shown in x For the ninth Fermat numb er the p olynomials were chosen

to b e X and X with common ro ot m

Let denote a complex ro ot of f and denote a ro ot of g

Sieving This phase nds pairs a bsuch that gcda b and such

that b oth

degf degg

b f ab and b g ab

are smo oth with resp ect to a chosen factor base

The sieving phase can x b and searchforvalues of a such that b oth p oly

nomial functions in are smo oth using the ideas in x Although

we require twovalues b e smo oth rather than one value as in MPQS

the values in are suciently smaller that wegainoverall

Linear algebra The expressions in are the norms of the algebraic

numbers a b and a b multiplied by the leading co ecients of f and

of g respectively The principal ideals a banda b factor into

pro ducts of prime ideals in the numb er elds Q andQ resp ectively

All prime ideals app earing in these factorizations have small norm since

the norms are assumed to b e smo oth so only a few dierent prime ideals

can app ear in these factorizations Use linear algebra to nd a set S of

indices such that the two pro ducts

Y Y

a b and a b

i i i i

iS iS

are b oth squares of pro ducts of prime ideals

Square ro ot Using the set S in try to nd algebraic numb ers

 

Q and Q suchthat

Y Y

 

a b and a b

i i i i

iS iS

Couveigness algorithm works if the p olynomials f and g have odd

degrees Montgomerys square ro ot algorithm allows arbitrary degree

Let Q ZN Z and Q ZN Z be homomorphisms

induced by setting m where m is the common ro ot of

f and g The congruence

Y

 

a b

i i

iS

Y



a b m mo d N

i i

iS

has the form the two sides will be coprime to N if none of the

factorizations in share a factor with N

Example of NFS

The rst step in NFS is p olynomial selection If we somehow observethat

N

then wecancho ose

f X X and g X X

Both p olynomials vanish mo dulo N when X mo d

After sieving and linear algebra we construct the pro duct

X X hX X X

X X X

We claim that gives squares on b oth sides More precisely with

p



and

h

and



h where





The factor divides the numerator of

When selecting we included a factor of onboth sides The con

gruence is a free relation much as in x There is also one free

relation for eachprimep such that the p olynomials f X and g X b oth split

completely mo dulo p but no such relations were used in

Improvements in technology

RSA factorization

In a MIT technical memo which Martin Gardner summarizes in his Math

ematical Games column Rivest et al challenge the public to factor a

digit which they claim is the pro duct of digit and digit factors Rivest

estimates that the required running time using the b est algorithms and ma

chines available in would b e quadrillion years It to ok muchlesstime

than predicted After an month worldwide eort organized by Derek

aul Leyland the factorization was Atkins Michael Gra and P

completed by PPMPQS in April This eort to ok an estimated

MIPS years It found

RSA

p p where

p

p

Factorizations found at CWI

In June researchers at CWI and in Oregon USA achieved some record

factorizations using the numb er eld sieve

The rst was the digit Cunningham number N

No factors were known At Oregon State University OSU in USA Peter

Montgomery et al had sieved this numb er using NFS with the two p olynomials

X and X

They used ab out workstations at OSU over an week p erio d during spring

and summer The researchers gathered million relations but were

unable to pro cess the data and nd the factorization During while

Montgomery was at CWI the Computational Numb er Theory group at CWI

completed the linear algebra and square ro ot phases of the work They found

the factorization N p p where

N

p

p

The sp ecial algebraic form of N simplied the p olynomial selection phase

This beat the digit record which AK Lenstra and Dan Bernstein had

previously achieved using NFS

The OSU team also sieved the following digit cofactor of

N

Using the data gathered at OSU the CWI group found N p p where

p

p

This time the p olynomial selection phase was more complicated The re

searchers used two quadratic p olynomials

f X X X

g X X X

The resultant of these p olynomials is N so they share a common ro ot

mo dulo N

The N was the rst large numb er completed by NFS which did not have

a sp ecial form The record was broken a month later when three researchers

completed a digit cofactor of the partition number p using a fth

degree p olynomial and a linear p olynomial The p olynomial selection sieving

and linear algebra phases were done by Arjen Lenstra and Bruce Do dson in

the USA the square ro ot phase was done byPeter Montgomery at CWI

For N the factor bases had all primes b elow million on small worksta

tions or b elow million on SPARC s The program allowed two large

primes up to million on eachside The sparse matrix was

with an average of nonzero entries p er column

For N the factor bases contained all primes b elow million and large

primes went to million The sieving was p erformed in suchawaythatevery

relation contained at least one large prime b etween million and million

and could contain ve large primes The matrix was with

an average of nonzero entries p er column

These matrices are larger than any previous matrices arising from integer fac

torization problems The N matrix would require gigabytes of memory

to store in dense form which is more than most sites haveeven on secondary

storage This prevented the Oregon researchers from nishing the work The

CWI researchers used a novel Blo ck Lanczos algorithm for the linear algebra

the phase and completed the larger problem in hours on a Cray C at

Academic Computing Center Amsterdam SARA

Acknowledgements

This work was funded byCWICentrum vo or Wiskunde en Informatica Ams

terdam and by the Stieltjes Institute for Mathematics Leiden The manuscript

was revised while the author visited Bellcore USA Thanks to Richard Brent

Mary Flahive Marije Huizing Arjen Lenstra and Herman te Riele for review

ing early drafts of this manuscript

References

John Brillhart Peter L Montgomery and Rob ert D Silverman Ta

bles of Fib onacci and Lucas factorizations Mathematics of Computation

SS January

Don Copp ersmith Solving linear equations over GF Blo ck Lanczos

algorithm Linear Algebra and its Applications Octob er

Don Copp ersmith Solving homogeneous linear equations over GF via

blo ck Wiedemann algorithm Mathematics of Computation

January

JeanMarc Couveignes Computing a square ro ot for the numb er eld sieve

In AK Lenstra and HW Lenstra Jr editors The Development of the

Number Field Sieve volume of Lecture Notes in Mathematics pages

SpringerVerlag Berlin

Martin Gardner Mathematical games A new kind of cipher that would

takemillionsofyears to break Scientic American August

AK Lenstra and MS Manasse Factoring with two large primes Mathe

matics of Computation

Peter L Montgomery A blo ck Lanczos algorithm for nding dep endencies

over GF Technical rep ort CWI Amsterdam To app ear

Peter L Montgomery Square ro ots of pro ducts of algebraic numb ers In

Walter Gautschi editor Mathematics of Computation a Half

Century of Computational Mathematics Pro ceedings of Symp osia in Ap

plied Mathematics American Mathematical So ciety To app ear

Peter L MontgomeryVectorization of the elliptic curve metho d Technical

rep ort CWI Amsterdam To app ear

Michael A Morrison and John Brillhart A metho d of factoring and the

factorization of F Mathematics of Computation January

JM Pollard Theorems on factorization and primality testing Proc Camb

Phil Soc September

JM Pollard A Monte Carlo metho d for factorization BIT

Analysis and comparison of some integer factoring al

gorithms In HW Lenstra Jr and R Tijdeman editors Computational

er Theory Part I pages MathematischCentrum Methods in Numb

Amsterdam Math Centrum Tract

Carl Pomerance The quadratic sieve factoring algorithm In T Beth

N Cot and I Ingemarsson editors Advances in Cryptology Proceedings

of EUROCRYPT volume of Lecture Notes in

pages New York SpringerVerlag

RSA Challenge Administrator Information ab out RSA Factoring Chal

lenge March Send electronic mail to challengeinfocom

Joseph H Silverman The Arithmetic of El liptic Curves volume of

Graduate Texts in Mathematics SpringerVerlag New York

Massively distributed computing and factoring large Rob ert D Silverman

integers Novemb er

Douglas H Wiedemann Solving sparse linear equations over nite elds

IEEE Trans Inform Theory January

HC Williams A p metho d of factoring Mathematics of Computation

July

HC Williams and JO Shallit Factoring integers b efore computers In

Walter Gautschi editor Mathematics of Computation a Half

Century of Computational Mathematics Pro ceedings of Symp osia in Ap

plied Mathematics American Mathematical So ciety To app ear