
∗ Algebra and Number Theory Lecture Notes

Szymon Brzostowski

November 1, 2020

General remarks on method and criteria of assessment (2020-2021):

1. The discussion class mark is the average of the marks from two or more tests. In special cases (for students taking an active part in the classes) the mark may be raised by at most one grade.

2. There will be one regular retake (scheduled just before or during the final examination period) for those students who fail any of the aforementioned tests.

3. The lecture mark is the final exam mark (checking students’ theory knowledge), scheduled during the end-of-term examinations.

4. The final mark consists of the discussion class mark (50%) and the lecture mark (50%), provided that both are passing marks.

5. „Second sit” test/examination will take place during the resit examination period.

Dates:

end-of-term examinations (final examination period): 8–21 February 2021,

resit (second sit) examination period: 1–7 March 2021.

∗ This document has been written using the GNU TeXmacs text editor (see www.texmacs.org).

This text is intended to be a more detailed and formal version of the lecture itself. In order to get the most out of it, I recommend reading it with your discussion-class notes at hand. Hopefully, this should allow for full comprehension of the material.

Szymon Brzostowski

Notation. The symbol „$:=$” will denote an assignment; for example, $a := 7$ means that the name $a$ carries the value $7$. More generally, „$:=$” can be treated as a definition of a new concept (on its left side) using already known concepts (appearing on its right side). The phrase „if and only if” will be abbreviated as „iff”.

Table of contents

1 Numbers and operations
2 Which numbers are „best”?
3 The first abstraction – the concept of a group
4 ...Divisibility in $\mathbb{Z}$
5 ...Second abstraction – the concept of a ring
6 The concept of an isomorphism
7 The direct sum
8 The third abstraction – the concept of a field
9 Congruences
10 GCD
11 Solving $a \cdot x \equiv_n b$ and $a \cdot_n x = b$
12 The group $G_n$ and the $\varphi$ function
13 Fermat’s little theorem and its generalizations
14 Chinese remainder theorem
15 The prime numbers
16 Systems of linear equations
   16.1 From one to several equations
   16.2 Triangular systems and back substitution
   16.3 Matrix notation for a linear system
   16.4 Row echelon form of a matrix
17 The algebra of matrices
18 Which matrices are invertible?
19 Invertibility and determinants
20 Linear systems revisited
Further reading

1 Numbers and operations

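We begin by setting the terminology for the basic number sets:

the naturals (= positive integers): $\mathbb{N} := \{1, 2, 3, \ldots\}$,

the integers: $\mathbb{Z} := \{0, \pm 1, \pm 2, \ldots\}$,

the rationals: $\mathbb{Q} := \left\{ \frac{p}{q} : p \in \mathbb{Z},\ q \in \mathbb{N} \right\}$,

the reals: $\mathbb{R}$; it is a set larger than $\mathbb{Q}$; it contains e.g. the radicals (like $\sqrt{2}$ or $\sqrt[10]{2}$) and also other numbers, like $\pi$ or $e$.

Note that by convention $0 \notin \mathbb{N}$. The symbol $\mathbb{N}_0$ will denote the set $\mathbb{N} \cup \{0\}$.

There is one more set of numbers, easily constructed from $\mathbb{R}$:

the complex numbers: $\mathbb{C} := \{a + b i : a, b \in \mathbb{R}\}$. Here two complex numbers are added and multiplied similarly to polynomials, but further simplification of the result is possible because the new number „$i$” satisfies $i^2 = -1$.

Clearly, we have the following chain of inclusions: $\mathbb{N} \subsetneq \mathbb{Z} \subsetneq \mathbb{Q} \subsetneq \mathbb{R} \subsetneq \mathbb{C}$.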

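Example.

1. Here’s how you add two complex numbers:
$$\begin{aligned} (2 + 3i) + (1 - i) &= 2 + 3i + 1 - i && \text{(skip the brackets)} \\ &= 3 + 2i && \text{(collect similar terms)} \end{aligned}$$

2. And here is multiplication:
$$\begin{aligned} (2 + 3i)(1 - i) &= 2(1 - i) + 3i(1 - i) && \text{(use the distributive law)} \\ &= 2 - 2i + 3i - 3i^2 && \text{(distributive law again)} \\ &= 2 - 2i + 3i + 3 && \text{(use the relation for } i^2\text{)} \\ &= 5 + i && \text{(collect similar terms)} \end{aligned}$$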

Analyzing the first example above, it is not hard to produce a general rule for the addition of complex numbers:

Addition Rule in $\mathbb{C}$.
$$\forall_{a, b, c, d \in \mathbb{R}} \quad (a + b i) + (c + d i) = (a + c) + (b + d) i.$$

(The universal quantifier $\forall$ is read „for all”.) A similar formula for the product of two complex numbers holds (try to find it yourself!):

Multiplication Rule in $\mathbb{C}$.
$$\forall_{a, b, c, d \in \mathbb{R}} \quad (a + b i) \cdot (c + d i) = (a c - b d) + (a d + b c) i.$$

Observe that, thanks to using symbols instead of specific numbers, the formulas above work for every choice of (real) values of $a, b, c, d$. So although it is enough for you to know how to calculate with complex numbers to arrive at the correct result, these formulas give you the result almost automatically once you remember them. In algebra you generally work with symbols and you calculate using a certain set of rules for manipulating the symbols. Some of these rules you should already be familiar with.

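The basic rules for computations with numbers are

the sum does not depend on the order of the summands, and – in symbols –
$$a + b = b + a \qquad \text{(commutativity)}$$
(This law allows you to change the order of the summands.)

the sum does not depend on how the summands are grouped, and – in symbols –
$$a + (b + c) = (a + b) + c \qquad \text{(associativity)}$$
(This law allows you to skip all brackets when adding – the result is always the same regardless of which numbers you add first.)

multiplying a sum by a number gives the sum of the products, and – in symbols –
$$(a + b) \cdot c = a \cdot c + b \cdot c \qquad \text{(distributivity)}$$
(This law allows you to distribute multiplication over addition.)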

Note that the first two rules above are also true when multiplying numbers (i.e. with $+$ replaced by $\cdot$). This suggests that perhaps such rules could be interpreted more abstractly. Namely, the symbol $+$ (or $\cdot$) might denote an unspecified binary operation.

Of course, in such generality it might be appropriate to use some new symbols (like $\oplus$, $\otimes$, $\ast$ or $\circ$) for the binary operations, and sometimes this is done, especially when there is a danger of confusion. Observe, however, that in the case of the complex numbers considered above we used the ordinary symbols $+$ and $\cdot$ rather than some new ones, and we called them „addition” and „multiplication”, respectively. This is natural, because those operations are in fact extensions of the usual operations of addition and multiplication from the real numbers to a wider domain (this means that the „new” binary operations work in the „old” way for those complex numbers which are real, e.g. $(2 + 0i) + (3 + 0i) = 5 + 0i$ and similarly $(2 + 0i) \cdot (3 + 0i) = 6 + 0i$). Also in computer science it is common practice to implement new binary operations and still denote them by $+$ and $\cdot$, although in fact they overload or even totally redefine the standard operations of addition and multiplication.
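The Addition and Multiplication Rules in $\mathbb{C}$ can be turned directly into such overloaded operations. Below is a minimal Python sketch; the class name `Cx` is our own hypothetical choice, and Python's built-in `complex` type is used only to cross-check the multiplication example from Section 1.

```python
# Illustrative sketch; the class name Cx is our own choice, not from the notes.
class Cx:
    """A complex number a + b*i stored as a pair of real numbers."""

    def __init__(self, a, b):
        self.a = a  # real part
        self.b = b  # imaginary part

    def __add__(self, other):
        # Addition Rule: (a + bi) + (c + di) = (a + c) + (b + d)i
        return Cx(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # Multiplication Rule: (a + bi)(c + di) = (ac - bd) + (ad + bc)i
        return Cx(self.a * other.a - self.b * other.b,
                  self.a * other.b + self.b * other.a)

    def __eq__(self, other):
        return self.a == other.a and self.b == other.b

    def __repr__(self):
        return f"Cx({self.a}, {self.b})"


z = Cx(2, 3)
w = Cx(1, -1)
print(z + w)  # Cx(3, 2)
print(z * w)  # Cx(5, 1)
# Cross-check against Python's built-in complex type.
assert complex(2, 3) * complex(1, -1) == complex(5, 1)
```

On real numbers the overloaded `*` behaves in the „old” way: `Cx(2, 0) * Cx(3, 0)` gives `Cx(6, 0)`.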

Example. One can define a truncated subtraction $\oplus$ in $\mathbb{N}_0$ by the formula
$$a \oplus b := \begin{cases} 0, & \text{if } a \le b, \\ a - b, & \text{if } b < a. \end{cases}$$
There is nothing wrong with such a definition, but you should feel that the usage of $\oplus$ is a little cumbersome here (after all, we are subtracting rather than adding). In elementary arithmetic this truncated subtraction is usually denoted by „$\dot{-}$”.
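A minimal Python sketch of this truncated subtraction. We call the function `monus`, a name sometimes used for this operation; the choice of name is ours. Unlike ordinary subtraction, the result never leaves $\mathbb{N}_0$.

```python
# Illustrative sketch; the name monus is our own choice.
def monus(a: int, b: int) -> int:
    """Truncated subtraction on N_0: a - b if b < a, and 0 otherwise."""
    if a < 0 or b < 0:
        raise ValueError("arguments must lie in N_0")
    return a - b if b < a else 0


print(monus(7, 3))  # 4
print(monus(3, 7))  # 0, the result stays in N_0
```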

2 Which numbers are „best”?

There is no unique answer to such a question, because it depends on the properties you are interested in. For example, testing associativity and commutativity gives

                 N     Z     Q     R     C
  +   Ass.      YES   YES   YES   YES   YES
  +   Com.      YES   YES   YES   YES   YES
  ·   Ass.      YES   YES   YES   YES   YES
  ·   Com.      YES   YES   YES   YES   YES

so in these respects there is no apparent difference between $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$ and $\mathbb{C}$. Actually, in principle, in order to verify that the entries in the above table are correct, it would be enough to check these properties for $\mathbb{C}$, because then they are automatically valid for the smaller sets $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$ and $\mathbb{R}$. In practice, this is done the other way around because of the order in which the number sets are actually constructed ($\mathbb{N} \to \mathbb{Z} \to \mathbb{Q} \to \mathbb{R} \to \mathbb{C}$).

As an example of such verification, try:

Exercise 1. Check that $+$ and $\cdot$ are associative and commutative in $\mathbb{C}$ (assuming you know this for $\mathbb{R}$!) using the formal definitions of $+$ and $\cdot$ given in Section 1 (the Addition and Multiplication Rules in $\mathbb{C}$).

Commentary. Proving the usual empirical properties of the operations $+$ and $\cdot$ is not entirely trivial, even for $\mathbb{N}$. First, one needs to properly define the set $\mathbb{N}$ (this can be done using Set Theory) and the operations $+$ and $\cdot$. Only then is it possible to actually prove their properties (this is done in a branch of mathematics called Theoretical Arithmetic). Fortunately, you have so much empirical evidence for their validity from everyday life that you should not feel uncomfortable using those properties.

Going back to our question again: are the various sets of numbers essentially different? Yes, they are... But only when you start thinking about solving equations in them. Suppose we want to solve the equation $x + 1 = 0$ in the naturals. Clearly, this is impossible, because we have
$$x + 1 \geq 2 \quad \text{for } x \in \mathbb{N},$$
a result always greater than $0$! But the same equation has a (unique!) solution $x = -1$ in $\mathbb{Z}$. A difference! The following table shows which equations can be solved in the various number sets:

  Equation to solve               N     Z     Q     R     C
  $a + x = b$                     NO    YES   YES   YES   YES
  $a \cdot x = b$, $a \neq 0$     NO    NO    YES   YES   YES        (1)
  $x^2 = 2$                       NO    NO    NO    YES   YES
  $x^2 = -1$                      NO    NO    NO    NO    YES

As we can see, $\mathbb{Q}$ is different from $\mathbb{Z}$ when considered with multiplication (e.g. $10 x = 1$ has no solutions in $\mathbb{Z}$; on the other hand, the same equation does have the solution $x = \frac{1}{10}$ in $\mathbb{Q}$). Also, $\mathbb{C}$ is „better” than the other sets with respect to the equation $x^2 = -1$. But what is the difference between $\mathbb{Q}$ and $\mathbb{R}$? Part of the answer is shown in the above table. The full answer is beyond the scope of Algebra: the real difference is that in $\mathbb{R}$ there exist „limits” for all the sequences that „should have” a limit. The same property is true for $\mathbb{C}$. Hence, $\mathbb{R}$ and $\mathbb{C}$ have different analytic features than $\mathbb{Q}$.

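Example. The sequence
$$1,\quad 1 + \frac{1}{2^2},\quad 1 + \frac{1}{2^2} + \frac{1}{3^2},\quad 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2},\quad \ldots$$
has no limit in $\mathbb{Q}$, but it does have a limit (or converges) in $\mathbb{R}$ to the number $\frac{\pi^2}{6}$. This can be written as $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$.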
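The following Python sketch (an illustration only) computes the terms of this sequence exactly, using `fractions.Fraction`. Every term is a rational number, yet the terms approach $\pi^2/6$, which is irrational, so the limit lies outside $\mathbb{Q}$.

```python
# Illustrative sketch; the function name partial_sum is our own choice.
from fractions import Fraction
import math


def partial_sum(n: int) -> Fraction:
    """Exact value of 1 + 1/2^2 + ... + 1/n^2 as a rational number."""
    return sum((Fraction(1, k * k) for k in range(1, n + 1)), Fraction(0))


for n in (1, 2, 10, 100):
    s = partial_sum(n)
    # s is an element of Q; its distance to pi^2/6 shrinks as n grows
    print(n, float(s), math.pi ** 2 / 6 - float(s))
```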

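Table (1) may seem to suggest that the complex numbers are „the best”. This is, indeed, true in many respects but not in all of them. For example, the number sets $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$ are linearly ordered, so their elements are comparable and can be drawn on a line:

[Figure: the number line, with sample points such as $\sqrt{2}$ and $\pi$, and two numbers $a < b$.]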

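More importantly, this ordering is compatible with the operations $+$, $\cdot$:
$$\forall_{a, b, c \in \mathbb{R}} \ \big(a < b \Rightarrow a + c < b + c\big) \quad \text{and} \quad \forall_{a, b, c \in \mathbb{R}} \ \big(a < b \ \wedge\ 0 < c \Rightarrow a \cdot c < b \cdot c\big).$$
Such an ordering is not possible for $\mathbb{C}$. Still, there exists a nice way to visualize complex numbers – they can be drawn on a plane.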

The complex number $a + b i$ is plotted as the point (vector) $(a, b)$ on the plane $\mathbb{R}^2$. Similarly, the real number $a$ (that is, $a + 0i$) is plotted as the point (vector) $(a, 0)$ in $\mathbb{R}^2$. Such a visualization is called an Argand diagram.

[Figure: Argand Diagram.]

Under this visualization, the operations $+$ and $\cdot$ on complex numbers can also be interpreted graphically. For now, we content ourselves with the interpretation of addition, multiplication being more involved.

[Figure: Addition of complex numbers; the vectors $z_1$, $z_2$ and $z_1 + z_2$ form a parallelogram.]

The sum of two complex numbers $z_1 = a + b i$ and $z_2 = c + d i$ is the complex number $z_1 + z_2 = (a + c) + (b + d) i$. Hence, the addition of complex numbers is just the addition of vectors in the plane $\mathbb{R}^2$ according to the „parallelogram law”.

On the other hand, are the natural numbers $\mathbb{N}$ the „worst”? No, because they have another advantage: besides being ordered, they are well-ordered, which means that every nonempty subset $A$ of $\mathbb{N}$ has a unique minimal element. This allows one to prove theorems concerning $\mathbb{N}$ using a technique called mathematical induction. Moreover, the naturals (more generally: the integers) have fascinating arithmetic properties connected with the notion of divisibility.

Example (the well-ordering principle of $\mathbb{N}$). The set $A := \{\text{the natural numbers that are divisible by } 2 \text{ and } 5\} = \{10, 20, 30, \ldots\}$ has its minimum equal to $\min A = 10$.

We can sum up the above discussion by agreeing that there are various possible viewpoints on the properties of numbers; hence there is no obvious choice of „the best numbers”. The various viewpoints are developed in different branches of mathematics:

Algebra (properties of binary operations)
Number Theory (divisibility and prime numbers)
Linear Algebra (vectors, matrices and operations on them)

The lectures will cover the very basics of these three branches of mathematics.

3 The first abstraction – the concept of a group

As we saw, the sets of numbers have different properties with respect to solving equations of the form $a \ast x = b$, where $\ast$ is a binary operation ($+$ or $\cdot$). Moreover, the operation itself often has some „nice” features (associativity, commutativity). Such observations have led to the abstract notion of a group, which allows one to describe this type of phenomena in a unified way.

Definition 1. (Group) Let $G$ be a set together with a binary operation $\cdot : G \times G \to G$. We call $G$ a group (more formally: a group is the ordered pair $(G, \cdot)$) if the following axioms are satisfied:

G1. $\forall_{a, b, c \in G} \ (a \cdot b) \cdot c = a \cdot (b \cdot c)$ (associativity)

G2. $\exists_{e \in G} \ \forall_{a \in G} \ a \cdot e = e \cdot a = a$ (existence of identity element)

G3. $\forall_{a \in G} \ \exists_{b \in G} \ a \cdot b = b \cdot a = e$ (existence of inverse element).

The element $e$ involved in G2 and G3 is one and the same and is called the identity of $G$, while the element $b$ satisfying G3 is called the inverse element of $a$ and denoted by $a^{-1}$. Hence, G3 can be written in the form $a \cdot a^{-1} = a^{-1} \cdot a = e$. If moreover

G4. $\forall_{a, b \in G} \ a \cdot b = b \cdot a$ (commutativity)

then the group $G$ is called commutative or abelian.

(The existential quantifier $\exists$ is read „there exists”. The symbol $\exists!$ means „there exists exactly one”.)
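For a finite set, the axioms G1–G4 can be checked by brute force. Here is a minimal Python sketch (the names `is_group` and `is_abelian` are our own); it additionally checks that the operation really maps $G \times G$ into $G$. In the examples, `%` is Python's remainder operator, which anticipates the modular arithmetic of Section 4.

```python
# Illustrative sketch; all function names are our own choices.
from itertools import product


def is_group(elements, op) -> bool:
    """Brute-force check of axioms G1-G3 for a finite set with a binary operation."""
    G = list(elements)
    # op must really be a binary operation G x G -> G
    if any(op(a, b) not in G for a, b in product(G, repeat=2)):
        return False
    # G1: associativity
    if any(op(op(a, b), c) != op(a, op(b, c)) for a, b, c in product(G, repeat=3)):
        return False
    # G2: look for an identity element e with a*e = e*a = a for every a
    ids = [e for e in G if all(op(a, e) == a and op(e, a) == a for a in G)]
    if not ids:
        return False
    e = ids[0]
    # G3: every a has an inverse b with a*b = b*a = e
    return all(any(op(a, b) == e and op(b, a) == e for b in G) for a in G)


def is_abelian(elements, op) -> bool:
    """Check axiom G4 (commutativity)."""
    G = list(elements)
    return all(op(a, b) == op(b, a) for a, b in product(G, repeat=2))


def add4(a, b):
    return (a + b) % 4


def mul4(a, b):
    return (a * b) % 4


print(is_group(range(4), add4), is_abelian(range(4), add4))  # True True
print(is_group(range(1, 4), mul4))  # False: 2*2 % 4 = 0 is not in {1, 2, 3}
```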

Exercise 2. Show that the condition G2 is equivalent to the condition

G2'. $\exists!_{e \in G} \ \forall_{a \in G} \ a \cdot e = e \cdot a = a$;

in other words, an element $e \in G$ satisfying G2 is the unique such element.

Exercise 3. Show that in a group $G$ the element inverse to an element $a \in G$ is indeed the only such element, that is, show that $a^{-1}$ is unique.

We want to emphasize that the symbol $\cdot$ in the definition of a group denotes an abstract binary operation. Since, however, it resembles ordinary multiplication, one sometimes says that the group $(G, \cdot)$ is in multiplicative notation. In this case, the dot is usually omitted, so that $ab$ means $a \cdot b$, and the identity $e \in G$ is denoted by $1$. Similarly, for a group $(G, +)$ one may say that it is in additive notation. In this case, one uses the symbol $-a$ for the inverse element to $a \in G$, and the identity is often denoted by $0$.

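At first sight, the definition of a group does not seem to be connected with the equations $a \cdot x = b$, $x \cdot a = b$ ($a, b \in G$). But it turns out that in a group such equations are always solvable. We list this together with some other basic properties of groups:

Property 1. Let $(G, \cdot)$ be a group. Then

1. $\forall_{a, b \in G} \ (a^{-1})^{-1} = a$ and $(ab)^{-1} = b^{-1} a^{-1}$,

2. $\forall_{a, b, c \in G} \ (ac = bc \Rightarrow a = b)$ and $(ca = cb \Rightarrow a = b)$ (cancellation laws),

3. $\forall_{a, b \in G} \ \exists!_{c \in G} \ a c = b$ and $\forall_{a, b \in G} \ \exists!_{d \in G} \ d a = b$ (solvability of $a x = b$ and $x a = b$).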

Commentary. Property 1 works for all groups; in particular, it is valid for the example groups given below. This is a manifestation of the power of abstract thinking.

So far, we have not spoken explicitly about the division „$/$” and subtraction „$-$” operators. The reason for that is that in a group they can be defined using the group operation ($+$ or $\cdot$). Namely, in the additive case we put $\forall_{a, b \in G} \ a - b := a + (-b)$, and in the multiplicative case $\forall_{a, b \in G} \ \frac{a}{b} := a b^{-1}$.
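Property 1 item 3 is constructive: the solutions are $c = a^{-1} b$ and $d = b a^{-1}$. The Python sketch below (our own illustration) checks this in the non-abelian group $S_3$ of Example V. As an assumption of the sketch, a permutation of $\{0, 1, 2\}$ (rather than $\{1, 2, 3\}$) is stored as the tuple of its values.

```python
# Illustrative sketch; permutations of {0, ..., n-1} are stored as tuples (our convention).
from itertools import permutations


def compose(f, g):
    """(f o g)(x) = f(g(x))."""
    return tuple(f[g[x]] for x in range(len(g)))


def inverse(f):
    """The inverse permutation f^(-1)."""
    inv = [0] * len(f)
    for x, y in enumerate(f):
        inv[y] = x
    return tuple(inv)


S3 = list(permutations(range(3)))
a, b = (1, 2, 0), (0, 2, 1)
c = compose(inverse(a), b)   # solution of a o x = b
d = compose(b, inverse(a))   # solution of x o a = b
print(compose(a, c) == b, compose(d, a) == b)         # True True
# uniqueness: c is the only element of S_3 solving a o x = b
print([x for x in S3 if compose(a, x) == b] == [c])   # True
```

Here $c \neq d$, which reflects the fact that $S_3$ is not abelian.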

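Examples of groups. (Some of these will be discussed in greater detail at the classes.)

I. $(\mathbb{C}, +)$, $(\mathbb{R}, +)$, $(\mathbb{Q}, +)$, $(\mathbb{Z}, +)$, $(\{0\}, +)$ are abelian groups with respect to the usual addition.

II. $(\mathbb{C}^*, \cdot)$, $(\mathbb{R}^*, \cdot)$, $(\mathbb{Q}^*, \cdot)$, $(\{1\}, \cdot)$, where the asterisk means we exclude the number $0$, are abelian groups with respect to the usual multiplication (check this for $\mathbb{C}^*$!).

III. $(\mathbb{R}_+, \cdot)$, $(\mathbb{Q}_+, \cdot)$, where $\mathbb{R}_+ := \{x \in \mathbb{R} : x > 0\}$, $\mathbb{Q}_+ := \mathbb{Q} \cap \mathbb{R}_+$, are also abelian groups.

IV. Let $X \neq \emptyset$. The set of functions $X \to X$ that are bijections (that is, functions which are one-to-one and onto) with the operation $\circ$ of composition of functions is a group. Explicitly: for such $f, g$, the composition $f \circ g : X \to X$ is the function defined as $(f \circ g)(x) := f(g(x))$, $x \in X$. We will denote this (usually non-abelian!) group by $\mathrm{Bij}(X)$.

V. If $X = \{1, \ldots, n\}$, where $n \in \mathbb{N}$, then the elements of $\mathrm{Bij}(X)$ are called permutations. The set of such permutations is denoted by $S_n$. By example IV, $S_n$ is a group with respect to composition of functions. This group is called the symmetric group of degree $n$.

VI. The set $\mu_n := \left\{\cos\frac{2k\pi}{n} + i \sin\frac{2k\pi}{n} : k = 0, 1, \ldots, n - 1\right\}$, where $n \in \mathbb{N}$, consists of the $n$-th roots of unity, that is, for every $z \in \mu_n$ we have $z^n = \underbrace{z \cdot \ldots \cdot z}_{n \text{ factors}} = 1$. With respect to (complex) multiplication, $\mu_n$ is an abelian group.

VII. The circle $S \subset \mathbb{C}$ with center $0$ and radius $1$ consists of the complex numbers of the form $z = \cos\varphi + i \sin\varphi$, where $\varphi \in [0, 2\pi)$. $S$ is an abelian group with respect to complex multiplication. (This, as well as example VI, follows from the formula for the product of two complex numbers written in their trigonometric form: $(\cos\varphi + i \sin\varphi)(\cos\psi + i \sin\psi) = \cos(\varphi + \psi) + i \sin(\varphi + \psi)$.)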

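VIII. Let $X$ be any set. The pair $(2^X, \triangle)$, where $2^X$ is the set of all the subsets of the set $X$ and $\triangle$ denotes the symmetric difference of sets, that is $A \triangle B := (A \setminus B) \cup (B \setminus A)$, constitutes an abelian group.

IX. Consider the set $B_n$ of binary strings of length $n$. We may define a bitwise operation on such strings using the exclusive or (xor) operation. Namely, for bits we put:

  Input    0,0   0,1   1,0   1,1
  xor       0     1     1     0

so that e.g. $1 \,\mathrm{xor}\, 1 = 0$ and so on, and for two binary strings $s = s_1 \ldots s_n$, $t = t_1 \ldots t_n \in B_n$ we put
$$s \,\mathrm{xor}\, t := (s_1 \,\mathrm{xor}\, t_1) \ldots (s_n \,\mathrm{xor}\, t_n)$$
(e.g. $01 \,\mathrm{xor}\, 11 = (0 \,\mathrm{xor}\, 1)(1 \,\mathrm{xor}\, 1) = 10$). The ordered pair $(B_n, \mathrm{xor})$ is a group (check it!).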
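The „check it!” can at least be done by brute force for a small $n$. A minimal Python sketch (our own illustration, with $n = 3$ chosen arbitrarily) that represents binary strings as Python strings of `0`s and `1`s; the identity is the string $00\ldots0$ and every string is its own inverse.

```python
# Illustrative sketch; the name xor_strings and the choice n = 3 are ours.
from itertools import product


def xor_strings(s: str, t: str) -> str:
    """Bitwise xor of two binary strings of the same length."""
    return "".join("1" if x != y else "0" for x, y in zip(s, t))


n = 3
B = ["".join(bits) for bits in product("01", repeat=n)]  # all 2^n strings
zero = "0" * n

closed = all(xor_strings(s, t) in B for s in B for t in B)
assoc = all(xor_strings(xor_strings(s, t), u) == xor_strings(s, xor_strings(t, u))
            for s, t, u in product(B, repeat=3))
identity = all(xor_strings(s, zero) == s for s in B)
self_inverse = all(xor_strings(s, s) == zero for s in B)  # s^(-1) = s
commutative = all(xor_strings(s, t) == xor_strings(t, s) for s in B for t in B)

print(xor_strings("01", "11"))                             # 10
print(closed, assoc, identity, self_inverse, commutative)  # True True True True True
```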

The last example turns out to be a very special case of a more general phenomenon, having its source in Number Theory. Namely, this is an instance of modular arithmetic... (see Section 7 for an explanation of this connection).

4 ...Divisibility in Z

The basic definition is:

Definition 2. Let $k, l \in \mathbb{Z}$. We say that $l$ divides $k$, and we write $l \mid k$ or $k \equiv 0 \bmod l$ (read: „$k$ is congruent to $0$ modulo $l$”), if there exists an $m \in \mathbb{Z}$ such that $k = m \cdot l$. We may also say that $l$ is a divisor of $k$ and that $k$ is a multiple of $l$.

Example. $3 \mid 12$ because $12 = 4 \cdot 3$.

What happens if an integer $k$ is not divisible by an integer $l$? We may perform a „partial division”:

Theorem 1. (Division with remainder) Let $k \in \mathbb{Z}$, $l \in \mathbb{Z} \setminus \{0\}$. Then there exist (unique!) integers $q, r$ such that
$$k = q \cdot l + r \quad \text{and} \quad 0 \leq r < |l|. \tag{2}$$
$q$ is called the partial quotient and $r$ the remainder (on division of $k$ by $l$).

(In the above theorem $|\cdot|$ denotes the modulus (or the absolute value) of a real number, which is equal to its distance from $0$.)

Example. Clearly, $2 \nmid 7$ („$2$ does not divide $7$”). But we have $7 = 3 \cdot 2 + 1$, so $q = 3$, $r = 1$ and $0 \leq 1 < |2|$. Similarly, if we want to divide $-7$ by $2$, we get $-7 = (-4) \cdot 2 + 1$ with $q = -4$, $r = 1$.
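Care is needed when implementing Theorem 1 on a computer: Python's built-in `divmod` (and the `%` operator) return a remainder with the sign of the divisor, so for $l < 0$ it may be negative. A minimal Python sketch (the name `div_rem` is our own) that adjusts the result so that it satisfies (2):

```python
# Illustrative sketch; the name div_rem is our own choice.
def div_rem(k: int, l: int) -> tuple[int, int]:
    """Return (q, r) with k = q*l + r and 0 <= r < |l|, as in Theorem 1."""
    if l == 0:
        raise ZeroDivisionError("l must be nonzero")
    q, r = divmod(k, l)  # Python: r has the sign of l, |r| < |l|
    if r < 0:            # only possible when l < 0
        q, r = q + 1, r - l
    return q, r


print(div_rem(7, 2))    # (3, 1)
print(div_rem(-7, 2))   # (-4, 1)
print(div_rem(7, -2))   # (-3, 1)
print(divmod(7, -2))    # (-4, -1): Python's own remainder is negative here
```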

Notation. If $r$ is the remainder as in (2), we will write $r = k \bmod l$.

Note that by the definition of the remainder we always have $k \bmod l \in \{0, 1, \ldots, |l| - 1\}$.

Hence, for a natural $l$ we define the set $\mathbb{Z}_l := \{0, 1, \ldots, l - 1\}$ and we may think of it as the set of remainders modulo $l$. The amazing thing about $\mathbb{Z}_l$ is that there exist natural operations in it. Suppose we want to add two numbers $m, n \in \mathbb{Z}_l$. What should the result be like? Well, it should be a number, but it should also belong to the set $\mathbb{Z}_l$. There is no problem if $m + n < l$: we can safely take $m + n$ as the correct result in $\mathbb{Z}_l$. But what if $m + n \geq l$? Recall that $\mathbb{Z}_l$ is the set of remainders, so perhaps we should compute the remainder $(m + n) \bmod l$... This turns out to be the correct guess. We pose the following:

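Definition 3. (Modular addition and multiplication)
$$\forall_{m, n \in \mathbb{Z}_l} \quad m +_l n := (m + n) \bmod l, \qquad m \cdot_l n := (m \cdot n) \bmod l.$$
Clearly, we have defined two operations $+_l, \cdot_l : \mathbb{Z}_l \times \mathbb{Z}_l \to \mathbb{Z}_l$. What are their properties?

Theorem 2. (The algebraic structure of $\mathbb{Z}_l$)

1. $(\mathbb{Z}_l, +_l)$ is an abelian group with the number $0 \in \mathbb{Z}_l$ as the identity.

2. $\cdot_l$ is associative and commutative and has $1 \in \mathbb{Z}_l$ as the identity (here $l \geq 2$).

3. $\cdot_l$ is distributive over $+_l$, that is $\forall_{a, b, c \in \mathbb{Z}_l} \ (a +_l b) \cdot_l c = (a \cdot_l c) +_l (b \cdot_l c)$.

Let us illustrate these properties with some specific examples.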

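Example.

$3 +_5 2 = 5 \bmod 5 = 0$; hence $2$ is the $+_5$-inverse element to $3$, and this fact can be written as $-3 = 2$ (in $\mathbb{Z}_5$) (cf. the comments on additive notation following Definition 1); more generally, for $a \in \mathbb{Z}_l$ it holds $-a = (l - a) \bmod l$.

$(5 +_7 4) +_7 6 = (9 \bmod 7) +_7 6 = 2 +_7 6 = 8 \bmod 7 = 1$ and $5 +_7 (4 +_7 6) = 5 +_7 (10 \bmod 7) = 5 +_7 3 = 8 \bmod 7 = 1$ (an example of associativity of $+_l$)

$(4 \cdot_{12} 5) \cdot_{12} 6 = (20 \bmod 12) \cdot_{12} 6 = 8 \cdot_{12} 6 = 48 \bmod 12 = 0$ and $4 \cdot_{12} (5 \cdot_{12} 6) = 4 \cdot_{12} (30 \bmod 12) = 4 \cdot_{12} 6 = 24 \bmod 12 = 0$ (an example of associativity of $\cdot_l$)

$(6 +_7 5) \cdot_7 2 = (11 \bmod 7) \cdot_7 2 = 4 \cdot_7 2 = 8 \bmod 7 = 1$ and $(6 \cdot_7 2) +_7 (5 \cdot_7 2) = (12 \bmod 7) +_7 (10 \bmod 7) = 5 +_7 3 = 8 \bmod 7 = 1$ (an example of the distributive property)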
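Theorem 2 can be checked exhaustively for small $l$ on a computer. The following Python sketch (the names `add_mod`, `mul_mod` and `check_theorem_2` are our own) also reproduces the distributivity example above; a check for finitely many $l$ illustrates the theorem but of course does not prove it.

```python
# Illustrative sketch; all function names are our own choices.
from itertools import product


def add_mod(m: int, n: int, l: int) -> int:
    """m +_l n := (m + n) mod l."""
    return (m + n) % l


def mul_mod(m: int, n: int, l: int) -> int:
    """m ._l n := (m * n) mod l."""
    return (m * n) % l


def check_theorem_2(l: int) -> bool:
    """Brute-force check of items 1-3 of Theorem 2 for Z_l = {0, ..., l-1}, l >= 2."""
    Z = range(l)
    triples = list(product(Z, repeat=3))
    pairs = list(product(Z, repeat=2))
    add_assoc = all(add_mod(add_mod(a, b, l), c, l) == add_mod(a, add_mod(b, c, l), l)
                    for a, b, c in triples)
    mul_assoc = all(mul_mod(mul_mod(a, b, l), c, l) == mul_mod(a, mul_mod(b, c, l), l)
                    for a, b, c in triples)
    distrib = all(mul_mod(add_mod(a, b, l), c, l)
                  == add_mod(mul_mod(a, c, l), mul_mod(b, c, l), l)
                  for a, b, c in triples)
    commut = all(add_mod(a, b, l) == add_mod(b, a, l) and mul_mod(a, b, l) == mul_mod(b, a, l)
                 for a, b in pairs)
    identities = all(add_mod(a, 0, l) == a and mul_mod(a, 1, l) == a for a in Z)
    inverses = all(add_mod(a, (l - a) % l, l) == 0 for a in Z)  # -a = (l - a) mod l
    return add_assoc and mul_assoc and distrib and commut and identities and inverses


print(mul_mod(add_mod(6, 5, 7), 2, 7))                 # 1
print(add_mod(mul_mod(6, 2, 7), mul_mod(5, 2, 7), 7))  # 1
print(all(check_theorem_2(l) for l in range(2, 13)))   # True
```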

Convention. To simplify the notation, we may write just $+$ and $\cdot$ in place of $+_l$ and $\cdot_l$ (respectively), upon remarking that the calculations are performed in $\mathbb{Z}_l$.

Theorem 2 shows that the triple $(\mathbb{Z}_l, +_l, \cdot_l)$ constitutes quite a nice algebraic structure. Note also that those nice properties seem to be inherited from $(\mathbb{Z}, +, \cdot)$, where they also hold. More generally, each of the triples $(\mathbb{Q}, +, \cdot)$, $(\mathbb{R}, +, \cdot)$, $(\mathbb{C}, +, \cdot)$ has such properties, too (check this for $\mathbb{C}$!). Thus we have come to our...

5 ...Second abstraction – the concept of a ring

Definition 4. (Ring) Let $R$ be a set with two operations $+, \cdot : R \times R \to R$. We say that $R$ (or more precisely: the triple $(R, +, \cdot)$) is a ring (with unity) if

1. $(R, +)$ is an abelian group (see Definition 1) with identity denoted by $0$,

2. $\cdot$ is associative,

3. there exists an identity element $1 \in R$ for $\cdot$,

4. $\cdot$ is left and right distributive over $+$, that is
$$\forall_{r, s, t \in R} \quad r \cdot (s + t) = r \cdot s + r \cdot t \quad \text{and} \quad (s + t) \cdot r = s \cdot r + t \cdot r.$$

If, moreover, $\cdot$ is commutative, we say that $R$ is a commutative ring (with unity).

Convention. It is customary to agree that $\cdot$ has greater priority than $+$. This allows one to skip some brackets. For example, we can simply write $(a + b) \cdot c = a \cdot c + b \cdot c$ instead of $(a + b) \cdot c = (a \cdot c) + (b \cdot c)$, because we understand that the products $a \cdot c$, $b \cdot c$ should be computed first, and only after that may we compute the value of $a \cdot c + b \cdot c$ by performing the addition.

Equipped with this new notion, we can restate Theorem 2 as:

Theorem 2’. (The algebraic structure of $\mathbb{Z}_l$) For every $l \in \mathbb{N}$, $l \geq 2$, the triple $(\mathbb{Z}_l, +_l, \cdot_l)$ is a commutative ring with unity.

We have already mentioned (implicitly) some examples of rings at the end of the previous section. Explicitly:

Example.

I. $(\mathbb{Z}, +, \cdot)$, $(\mathbb{Q}, +, \cdot)$, $(\mathbb{R}, +, \cdot)$, $(\mathbb{C}, +, \cdot)$ are commutative rings with unity.

II. By Example VIII in Section 3, $(2^X, \triangle)$ is an abelian group. One can check that $(2^X, \triangle, \cap)$ constitutes a commutative ring with unity (what are its identity elements?).

III. If $R$ is a ring, then the set $R[X]$ of polynomials with coefficients in $R$ (that is, „formal sums” of the form $r_n X^n + \ldots + r_1 X + r_0$, where $r_0, \ldots, r_n \in R$, $n \in \mathbb{N}_0$), together with the natural operations of addition and multiplication of polynomials, is a ring (commutative if $R$ is commutative).

IV. Let $X \neq \emptyset$ and let $R$ be a ring. The set $R^X$ of functions $f : X \to R$ is itself a ring if considered with the following (natural) operations:

1. $\forall_{f, g \in R^X}$ $f + g$ is the function defined as $(f + g)(x) := f(x) + g(x)$, $x \in X$ (on the right: addition in $R$),

2. $\forall_{f, g \in R^X}$ $f \cdot g$ is the function defined as $(f \cdot g)(x) := f(x) \cdot g(x)$, $x \in X$ (on the right: multiplication in $R$).

Moreover, if $R$ is commutative, then $R^X$ is also a commutative ring.

6 The concept of an isomorphism

Looking at the last example of Section 5, let us notice that the ring $R^X$ embeds the ring $R$ in a natural way. Namely, in $R^X$ there are constant functions $X \ni x \mapsto r \in R$, and one can identify the elements of $R$ with such functions in $R^X$. Identifications of this type are made throughout algebra. Upon such an identification, one can just write $R \subset R^X$, as if $R$ were indeed a subset of $R^X$ (although, strictly speaking, it is not). The reason why this does not lead to confusion is that the set of constant functions in $R^X$ is an isomorphic copy of the ring $R$ (see the definition below), so one can calculate with these constant functions in the same way as in $R$. If, for example, $R = \mathbb{Z}$, then we have $\forall_{x \in X} \ (\bar{2} + \bar{3})(x) = 2 + 3 = 5 = \bar{5}(x)$, where $\bar{2}$ is the constant function sending every $x \in X$ to $2 \in \mathbb{Z}$, and similarly for $\bar{3}$, $\bar{5}$.

Let us give an intuitive definition of the concept of an isomorphism, which is central in the whole of algebra.

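Definition 5. (Isomorphism) Let $(A, \ast, \circ, \alpha)$, $(B, \star, \bullet, \beta)$ be two algebraic structures with the same number of binary operations and some distinguished elements $\alpha \in A$, $\beta \in B$. We say that the structures $A$ and $B$ are isomorphic if there exists a bijection $\varphi : A \to B$ such that

1. $\forall_{a, b \in A} \ \varphi(a \ast b) = \varphi(a) \star \varphi(b)$, $\varphi(a \circ b) = \varphi(a) \bullet \varphi(b)$,

2. $\varphi(\alpha) = \beta$.

The bijection $\varphi$ is then called an isomorphism (of the structures $A$ and $B$). The notation $A \cong B$ or $A \overset{\varphi}{\cong} B$ will indicate that $A$ and $B$ are isomorphic (by means of the isomorphism $\varphi$).

If we do not require $\varphi$ to be a bijection (so it is just a function from $A$ to $B$) but the conditions 1 and 2 are satisfied for $\varphi$, we call such a $\varphi$ a homomorphism of the structures $A$ and $B$.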

How does this general definition specialize to groups? We give

Definition 6. (Isomorphism of groups) Let $(G, \cdot)$, $(H, \ast)$ be two groups. We say that $G$ is homomorphic to $H$ if there exists $\varphi : G \to H$ such that
$$\forall_{a, b \in G} \quad \varphi(a \cdot b) = \varphi(a) \ast \varphi(b);$$
this $\varphi$ is called a (group) homomorphism.

If, moreover, $\varphi$ is a bijection, it is called an isomorphism (of groups) and the groups $G$ and $H$ are said to be isomorphic. In symbols: $G \cong H$ or $G \overset{\varphi}{\cong} H$.

The question is: why do we not require $\varphi$ in this definition to satisfy condition 2 of the general Definition 5? The answer is simple: in the case of groups this condition is automatically satisfied for a homomorphism $\varphi$. Namely, if $e_G \in G$ and $e_H \in H$ are the identities (= „the distinguished elements”) of the respective groups, we have:
$$\varphi(e_G) \ast \varphi(e_G) = \varphi(e_G \cdot e_G) = \varphi(e_G) = e_H \ast \varphi(e_G),$$
and by Property 1 item 2 applied to the group $H$ this implies that $\varphi(e_G) = e_H$, which is the missing condition.

In general, however, such good behaviour cannot be taken for granted. For example, we may define $\varphi : \mathbb{Z} \to \mathbb{Z}$ by the formula $\varphi(k) := 0$ for $k \in \mathbb{Z}$. It is immediate to notice that $\varphi(k + l) = \varphi(k) + \varphi(l)$ and $\varphi(k \cdot l) = \varphi(k) \cdot \varphi(l)$ for all integers $k, l$, so one might be tempted to treat $\varphi$ as a homomorphism of rings. This is fine if one considers rings without unity (which is sometimes done), but according to our definition (Definition 4) a ring has the distinguished element $1$, so its homomorphism should carry $1$ to $1$, whereas here $\varphi(1) = 0 \neq 1$.

Thus, in the case of rings (with unity) we must explicitly demand this condition:

Definition 7. (Isomorphism of rings) Let $(R, +, \cdot)$, $(S, \oplus, \odot)$ be two rings. We say that $R$ is homomorphic to $S$ if there exists $\varphi : R \to S$ such that

1. $\forall_{a, b \in R} \ \varphi(a + b) = \varphi(a) \oplus \varphi(b)$, $\varphi(a \cdot b) = \varphi(a) \odot \varphi(b)$,

2. $\varphi(1_R) = 1_S$;

this $\varphi$ is called a (ring) homomorphism.

If, moreover, $\varphi$ is a bijection, it is called an isomorphism (of rings) and the rings $R$ and $S$ are said to be isomorphic. In symbols: $R \cong S$ or $R \overset{\varphi}{\cong} S$.

Again, here the condition $\varphi(0_R) = 0_S$ is automatic, since Definition 7 implies that this ring homomorphism is also a homomorphism of the additive groups $(R, +)$ and $(S, \oplus)$ in the sense of Definition 6.

Examples (check!).

1. For a group $G$ (or a ring $R$) the identity function $\mathrm{Id}$ defined by $\mathrm{Id}(x) := x$, for $x \in G$ ($x \in R$), is always an isomorphism of groups (rings).

2. The group $(\mathbb{Z}, +)$ is isomorphic to the group $(2\mathbb{Z}, +)$, where $2\mathbb{Z} := \{2k : k \in \mathbb{Z}\} \subset \mathbb{Z}$ is the set of even numbers. The isomorphism can be specified as $x \mapsto 2x$, $x \in \mathbb{Z}$.

3. For the group $(\mathbb{R}, +)$ one can define an automorphism (i.e. an isomorphism of a group (a ring, ...) with itself) by the formula $x \mapsto -x$, for $x \in \mathbb{R}$.

4. Similarly, the function given by $z \mapsto \bar{z}$, $z \in \mathbb{C}$ (here the bar denotes complex conjugation), is an automorphism of the ring $\mathbb{C}$.

5. For the group $(\mathbb{Q}^*, \cdot)$ one can define its homomorphism into the group $(\mathbb{Q}_+, \cdot)$ as $x \mapsto |x|$, $x \in \mathbb{Q}^*$ (here $\mathbb{Q}_+ := \{q \in \mathbb{Q} : q > 0\}$). Another possible homomorphism of this type is $x \mapsto x^2$, $x \in \mathbb{Q}^*$.

6. Let $l \in \mathbb{N}$, $l \geq 2$. The function given by $x \mapsto x \bmod l$, $x \in \mathbb{Z}$, is a homomorphism of the rings $(\mathbb{Z}, +, \cdot)$ and $(\mathbb{Z}_l, +_l, \cdot_l)$.

7. The function $(\mathbb{R}, +) \to (\mathbb{R}_+, \cdot)$ given by $x \mapsto e^x$, $x \in \mathbb{R}$, is a group isomorphism.

8. For a polynomial ring $R[X]$ over a commutative ring $R$ (see Example III in Section 5) and a chosen element $r \in R$ one can define the evaluation homomorphism $\mathrm{ev}_r : R[X] \to R$ into the ring $R$ by $\mathrm{ev}_r(w) := w(r)$, $w \in R[X]$.

9. The group $\mu_n$ of $n$-th roots of unity is isomorphic to the group $(\mathbb{Z}_n, +_n)$. The isomorphism can be given by the formula $k \mapsto \cos\frac{2k\pi}{n} + i \sin\frac{2k\pi}{n}$, for $k \in \{0, 1, \ldots, n - 1\}$.
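Example 6, as well as the failure of the zero map discussed before Definition 7, can be spot-checked numerically. The Python sketch below (function names are our own) tests conditions 1 and 2 of Definition 7 on a finite sample of integers; this is an illustration, not a proof. For the comparison we view the zero map as a map into $\mathbb{Z}_6$.

```python
# Illustrative sketch; function names are our own choices.
from itertools import product


def reduce_mod(x: int, l: int) -> int:
    """phi(x) := x mod l as a map Z -> Z_l (for l > 0, Python's % lies in 0..l-1)."""
    return x % l


def looks_like_ring_hom(phi, l: int, sample) -> bool:
    """Test conditions 1 and 2 of Definition 7 for phi: Z -> Z_l on a finite sample."""
    ops_ok = all(
        phi(a + b) == (phi(a) + phi(b)) % l and phi(a * b) == (phi(a) * phi(b)) % l
        for a, b in product(sample, repeat=2)
    )
    return ops_ok and phi(1) == 1


sample = range(-30, 31)
print(looks_like_ring_hom(lambda x: reduce_mod(x, 6), 6, sample))  # True
print(looks_like_ring_hom(lambda x: 0, 6, sample))                 # False, since phi(1) = 0
```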

Let us use the last example as an illustration of the concept of an isomorphism. Consider $\mu_4 = \{1, i, -1, -i\}$. In this case, the isomorphism of example 9 above works as follows:
$$\mathbb{Z}_4 \ni 0 \mapsto 1, \quad 1 \mapsto i, \quad 2 \mapsto -1, \quad 3 \mapsto -i \in \mu_4.$$
Under this identification, let us compare the multiplication (addition) tables of the respective groups:

   ·  |  1    i   -1   -i
  ----+-------------------
   1  |  1    i   -1   -i
   i  |  i   -1   -i    1
  -1  | -1   -i    1    i
  -i  | -i    1    i   -1

  +_4 |  0   1   2   3
  ----+---------------
   0  |  0   1   2   3
   1  |  1   2   3   0
   2  |  2   3   0   1
   3  |  3   0   1   2

This direct inspection shows that the above tables are essentially identical. Namely, let us try to forget about the actual values represented by the various numbers ($1, i, -1, -i$ and $0, 1, 2, 3$) and try to think of them as names („1”, „i”, „−1”, „−i” and „0”, „1”, „2”, „3”), and similarly we may think about an abstract operation $\ast$ instead of $\cdot$ and $+_4$. Better yet, let us introduce new names, e.g. $a, b, c, d$. Then the above two tables can be written in a unified way:

   ∗  |  a   b   c   d
  ----+---------------
   a  |  a   b   c   d
   b  |  b   c   d   a
   c  |  c   d   a   b
   d  |  d   a   b   c

This table represents a group $(\{a, b, c, d\}, \ast)$ much in the way Algebra actually treats the groups it studies – this representation is more abstract than the two specific representations above. Seeing this amounts to noting that the substitutions $a \to 1$, $b \to i$, $c \to -1$, $d \to -i$ lead to the table of the group $\mu_4$, while the substitutions $a \to 0$, $b \to 1$, $c \to 2$, $d \to 3$ lead to the table of the group $\mathbb{Z}_4$ (see above). Thus, the concept of an isomorphism gives us a way to say that two given groups (rings, ...) are just instances of one and the same abstract group (ring, ...). This „abstract structure” is what Algebra is actually interested in, because it can be studied without contemplating „the nature of its elements”, and the real interest, which lies in the properties of the binary operation itself, is more transparently exposed. There is a close analogy between this point of view on groups and object-oriented programming – specific groups are instances of an „abstract group” just the way specific objects are instances of the class they instantiate. Keeping this analogy, one might also say that Algebra is concerned only with the properties of the classes, not the objects themselves.
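The comparison of the two tables can be automated. In the Python sketch below (our own illustration; the function names are ours), each entry of the multiplication table of $\mu_n$ is replaced by its „name” $k$ and the result is compared with the addition table of $\mathbb{Z}_n$. Since the roots are computed as floating-point numbers, the inverse map `index_of` recovers $k$ by rounding the argument of $z$; it is meant to be applied only to elements of $\mu_n$.

```python
# Illustrative sketch; function names are our own choices.
import cmath


def phi(k: int, n: int) -> complex:
    """Example 9: k -> cos(2k*pi/n) + i*sin(2k*pi/n), mapping Z_n onto mu_n."""
    return cmath.exp(2j * cmath.pi * k / n)


def index_of(z: complex, n: int) -> int:
    """Inverse of phi on mu_n: recover k in Z_n from the argument (phase) of z."""
    return round(cmath.phase(z) * n / (2 * cmath.pi)) % n


def table_via_phi(n: int):
    """Multiplication table of mu_n, with each entry phi(k) replaced by its name k."""
    return [[index_of(phi(a, n) * phi(b, n), n) for b in range(n)] for a in range(n)]


def table_Zn(n: int):
    """Addition table of (Z_n, +_n)."""
    return [[(a + b) % n for b in range(n)] for a in range(n)]


for row in table_via_phi(4):
    print(row)
# [0, 1, 2, 3]
# [1, 2, 3, 0]
# [2, 3, 0, 1]
# [3, 0, 1, 2]
print(all(table_via_phi(n) == table_Zn(n) for n in range(1, 13)))  # True
```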

To illustrate what sort of properties Algebra is interested in, let us define the notion of a zero-divisor in a ring: an element $r$ of a ring $R$ is called a (left) zero-divisor if there exists $s \in R$, $s \neq 0$, such that $r \cdot s = 0$. Let $r$ be a zero-divisor in a ring $R$ and let $\varphi : R \to S$ be an isomorphism of rings. We have $r \cdot s = 0$ for some element $s \in R$, $s \neq 0$. Hence, $\varphi(r) \cdot \varphi(s) = \varphi(r \cdot s) = \varphi(0)$. By the comments after Definition 7 we know that $\varphi(0) = 0$, which means that we get the relation $\varphi(r) \cdot \varphi(s) = 0$. Since $\varphi$ is one-to-one and $s \neq 0$, also $\varphi(s) \neq \varphi(0) = 0$. By the above definition this means that the element $\varphi(r)$ (identified with $r$ by means of the isomorphism $\varphi$) is also a zero-divisor in the ring $S$. Thus the property of „being a zero-divisor in a ring” is an invariant of isomorphisms. Of course, one may similarly define and analyze the notion of a right zero-divisor. In the following, „a zero-divisor” will stand for a left or right (or maybe simultaneously left and right) zero-divisor.
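Invariants are useful for showing that two structures are not isomorphic. Since an isomorphism $\varphi$ and its inverse both preserve zero-divisors, $\varphi$ maps the zero-divisors of $R$ bijectively onto those of $S$. The Python sketch below (our own illustration) counts the zero-divisors in $\mathbb{Z}_4$ and in $\mathbb{Z}_2 \times \mathbb{Z}_2$ with componentwise operations modulo 2; since the counts differ, these two four-element rings are not isomorphic.

```python
# Illustrative sketch; the function name zero_divisors is our own choice.
from itertools import product


def zero_divisors(elements, mul, zero):
    """(Left) zero-divisors: r with mul(r, s) == zero for some s != zero."""
    E = list(elements)
    return [r for r in E if any(mul(r, s) == zero for s in E if s != zero)]


# Z_4 with multiplication mod 4
zd_Z4 = zero_divisors(range(4), lambda a, b: (a * b) % 4, 0)
# Z_2 x Z_2 with componentwise multiplication mod 2
Z2xZ2 = list(product(range(2), repeat=2))
zd_pairs = zero_divisors(Z2xZ2, lambda p, q: ((p[0] * q[0]) % 2, (p[1] * q[1]) % 2), (0, 0))

print(zd_Z4)     # [0, 2]
print(zd_pairs)  # [(0, 0), (0, 1), (1, 0)]
```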

We summarize some basic properties of morphisms, often used in practice:

Property 2. Let $\varphi : G \to H$, $\psi : H \to K$ be homomorphisms of groups (rings). Then:

1. $\psi \circ \varphi$ is also a homomorphism,

2. $\psi \circ \varphi$ is an isomorphism if $\varphi$ and $\psi$ are isomorphisms,

3. if $\varphi$ is an isomorphism, then $\varphi^{-1}$ is also an isomorphism,

4. the set $\mathrm{Aut}(G)$ of automorphisms $G \to G$ is a group with respect to the composition of maps $\circ$,

5. if $G$ is a group, then $\forall_{x \in G} \ \varphi(x^{-1}) = \varphi(x)^{-1}$.

7 The direct sum

Let us go back to example IX in Section 3. The table for the xor operation on bits can be compared against the following table:

  Input    0,0   0,1   1,0   1,1
  +_2       0     1     1     0

The immediate conclusion is that we have $(\mathbb{Z}_2, \mathrm{xor}) = (\mathbb{Z}_2, +_2)$, i.e. the operations xor and $+_2$ are the same in $\mathbb{Z}_2$. Moreover, we extended this xor to binary strings of length $n$, bitwise. Whatever „a string” means to you, you should agree that working with binary strings in a bitwise way is the same (= the resulting algebraic structures are isomorphic) as working with $n$-tuples $(k_1, \ldots, k_n) \in \underbrace{\mathbb{Z}_2 \times \ldots \times \mathbb{Z}_2}_{n \text{ factors}}$ componentwise, e.g. $01 \,\mathrm{xor}\, 11 = 10$ corresponds to $(0, 1) + (1, 1) = (1, 0)$ (componentwise addition in $\mathbb{Z}_2$).
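This identification can be checked for a small $n$. A minimal Python sketch (names are our own) verifying for $n = 3$ that the map sending a string to the tuple of its bits turns xor into componentwise addition in $\mathbb{Z}_2$; the map is clearly a bijection, so it is an isomorphism.

```python
# Illustrative sketch; function names are our own choices.
from itertools import product


def to_tuple(s: str) -> tuple:
    """Identify a binary string with an n-tuple of elements of Z_2."""
    return tuple(int(c) for c in s)


def xor_strings(s: str, t: str) -> str:
    """Bitwise xor of two binary strings of equal length."""
    return "".join(str(int(a) ^ int(b)) for a, b in zip(s, t))


def add_tuples(u: tuple, v: tuple) -> tuple:
    """Componentwise addition modulo 2 in Z_2 x ... x Z_2."""
    return tuple((a + b) % 2 for a, b in zip(u, v))


n = 3
B = ["".join(bits) for bits in product("01", repeat=n)]
# to_tuple is a homomorphism: it turns xor into componentwise +_2
print(all(to_tuple(xor_strings(s, t)) == add_tuples(to_tuple(s), to_tuple(t))
          for s in B for t in B))  # True
```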

There is a concept generalizing this idea.

Definition 8. (Direct sum of groups) Let $(G, \ast)$ and $(H, \star)$ be two groups. Consider the set $G \times H$ of ordered pairs of the form $(a, b)$, where $a \in G$, $b \in H$. In $G \times H$ we define a new operation $\oplus$ by the formula
$$(a, b) \oplus (c, d) := (a \ast c, b \star d).$$
Then $(G \times H, \oplus)$ is a group called the direct sum of the groups $G$ and $H$. This group is denoted by the symbol $G \oplus H$.

If $e_G$, $e_H$ are the identity elements of the respective groups, then the identity of the group $G \oplus H$ is equal to $(e_G, e_H)$. Also, we have $(a, b)^{-1} = (a^{-1}, b^{-1})$, where $a^{-1}$ (resp. $b^{-1}$) is the inverse element to $a$ (resp. $b$) in the group $G$ (resp. $H$).

Commentary. The above definition can be extended to rings and other algebraic structures in an obvious way, and also to an arbitrary number $n$ of such structures by considering $n$-tuples instead of pairs.

If the operations are denoted multiplicatively (e.g. $\cdot$), then sometimes one speaks about the direct product of groups instead. The direct product is denoted by $G \times H$. In fact, for our applications those notions are one and the same, and we may use either terminology, whichever seems to be more convenient.

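Example.

We can construct the direct sum $\mathbb{Z}_n\oplus\mathbb{R}^*$ of $(\mathbb{Z}_n,+_n)$ and $(\mathbb{R}^*,\cdot)$, where $\mathbb{R}^*=\mathbb{R}\setminus\{0\}$: its elements are the pairs $(k,r)\in\mathbb{Z}_n\times\mathbb{R}^*$ and
$$(k,r)\oplus(l,s)=(k+_nl,\ r\cdot s).$$
Here, the identity is equal to $e=(0,1)$ and $-(k,r)=\left((n-k)\bmod n,\ \tfrac{1}{r}\right)$ because
$$(k,r)\oplus\left((n-k)\bmod n,\ \tfrac{1}{r}\right)=\left((k+n-k)\bmod n,\ r\cdot\tfrac{1}{r}\right)=(n\bmod n,\ 1)=e.$$

Another example of a direct sum is just the $n$-dimensional space $\mathbb{R}^n=\mathbb{R}\oplus\dots\oplus\mathbb{R}$ with coordinatewise addition of real numbers (here $\oplus$ becomes the usual addition of vectors).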
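A small Python sketch of the direct sum from the example above, to make the operation, the identity and the inverse concrete. Assumptions (ours, for illustration only): the modulus is fixed to a hypothetical `N = 7`, and non-zero `Fraction`s stand in for the non-zero reals so that equalities can be tested exactly; the names `op`, `identity` and `inverse` are not from the notes.

```python
from fractions import Fraction

N = 7  # hypothetical modulus, chosen only for illustration


def op(x, y, n=N):
    """(k, r) (+) (l, s) = (k +_n l, r * s) in Z_n (+) R*."""
    (k, r), (l, s) = x, y
    return ((k + l) % n, r * s)


def identity():
    """The identity element e = (0, 1)."""
    return (0, Fraction(1))


def inverse(x, n=N):
    """-(k, r) = ((n - k) mod n, 1/r)."""
    k, r = x
    return ((n - k) % n, 1 / r)


x = (3, Fraction(-2, 5))
y = (6, Fraction(4))
print(inverse(x))          # (4, Fraction(-5, 2))
print(op(x, inverse(x)))   # (0, Fraction(1, 1))
print(op(x, y))            # (2, Fraction(-8, 5))
```

Each component simply uses the operation of its own group, which is the whole content of Definition 8.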

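Is there any relation between the groups $G$, $H$ and their direct sum $G\oplus H$? Well, note that usually when we think e.g. about the plane $\mathbb{R}^2$, we think of it as embedding the line $\mathbb{R}$, as in the following picture:

[Figure: the line $\mathbb{R}$ embedded in the plane $\mathbb{R}^2$.]

Hence, the real numbers are identified with the subset $\mathbb{R}\times\{0\}$ of $\mathbb{R}^2$. A similar idea works in the general case. If $G$ and $H$ are groups with identities equal to $e_G$, $e_H$ (respectively), we may define the following maps:
$$\iota_G\colon G\ni x\mapsto(x,e_H)\in G\oplus H\quad\text{and}\quad\iota_H\colon H\ni y\mapsto(e_G,y)\in G\oplus H.$$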


It is straightforward to check that the above maps $\iota_G\colon G\to G\oplus H$, $\iota_H\colon H\to G\oplus H$ are injective (=one-to-one) homomorphisms; hence in $G\oplus H$ there are isomorphic copies $G\times\{e_H\}$ and $\{e_G\}\times H$ of the groups $G$, $H$, respectively. As explained earlier, from the point of view of Algebra we may thus think that $G\subset G\oplus H$ and $H\subset G\oplus H$. It is worth noting that similar identifications are made in other branches of mathematics, for example in Analysis one also often treats $\mathbb{R}$ as a subset of $\mathbb{R}^n$.

8 The third abstraction – the concept of a field

A careful reader might have noticed that so far we have not said anything about the situation when in a ring $R$ not only is the equation $a+x=b$, $a,b\in R$, solvable but also the other equation $a\cdot x=b$, $a,b\in R$, $a\neq0$, has a solution in $R$ (this is the case for the rings $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$, see table (1)). To set the terminology, we now give the missing definition.

Definition 9. (Field) Let $R$ be a ring with unity $1$ such that $1\neq0$.

1. An element $a\in R$, $a\neq0$, is called invertible if there exists $b\in R$ such that $a\cdot b=1$ and $b\cdot a=1$. In this situation the element $b$ is called the inverse element of the element $a$ and is denoted by $a^{-1}$ or $\frac{1}{a}$.

2. If every non-zero element $a\in R$ is invertible then the ring $R$ is called a skew field or a division ring.

3. If a skew field $R$ is also a commutative ring then $R$ is called a field.

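Examples.

The ring $\mathbb{Z}$ is not a field because the only $\cdot$-invertible elements are $1$ and $-1$.

As remarked above, $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$ are fields with the usual addition and multiplication.

In $\mathbb{R}^4$ one can introduce the structure of a skew field obtaining the set $\mathbb{H}$ of quaternions which contains $\mathbb{C}$ as a subset.

In $\mathbb{Z}_2=\{0,1\}$ we have $1\cdot_21=1$ so $1$ is invertible (this holds in any ring). Hence $\mathbb{Z}_2$ is a (finite) field.

In $\mathbb{Z}_3=\{0,1,2\}$ we have of course $1\cdot_31=1$ and moreover $2\cdot_32=1$. Hence also $\mathbb{Z}_3$ is a field.

In $\mathbb{Z}_4=\{0,1,2,3\}$ we have $1\cdot_41=1$, $3\cdot_43=1$, but the element $2$ is not invertible (check!). Hence $\mathbb{Z}_4$ is not a field!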

In the last example a direct inspection showed that $2$ does not have a multiplicative inverse in $\mathbb{Z}_4$. Even worse, $2$ is in fact a zero-divisor in this ring because $2\cdot_42=0$. For larger rings $\mathbb{Z}_n$ (i.e. for big $n$) such verification could be very tedious (or even computationally infeasible) without some additional knowledge. For instance, checking if an element $a\in\mathbb{Z}_{1000}$ is invertible requires (potentially) the computation of 999 products $a\cdot_{1000}b$, $b=1,\dots,999$. Hence, our next medium-term objective is to characterize the invertible elements in $\mathbb{Z}_n$ and learn how to compute the inverses. This will be done in Section 11, because to accomplish this we need some new tools. But first, let us note a necessary condition for invertibility, suggested by the last example above.
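The naive search described above can be written down directly. This Python sketch (function names are ours) tries all products $a\cdot_nb$, $b=1,\dots,n-1$, which is exactly the potentially expensive brute force that the following sections make unnecessary.

```python
def inverse_by_search(a, n):
    """Search Z_n for b with a *_n b = 1; return None if there is none.

    This is the naive method: it may try up to n - 1 products.
    """
    for b in range(1, n):
        if (a * b) % n == 1:
            return b
    return None


def is_zero_divisor(a, n):
    """True if a != 0 in Z_n and a *_n b = 0 for some non-zero b in Z_n."""
    return a % n != 0 and any((a * b) % n == 0 for b in range(1, n))


print(inverse_by_search(3, 4))      # 3, since 3 *_4 3 = 1
print(inverse_by_search(2, 4))      # None
print(is_zero_divisor(2, 4))        # True, since 2 *_4 2 = 0
print(inverse_by_search(3, 1000))   # 667
```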


Property 3. If $a\in\mathbb{Z}_n$, $a>1$, is an integer such that $a\mid n$ then $a$ is a zero-divisor in $\mathbb{Z}_n$ (because $a\cdot_n\frac{n}{a}=0$) and has no multiplicative inverse. More generally, in any ring a zero-divisor is not invertible.

The notion of an isomorphism in the case of fields is essentially the same as the one for rings. Since, however, fields have a „better” multiplicative structure than rings, one can state a seemingly weaker definition:

Definition 10. (Isomorphism of fields) Let $(K,+,\cdot)$, $(L,+,\cdot)$ be two fields. We say that $K$ is homomorphic to $L$ if there exists $\varphi\colon K\to L$ such that $\varphi\not\equiv0$ (i.e. $\varphi$ is not identically equal to $0$) and
$$\forall_{a,b\in K}\quad\varphi(a+b)=\varphi(a)+\varphi(b),\quad\varphi(a\cdot b)=\varphi(a)\cdot\varphi(b);$$
this $\varphi$ is called a (field) homomorphism (or an embedding).

If, moreover, $\varphi$ is a bijection, it is called an isomorphism (of fields) and the fields $K$ and $L$ are said to be isomorphic. In symbols: $K\cong L$ or $K\simeq L$.

In order to explain why this definition gives an isomorphism in the sense of the general Definition 5 we note the following: since $\varphi$ of Definition 10 satisfies $\varphi\not\equiv0$, we can choose $x\in K$ such that $\varphi(x)\neq0$ and then we get $\varphi(x)=\varphi(x\cdot1)=\varphi(x)\cdot\varphi(1)$, which multiplied by $(\varphi(x))^{-1}$ gives $\varphi(1)=1$. This is all that we need, because the relation $\varphi(-x)=-\varphi(x)$ holds for the same reasons as in the case of ring homomorphisms. But there is more to the story:

Property 4. A field has no zero-divisors other than $0$ (=it is an integral domain). Every field homomorphism is injective (hence the name embedding). Note that the latter property means that in order to check if a field homomorphism is in fact a field isomorphism, it is enough to check if it is „onto”.

Property 3 explained that an invertible element of $\mathbb{Z}_n$ cannot divide the number $n$ (unless it is equal to $1$). But this is not enough, as it turns out (e.g. in $\mathbb{Z}_6$ the element $4$ does not divide $6$, yet $4\cdot_63=0$). The invertible elements of $\mathbb{Z}_n$ turn out to be exactly the non-zero-divisors. We will see this shortly, but first let us change our language a little...

9 Congruences

The concept of „calculating modulo $n$” can be extended to the whole set of integers, which has proved to be quite useful in applications. First we give the basic definition.

Definition 11. (Congruent numbers) Two integers $k$, $l$ are said to be congruent modulo $n$, where $n\in\mathbb{N}$, if $k\bmod n=l\bmod n$ (see Definition 2), that is if $n\mid k-l$.

For $k$ and $l$ congruent modulo $n$ we will write $k\equiv l\pmod n$ or $k\equiv_nl$.

Note that in particular $k\equiv0\pmod n$ (in the new sense) iff $n\mid k$, i.e. iff $k\bmod n=0$, so the new notation is consistent with the notation introduced in Definition 2.

Example. $10\equiv26\pmod8$ because $8\mid26-10=16$.


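The above definition is in fact a definition of a relation $\equiv_n$ in $\mathbb{Z}$. Although $\equiv_n$ is not an ordinary equality, it shares with it many nice properties:

Theorem 3. (On the relation $\equiv_n$) For every $n\in\mathbb{N}$, $\equiv_n$ is an equivalence relation in $\mathbb{Z}$, that is

1. $\forall_{k\in\mathbb{Z}}\ k\equiv_nk$,

2. $\forall_{k,l\in\mathbb{Z}}\ k\equiv_nl\Rightarrow l\equiv_nk$,

3. $\forall_{k,l,m\in\mathbb{Z}}\ (k\equiv_nl\wedge l\equiv_nm)\Rightarrow k\equiv_nm$.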

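Even more importantly, this equivalence relation is compatible with the usual operations in $\mathbb{Z}$:

Theorem 4. (Arithmetic properties of congruences) Let $n\in\mathbb{N}$. Then

1. $\forall_{a,b,c,d\in\mathbb{Z}}\ (a\equiv_nb\wedge c\equiv_nd)\Rightarrow a+c\equiv_nb+d$ (sidewise addition of congruences),

2. $\forall_{a,b,c,d\in\mathbb{Z}}\ (a\equiv_nb\wedge c\equiv_nd)\Rightarrow a\cdot c\equiv_nb\cdot d$ (sidewise multiplication of congruences).

Example. $10\equiv26\pmod8$ and $3\equiv11\pmod8$, so $10+3\equiv_826+11=37$. Also $10\cdot3\equiv_826\cdot11$.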

Note also that for any integer $k$ it is $k\equiv_n k\bmod n$, so $k$ is congruent modulo $n$ to its remainder on division by $n$. In particular,
$$\forall_{k,l\in\mathbb{Z}}\ (k+l)\bmod n=(k\bmod n)+_n(l\bmod n)\quad\text{and}\quad\forall_{k,l\in\mathbb{Z}}\ (k\cdot l)\bmod n=(k\bmod n)\cdot_n(l\bmod n)$$
(cf. Definition 3). One way to think about congruences is that we allow ourselves to go out of the ring $\mathbb{Z}_n$ and calculate more „freely” with integers at the cost of not having strict equalities (which we had in $\mathbb{Z}_n$) but only relations ($\equiv_n$ in $\mathbb{Z}$). But note that at the end of such calculations we can always produce a result that lies in $\mathbb{Z}_n$ (by the distinguished formulas above) and this, in fact, will be the correct result in this ring (if the arguments to the calculations also came from $\mathbb{Z}_n$).

Example. In $\mathbb{Z}_{15}$ we have
$$(10-_{15}3)\cdot_{15}13=(10+_{15}12)\cdot_{15}13=7\cdot_{15}13=1.$$
On the other hand, we can perform the ordinary operations and only reduce the final result to its remainder:
$$(10-3)\cdot13=7\cdot13=91\equiv1\pmod{15}.$$
As we can see, the result is the same. Illustrative as it is, the above example is somewhat misleading because the real power of calculating with congruences comes from the exact opposite application of Theorem 4: namely, one can always reduce any term of the expression to be computed to its remainder and still have a valid congruence.

Example. Let's say we want to compute the remainder $(134\cdot21+16)\bmod6$. We know that $16\equiv4\pmod6$, $134\equiv2\pmod6$, $21\equiv3\pmod6$. Hence, by Theorem 4 item 2 we have $134\cdot21\equiv_62\cdot3$ and then by item 1, $134\cdot21+16\equiv_62\cdot3+4$. Now it is easy to finish: $2\cdot3+4=10\equiv4\pmod6$. And this is the answer to our problem.

If you think of what has just happened, you may feel a little amazed because in order to find the result we didn't need to compute the product $134\cdot21$ at all!

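Actually, such calculations can be performed in a much more flexible way than above (again, this follows from Theorem 4). For instance, we can also do something like this:
$$\overline{134}\cdot21+\underline{16}\equiv_6\overline{2}\cdot21+\underline{34}=42+34\equiv4\pmod6,$$
where the decorated numbers ($\overline{134}$ and $\overline{2}$, resp. $\underline{16}$ and $\underline{34}$) are congruent modulo $6$.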
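Theorem 4 is what makes the reduction trick safe. The Python sketch below redoes the computation of $(134\cdot21+16)\bmod6$ in three ways: computing the whole expression first, reducing every term first, and replacing terms by other congruent numbers ($134\to2$, $16\to34$); it also repeats the $\mathbb{Z}_{15}$ example. For a positive modulus, Python's `%` operator returns the remainder in $\{0,\dots,n-1\}$, i.e. it computes $k\bmod n$.

```python
n = 6

direct = (134 * 21 + 16) % n                      # reduce only at the end: 2830 mod 6
reduced = ((134 % n) * (21 % n) + (16 % n)) % n   # (2 * 3 + 4) mod 6 = 10 mod 6
flexible = (2 * 21 + 34) % n                      # 134 ~ 2 and 16 ~ 34 (mod 6)

print(direct, reduced, flexible)                  # 4 4 4

# The Z_15 example works the same way: -3 is congruent to 12 modulo 15.
print(((10 - 3) * 13) % 15, ((10 + 12) % 15 * 13) % 15)   # 1 1
```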

At the level of congruences one can study „equations” similar to those considered in $\mathbb{Z}_n$, too:
$$a+x\equiv_nb\quad\text{and}\quad a\cdot x\equiv_nb.$$
The first congruence is easy to solve. For given $a,b\in\mathbb{Z}$ all the solutions of $a+x\equiv_nb$ are the integers $x\in\{b-a+k\cdot n: k\in\mathbb{Z}\}$. Of course, among such numbers there is a solution equal to $(b-a)\bmod n\in\mathbb{Z}_n$ (for $k=-\left\lfloor\frac{b-a}{n}\right\rfloor$, where $\lfloor\cdot\rfloor$ denotes the floor function).

If, moreover, $a,b\in\mathbb{Z}_n$ then solving $a+x\equiv_nb$ in integers amounts to solving $a+_nx=b$ in $\mathbb{Z}_n$ and then using the (unique!) solution $x=b-_na$ (see page 8 for the definition of $-_n$) to produce all the integer solutions $\{(b-_na)+k\cdot n: k\in\mathbb{Z}\}$ to the congruence $a+x\equiv_nb$.

These observations indicate that there is no essential difference between the ability to solve $a+_nx=b$ and $a+x\equiv b\pmod n$, but the latter is „more general” than the former in that we allow here $a$, $b$ to be arbitrary integers (not just the elements of $\mathbb{Z}_n$).

Similarly, we will be able to solve the equation $a\cdot_nx=b$ in $\mathbb{Z}_n$ once we can solve the congruence $a\cdot x\equiv_nb$ in integers. The tool that is needed to solve such congruences will be developed in the next section.

10 GCD

The basic definitions that we now give should be more or less familiar to you:

Definition 12. (GCD) Let $k_1,\dots,k_r\in\mathbb{Z}$. A natural number $d$ is called the greatest common divisor (gcd) of the numbers $k_1,\dots,k_r$ if

1. $\forall_{1\le j\le r}\ d\mid k_j$,

2. if $c\in\mathbb{Z}$ is such that $\forall_{1\le j\le r}\ c\mid k_j$ then $c\mid d$.

The above number $d$ is denoted by $\gcd(k_1,\dots,k_r)$ or, if no confusion is likely, simply by $(k_1,\dots,k_r)$.

Definition 13. (Coprime numbers) Two integers $k$, $l$ are called coprime if $(k,l)=1$. Coprime numbers are sometimes denoted by $k\perp l$.

The next theorem should not be a surprise to you:

Theorem 5. (On GCD) For any $k_1,\dots,k_r\in\mathbb{Z}$ not all equal to zero their gcd always exists and is uniquely determined.

In principle, Theorem 5 is a consequence of the following method of finding gcd: find the sets $D_j$ of all the positive divisors of the numbers $k_j$, compute $\bigcap_{j=1}^{r}D_j$ and take the maximum of this set (it is finite if not all $k_j$ are $0$). This number will be $\gcd(k_1,\dots,k_r)$.
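The divisor-set method can be sketched in Python as follows (our own illustration, with hypothetical function names). Zero arguments are skipped, because every positive integer divides $0$, so $D_0$ would not restrict the intersection. The divisors are found by trying every candidate, which is exactly why this method is slow.

```python
def positive_divisors(k):
    """The set D_k of all positive divisors of a non-zero integer k."""
    k = abs(k)
    return {d for d in range(1, k + 1) if k % d == 0}


def gcd_naive(*ks):
    """gcd(k_1, ..., k_r) as the maximum of the intersection of the sets D_j.

    Zeros are skipped (every positive integer divides 0); at least one
    argument must be non-zero.
    """
    nonzero = [k for k in ks if k != 0]
    if not nonzero:
        raise ValueError("At least one of the numbers has to be different from 0")
    common = set.intersection(*(positive_divisors(k) for k in nonzero))
    return max(common)


print(gcd_naive(224, 128))   # 32
print(gcd_naive(-12, 0))     # 12
```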

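Examples.

For any $k\in\mathbb{Z}$, $k\neq0$, we have $\gcd(k,0)=|k|$ because $|k|$ is the greatest divisor of $k$ (and every integer divides $0$).

$\gcd(4,6)=2$ because $D_4=\{1,2,4\}$, $D_6=\{1,2,3,6\}$ so that $\max(D_4\cap D_6)=\max\{1,2\}=2$.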

The method for computing gcd indicated above is not very practical, because it involves factorization of integers (to find the divisors) and the problem of factorizing integers is widely considered to be quite difficult computationally (and supposedly infeasible for large numbers). The popular public-key cryptosystem RSA is based on the problem of factorization and the so-called Euler's theorem (see Section 13). Fortunately, there is another, very old, method to compute gcd without factoring integers:

Theorem 6. (Euclidean algorithm) The gcd of two numbers $k,l\in\mathbb{Z}$, of which at least one is different from zero, can be computed in the following way:

1. Put $r_0:=\max(|k|,|l|)$, $r_1:=\min(|k|,|l|)$.

2. If $r_{i-1},r_i$ are already defined for some number $i\ge1$ and $r_i\neq0$ then put $r_{i+1}:=r_{i-1}\bmod r_i$.

With such definitions, there exists $a\ge0$ such that $r_{a+1}=0$ (so the above process ends). Moreover, then $r_a=\gcd(k,l)$.

Commentary. The sequence $(r_i)$ in the above theorem is defined using mathematical induction. This, in turn, can be implemented recursively e.g. as follows (using Maple language):

GCD:=proc(k,l); # the hash mark "#" is a comment sign
  if k<0 or l<0 then return GCD(abs(k),abs(l)) fi;
  if l<>0 then return GCD(l,k mod l) fi; # "<>" means "≠"
  if l=0 and k<>0 then return k fi;
  ERROR(`At least one of the numbers has to be different from 0`)
end proc;

Example. $\gcd(224,128)=\,$? We have $r_0=224$, $r_1=128$, $r_2=r_0\bmod r_1=224\bmod128=96$, $r_3=r_1\bmod r_2=128\bmod96=32$, $r_4=r_2\bmod r_3=96\bmod32=0$. By Theorem 6, $(224,128)=r_3=32$. The above calculations can be represented in a way that is both more compact and more detailed:
$$\begin{array}{c|c|c} r_i & r_{i+1} & q_i\\\hline 224 & 128 & 1\\ 128 & 96 & 1\\ 96 & 32 & 3\\ 32 & 0 & \end{array}$$
Here $q_i=\left\lfloor\frac{r_i}{r_{i+1}}\right\rfloor$. This representation will be useful in what follows.
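The table above can be produced mechanically. The following Python sketch (an iterative variant, unlike the recursive Maple procedure; the function names are ours) follows Theorem 6 and records the rows $(r_i,r_{i+1},q_i)$; the last row has $r_{i+1}=0$ and no quotient.

```python
def euclid_table(k, l):
    """Rows (r_i, r_{i+1}, q_i) of the Euclidean algorithm (Theorem 6).

    r_0 = max(|k|, |l|), r_1 = min(|k|, |l|), r_{i+1} = r_{i-1} mod r_i and
    q_i = floor(r_i / r_{i+1}). The last row is (r_a, 0, None), r_a = gcd(k, l).
    """
    if k == 0 and l == 0:
        raise ValueError("At least one of the numbers has to be different from 0")
    r_prev, r = max(abs(k), abs(l)), min(abs(k), abs(l))
    rows = []
    while r != 0:
        rows.append((r_prev, r, r_prev // r))
        r_prev, r = r, r_prev % r
    rows.append((r_prev, 0, None))
    return rows


def gcd_euclid(k, l):
    """gcd(k, l) read off from the last row of the table."""
    return euclid_table(k, l)[-1][0]


for row in euclid_table(224, 128):
    print(row)
# (224, 128, 1)
# (128, 96, 1)
# (96, 32, 3)
# (32, 0, None)
```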

The Euclidean algorithm is a much more efficient way of computing gcd than the naïve method based on factorization of numbers indicated above. Specifically, one can show that the number of steps required to compute $\gcd(k,l)$ for two integers $k\ge l>0$ is $O(\log l)$ (read: „...is big-$O$ of $\log l$”; this notation means that the number of steps is bounded by $C\cdot\log l$ for some positive constant $C\in\mathbb{R}$). From this it also follows that the running time of the Euclidean algorithm is $O(\mathrm{len}(k)\cdot\mathrm{len}(l))$, where $\mathrm{len}$ denotes the binary length (that is the number of bits) of an integer. It is worth noting that the time needed to compute the product $k\cdot l$ (by the usual schoolbook method) is also $O(\mathrm{len}(k)\cdot\mathrm{len}(l))$, so finding gcd is essentially as fast as multiplying integers.



Exercise 4. Show that the binary length $\mathrm{len}(k)$ of an integer $k$ is equal to
$$\mathrm{len}(k)=\begin{cases}\lfloor\log_2|k|\rfloor+1, & \text{if } k\neq0,\\ 1, & \text{if } k=0.\end{cases}$$

Theorem 6 gives us a method of finding the gcd of two integers without knowing their factorizations. And what about the case of several numbers? We have:

Theorem 7. (GCD of several integers) Let $k_1,\dots,k_r\in\mathbb{Z}$ be integers not all equal to $0$. Then $(k_1,\dots,k_r)$ does not depend on the order of the arguments $k_1,\dots,k_r$. More precisely, $(k_1,\dots,k_r)$ depends only on the set $\{k_1,\dots,k_r\}$. If $r\ge2$ and also $k_1,\dots,k_{r-1}$ are not all equal to $0$ then
$$(k_1,\dots,k_r)=((k_1,\dots,k_{r-1}),k_r).$$

Remark. Note that the properties of gcd given in Theorem 7 resemble the properties of binary operations in groups (commutativity and associativity) but, generally speaking, gcd is an $n$-ary operation on integers ($n\in\mathbb{N}$) and such terminology is not used in this case. However, for $n=2$ we have $(k_1,k_2)=(k_2,k_1)$ and $((k_1,k_2),k_3)=(k_1,(k_2,k_3))$ (whenever the gcds involved are defined), so one can say that the binary gcd, $\gcd\colon\mathbb{Z}\times\mathbb{Z}\to\mathbb{Z}$, is commutative and associative. Question: is $(\mathbb{Z},\gcd)$ a group?

Example. Using Theorem 7 and the example on page 21 we compute:
$$(128,224,40,224)=(224,128,40)=((224,128),40)=(32,40)=8$$
(note that in the first equality we used the fact that $\{128,224,40,224\}=\{224,128,40\}$).
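Theorem 7 says that a gcd of several numbers can be computed by folding the binary gcd over the arguments. A Python sketch using the standard library's `math.gcd` and `functools.reduce` (the wrapper name `gcd_many` is ours): the initial value `0` is used because $(0,k)=|k|$, which also keeps the result non-negative for a single negative argument.

```python
from functools import reduce
from math import gcd


def gcd_many(*ks):
    """gcd(k_1, ..., k_r) by Theorem 7: ((k_1, ..., k_{r-1}), k_r), repeatedly.

    math.gcd(0, 0) returns 0, so an input consisting only of zeros (or no
    input at all) is rejected explicitly, as the notes require at least one
    non-zero argument.
    """
    if all(k == 0 for k in ks):
        raise ValueError("At least one of the numbers has to be different from 0")
    return reduce(gcd, ks, 0)


print(gcd_many(128, 224, 40, 224))   # 8
print(gcd_many(8, 0, 14))            # 2
```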

Commentary. The Maple code of the Commentary on page 21 can be modified to a function GCD accepting an arbitrary number ($\ge1$) of arguments. A simple (although not very elegant) such modification is the following:

GCD:=proc()
  local A,k,l;
  if nargs=0 then ERROR(`You must specify at least one argument`) fi; #"nargs" is the number of arguments actually passed to the procedure
  if nargs=1 then return GCD(abs(args[1]),0) fi; #"args[i]" is the i-th argument passed
  if nargs>=3 then
    A:={args} minus {0}; #we remove 0 from the set of arguments passed
    if A={} then ERROR(`At least one of the numbers has to be different from 0`) fi; #A has to be a non-empty set
    if nops(A)=1 then return abs(A[1]) fi; #only one non-zero argument: its absolute value is the gcd
    return GCD(GCD(op(1..-2,A)),A[-1]); #"op(1..-2,A)" is the (comma separated) sequence of all the elements of the set A except for the last one; A[-1] is the last element
  fi;
  #now nargs=2 and we proceed as before:
  k:=args[1]; l:=args[2];
  if k<0 or l<0 then return GCD(abs(k),abs(l)) fi;
  if l<>0 then return GCD(l,k mod l) fi;
  if l=0 and k<>0 then return k fi;
  ERROR(`At least one of the numbers has to be different from 0`)
end proc;

Exercise 5. Implement and test the above procedure in Maple. Example invocation: GCD(8,0, 14).

11 Solving $a\cdot x\equiv_nb$ and $a\cdot_nx=b$

What is the connection between the gcd and the equation $a\cdot_nx=b$? The following theorem provides a characterization of gcd which turns out to be the link that we are searching for.

Theorem 8. (Bezout's identity) The equation $a_1x_1+\dots+a_rx_r=b$, where $a_1,\dots,a_r\in\mathbb{Z}$ are not all equal to $0$ and $b\in\mathbb{Z}$, has a solution in integers iff $(a_1,\dots,a_r)\mid b$.

If $r=2$ and $(x_1^0,x_2^0)\in\mathbb{Z}^2$ is a solution of $a_1x_1+a_2x_2=b$ then all solutions of this equation are exactly of the form
$$(x_1,x_2)=\left(x_1^0+k\frac{a_2}{(a_1,a_2)},\ x_2^0-k\frac{a_1}{(a_1,a_2)}\right),$$
where $k\in\mathbb{Z}$ is an arbitrary integer.

Remark. In Number Theory equations that are to be solved in integers are called Diophantine equations.

How does the above theorem help to deal with the congruence $a\cdot x\equiv_nb$? Well, recall first what such a congruence actually means:
$$a\cdot x\equiv_nb\iff n\mid a\cdot x-b\iff\exists_{y\in\mathbb{Z}}\ a\cdot x+n\cdot y=b.$$
Hence, in order to solve $a\cdot x\equiv_nb$ it is enough to solve $a\cdot x+n\cdot y=b$ in $\mathbb{Z}^2$, and by Theorem 8 we know when this is possible: exactly when $(a,n)\mid b$. Before actually showing how to solve the latter equation, let us finally answer the question touched upon in Property 3.

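Corollary 1.

1. The congruence $a\cdot x\equiv_nb$, where $a,b\in\mathbb{Z}$, $n\in\mathbb{N}$, has a solution in $\mathbb{Z}$ iff $(a,n)\mid b$.

In particular,

2. The congruence $a\cdot x\equiv_n1$, where $a\in\mathbb{Z}$, $n\in\mathbb{N}$, has a solution in $\mathbb{Z}$ iff $(a,n)=1$, or in words: iff the numbers $a$ and $n$ are coprime.

3. An element $a\in\mathbb{Z}_n$ has a multiplicative inverse in $(\mathbb{Z}_n,\cdot_n)$ iff $(a,n)=1$.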

Commentary. Note that if for some $a\in\mathbb{Z}_n$ it is $(a,n)\neq1$ then $\frac{n}{(a,n)}\in\mathbb{Z}_n\setminus\{0\}$ and
$$a\cdot_n\frac{n}{(a,n)}=\left(\frac{a}{(a,n)}\cdot n\right)\bmod n=0,$$
which means that $a$ is a zero-divisor in $\mathbb{Z}_n$. In other words, an element $a\in\mathbb{Z}_n$ is $\cdot_n$-invertible iff it is a non-zero-divisor (by item 3 of the above corollary and Property 3).

Using Corollary 1 we can characterize those Zn that bear a field structure. For this, we recall the following definition:

Definition 14. (Primes) A natural number $n$ is called prime if it has exactly two positive divisors. The set of all prime numbers will be denoted by $\mathbb{P}$. Hence, $\mathbb{P}=\{2,3,5,7,11,13,\dots\}$. A natural number greater than $1$ which is not prime is called a composite number.

Now we can state:

Corollary 2. $\mathbb{Z}_n$ is a field iff $n$ is a prime number.

The above corollary validates our empirical discovery that $\mathbb{Z}_2$, $\mathbb{Z}_3$ are fields (see examples on page 17). Now we know that also $\mathbb{Z}_5$, $\mathbb{Z}_7$, $\mathbb{Z}_{11}$, ... are fields and that for a composite number $n$, $\mathbb{Z}_n$ is not a field.

Let us indicate a method of solving the equation $a\cdot x+b\cdot y=c$ in integers. We will do this by example.

Example. (Extended Euclidean Algorithm)

Consider the equation $224x+128y=64$. By the example on page 21 we know that $(224,128)=32$ and $32\mid64$. Hence, Theorem 8 asserts that our equation does have some solutions in $\mathbb{Z}^2$. Let us recall the table from the mentioned example:
$$\begin{array}{c|c|c} r_i & r_{i+1} & q_i\\\hline 224 & 128 & 1\\ 128 & 96 & 1\\ 96 & 32 & 3\\ 32 & 0 & \end{array},$$
where $q_i=\left\lfloor\frac{r_i}{r_{i+1}}\right\rfloor$ is the partial quotient on division of $r_i$ by $r_{i+1}$. Using this relation and recalling the Division with Remainder Theorem (Theorem 1), we see that $128=1\cdot96+32$ (look at the second and third rows from the bottom) which can be rewritten as $32=128-1\cdot96$. Similarly, looking at the third and fourth rows from the bottom we get $96=224-1\cdot128$. Substituting the latter relation into the former gives
$$32=128-1\cdot96=128-1\cdot(224-1\cdot128)=2\cdot128-1\cdot224=\underbrace{(-1)}_{u_0}\cdot224+\underbrace{2}_{v_0}\cdot128.\qquad(3)$$
Note that we have reached the top of the table and thus we have expressed $32=(224,128)$ in terms of the numbers $224$ and $128$. Specifically, (3) means that $(u_0,v_0)=(-1,2)$ is one of the possible solutions to the auxiliary equation $224u+128v=32$. Multiplying equality (3) sidewise by $\frac{64}{32}=2$ we get $64=(-2)\cdot224+4\cdot128$, which can be written as
$$224\cdot\underbrace{(-2)}_{x_0}+128\cdot\underbrace{4}_{y_0}=64.$$
This means that $(x_0,y_0)=(-2,4)$ is one of the integer solutions to the equation $224x+128y=64$ that we wanted to solve. Now we can use Theorem 8 to conclude that all the integer solutions of this equation are exactly of the form
$$(x,y)=\left(-2+k\frac{128}{(224,128)},\ 4-k\frac{224}{(224,128)}\right)=(-2+4k,\ 4-7k),$$
where $k\in\mathbb{Z}$ is an arbitrary integer.

Commentary. The procedure of solving equations $a\cdot x+b\cdot y=c$ in integers outlined in the above example can be improved in such a way that apart from the sequence $(r_i)$ of the successive remainders, one computes also two integer sequences $(s_i)$ and $(t_i)$ such that $a\cdot s_i+b\cdot t_i=r_i$, for all $i$. Hence, upon reaching $r_n=(a,b)$ one automatically gets the integer solution $(x,y)=(s_n,t_n)$ to the equation $a\cdot x+b\cdot y=(a,b)$ without the need to trace back through the values of $r_i$ and $q_i$ to produce the solution (as we did in the above example). This improved version of the procedure is what is usually called the Extended Euclidean Algorithm. One can show that its running time is $O(\mathrm{len}(a)\cdot\mathrm{len}(b))$, so (again) this algorithm is essentially as fast as the gcd algorithm itself, or simply as finding the product $a\cdot b$. Our method is less efficient, but it is conceptually simpler, easier to remember and is what actually lies at the core of the improved version of the algorithm.
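A Python sketch of the improved version described in this Commentary (our own code, assuming $a,b\ge0$ not both zero). It updates $r_i$, $s_i$, $t_i$ with the same quotient $q_i$ at every step, starting from $a\cdot1+b\cdot0=r_0$ and $a\cdot0+b\cdot1=r_1$, so the invariant $a\cdot s_i+b\cdot t_i=r_i$ holds throughout and no back-substitution is needed.

```python
def extended_euclid(a, b):
    """Return (d, s, t) with d = gcd(a, b) and a*s + b*t = d.

    Sketch assumption: a, b >= 0, not both 0.
    Invariant: a*s_prev + b*t_prev == r_prev and a*s + b*t == r.
    """
    r_prev, r = a, b
    s_prev, s = 1, 0
    t_prev, t = 0, 1
    while r != 0:
        q = r_prev // r                      # partial quotient q_i
        r_prev, r = r, r_prev - q * r        # r_{i+1} = r_{i-1} - q_i * r_i
        s_prev, s = s, s_prev - q * s
        t_prev, t = t, t_prev - q * t
    return r_prev, s_prev, t_prev


print(extended_euclid(224, 128))   # (32, -1, 2), as in (3)
print(extended_euclid(37, 44))     # (1, -19, 16)
```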

Now we can finally show how to solve the congruences $a\cdot x\equiv_nb$ and the related equations $a\cdot_nx=b$.

Example. (Solving $a\cdot x\equiv_nb$ and $a\cdot_nx=b$)

I. Consider the congruence $224x\equiv_{128}64$. By the remarks on page 23 preceding Corollary 1 we know that the solutions to this congruence are just the values of $x$ in the solutions $(x,y)$ of the equation $224x+128y=64$, considered in the above example. Hence, every $x\in\{-2+4k: k\in\mathbb{Z}\}$ is a solution to $224x\equiv_{128}64$ (and there are no other integer solutions to this congruence).

Using the fact that $224\equiv_{128}-32$ we can transform the congruence $224x\equiv_{128}64$ into an equivalent one: $-32x\equiv_{128}64$, i.e. (multiplying both sides by $-1$ and using $-64\equiv_{128}64$) $32x\equiv_{128}64$, which gives rise to the equation $32\cdot_{128}x=64$ in $\mathbb{Z}_{128}$. Its solutions are exactly all the elements of the set $\{-2+4k: k\in\mathbb{Z}\}\cap\mathbb{Z}_{128}$, that is $x\in\{2,6,10,\dots,126\}$.

II. Consider the equation $37\cdot_{44}x=1$ in $\mathbb{Z}_{44}$. Since $(37,44)=1$, by Corollary 1 item 3 we know that it is possible to solve this equation in $\mathbb{Z}_{44}$. In order to do that, we consider the auxiliary equation $37x+44y=(37,44)$ in $\mathbb{Z}^2$. We have
$$\begin{array}{c|c|c} r_i & r_{i+1} & q_i\\\hline 44 & 37 & 1\\ 37 & 7 & 5\\ 7 & 2 & 3\\ 2 & 1 & 2\\ 1 & 0 & \end{array}.$$
Working from the bottom upwards, we get
$$1=7-3\cdot2=7-3\cdot(37-5\cdot7)=(-3)\cdot37+16\cdot7=(-3)\cdot37+16\cdot(44-37)=16\cdot44-19\cdot37.$$
Hence, one of the possible solutions to the equation $37x+44y=1$ is $(x_0,y_0)=(-19,16)$. By Theorem 8, all the integer solutions of this equation are exactly the elements of the set $\{(-19+44k,\ 16-37k): k\in\mathbb{Z}\}$. This means that the solutions to the congruence $37x\equiv_{44}1$ are the numbers $x=-19+44k$, where $k\in\mathbb{Z}$, and the solutions to the equation $37\cdot_{44}x=1$ in $\mathbb{Z}_{44}$ are the numbers $\{-19+44k: k\in\mathbb{Z}\}\cap\mathbb{Z}_{44}=\{25\}$ (so there is only one solution to this equation!). Check that indeed $37\cdot_{44}25=1$.

III. Consider the equation $2\cdot_{256}x=1$ in $\mathbb{Z}_{256}$. Since $(2,256)=2$ and $2\nmid1$, Corollary 1 item 1 asserts that the congruence $2x\equiv_{256}1$ does not have any integer solutions. This means that also the equation $2\cdot_{256}x=1$ does not have any solutions in $\mathbb{Z}_{256}$.

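We can sum up the above considerations by giving the following schematized procedure of dealing with the various problems to solve:

$$\begin{array}{l|c|c} & \text{Congruence} & \text{Modular equation}\\\hline \text{To solve:} & a\cdot x\equiv_nb & a\cdot_nx=b\\ \text{Restrictions:} & a,b\in\mathbb{Z},\ n\in\mathbb{N} & a,b\in\mathbb{Z}_n,\ n\in\mathbb{N}\\ \text{Solutions exist iff:} & (a,n)\mid b & (a,n)\mid b\\ \text{Solution domain:} & x\in\mathbb{Z} & x\in\mathbb{Z}_n\end{array}$$

In both cases one passes to the same auxiliary problem:

Auxiliary equation: $a\cdot x+n\cdot y=b$; solution domain: $(x,y)\in\mathbb{Z}^2$.
↓ Extended Euclid applied to $a\cdot x+n\cdot y=(a,n)$, then multiplication by $\frac{b}{(a,n)}$
A solution to the auxiliary equation: $(x_0,y_0)\in\mathbb{Z}^2$.
↓
Solution set $\mathrm{Sol}\subset\mathbb{Z}^2$ of the auxiliary equation: $x=x_0+k\frac{n}{(a,n)}$, $y=y_0-k\frac{a}{(a,n)}$, $k\in\mathbb{Z}$.

Finally, going back to the two columns:

$$\begin{array}{l|c|c} \text{Solution:} & \left\{x_0+k\frac{n}{(a,n)}: k\in\mathbb{Z}\right\} & \left\{x_0+k\frac{n}{(a,n)}: k\in\mathbb{Z}\right\}\cap\mathbb{Z}_n\end{array}$$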
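The whole scheme can be turned into a short program. In this Python sketch (function names are ours) the extended Euclidean algorithm is applied to $a\bmod n$ and $n$, which is legitimate because $(a,n)=(a\bmod n,n)$ and $a\equiv_na\bmod n$; the particular solution is then scaled by $\frac{b}{(a,n)}$. The result is a pair $(x_0,\text{step})$, $\text{step}=\frac{n}{(a,n)}$, such that the integer solutions are exactly $x_0+k\cdot\text{step}$, $k\in\mathbb{Z}$.

```python
def ext_gcd(a, b):
    """Return (d, s, t) with d = gcd(a, b) = a*s + b*t, for a, b >= 0 not both 0."""
    r_prev, r, s_prev, s, t_prev, t = a, b, 1, 0, 0, 1
    while r != 0:
        q = r_prev // r
        r_prev, r = r, r_prev - q * r
        s_prev, s = s, s_prev - q * s
        t_prev, t = t, t_prev - q * t
    return r_prev, s_prev, t_prev


def solve_congruence(a, b, n):
    """Solve a*x = b (mod n), n >= 1.

    Returns None if (a, n) does not divide b; otherwise (x0, step) such that
    the integer solutions are exactly x0 + k*step, k in Z, with 0 <= x0 < step.
    """
    d, s, _ = ext_gcd(a % n, n)      # (a mod n)*s + n*t = d = (a, n)
    if b % d != 0:
        return None                  # Corollary 1 item 1: no solutions
    step = n // d
    x0 = (s * (b // d)) % step       # a*s*(b/d) = b (mod n)
    return x0, step


def solve_in_Zn(a, b, n):
    """All solutions of a *_n x = b in Z_n (a list, possibly empty)."""
    sol = solve_congruence(a, b, n)
    if sol is None:
        return []
    x0, step = sol
    return list(range(x0, n, step))


print(solve_congruence(224, 64, 128))   # (2, 4)
print(solve_in_Zn(37, 1, 44))           # [25]
print(solve_in_Zn(2, 1, 256))           # []
```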

Problem 1. Try to solve the equation $2x+3y-4z=2$ in integers.

Hint: First solve the equation $2x+3y=1$.

12 The group $G_n$ and the $\varphi$ function

In Example II on page 25 we found that the equation $37\cdot_{44}x=1$ has only one solution in $\mathbb{Z}_{44}$, namely $x=25$. This is not accidental, as we shall note shortly. First, recall that for a prime number $p$, $\mathbb{Z}_p$ is a field with exactly $p$ elements (Corollary 2). But we also know that even for a composite number $n$ there are numbers in $\mathbb{Z}_n$ which are invertible with respect to $\cdot_n$ (Corollary 1 item 3). Hence, it is natural to consider the set of the invertible elements of $\mathbb{Z}_n$:

Notation. For $n\in\mathbb{N}$, $n\ge2$, we put $G_n:=\{k\in\mathbb{Z}_n: (k,n)=1\}$.

The basic theorem on $G_n$ is the following:

Theorem 9. (The structure of $G_n$) For every $n\in\mathbb{N}$, $n\ge2$, $(G_n,\cdot_n)$ is a group.

This group is called the group of multiplicative inverses modulo $n$.
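A brute-force Python check of Theorem 9 for small $n$ (our own illustration, not a proof): it lists $G_n$ and verifies that $1\in G_n$, that $G_n$ is closed under $\cdot_n$ and that every element has an inverse in $G_n$; associativity is inherited from $\mathbb{Z}_n$.

```python
from math import gcd


def G(n):
    """G_n = {k in Z_n : (k, n) = 1}, for n >= 2."""
    return [k for k in range(n) if gcd(k, n) == 1]


def is_group_mod(elems, n):
    """Check that 1 belongs, closure under *_n, and existence of inverses."""
    s = set(elems)
    closed = all((a * b) % n in s for a in s for b in s)
    has_inverses = all(any((a * b) % n == 1 for b in s) for a in s)
    return 1 in s and closed and has_inverses


print(G(12))                                               # [1, 5, 7, 11]
print(is_group_mod(G(44), 44))                             # True
print(all(is_group_mod(G(n), n) for n in range(2, 50)))    # True
```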

This theorem explains that the solution $x=25$ to the equation $37\cdot_{44}x=1$ is just the inverse element $37^{-1}$ in the group $G_{44}$. By Exercise 3 we know that an inverse element in a group is always unique; that is why $x=25$ is the only solution to this equation.

More generally, by Property 1 item 3 every equation $a\cdot_nx=b$ has a unique solution if $a,b\in G_n$, that is if $(a,n)=1=(b,n)$. Actually, the condition $(b,n)=1$ is unnecessary for the uniqueness of the solutions; $(a,n)=1$ is enough. This is a simple generalization of Exercise 3 and Property 1 item 3 (you will see this in practice during the classes).

How many elements do the groups $G_n$ have?

Example. If $p$ is a prime number then $G_p=\{1,\dots,p-1\}$. Hence, $\mathrm{card}\,G_p=p-1$ („card” is just the number of elements in the case of a finite set).

We now define the counting function for the groups $G_n$:

Definition 15. (Euler totient function) The Euler totient (or $\varphi$) function is defined on the set of natural numbers as
$$\varphi(n):=\begin{cases}\mathrm{card}\,G_n, & \text{if } n\ge2,\\ 1, & \text{if } n=1.\end{cases}$$
Hence, $\varphi\colon\mathbb{N}\to\mathbb{N}$.

By the above example, $\varphi(p)=p-1$ if $p\in\mathbb{P}$. Is there a closed-form formula for $\varphi$ in general? The answer is „none is known”, but there is a method of computation of the values of $\varphi$ which, in principle, is as fast (or rather: as slow) as factorizing integers (see Theorem 12). Below we provide a picture of the graph of the function $\varphi$ on the set $\{1,\dots,10000\}$, which shows why a direct formula for $\varphi$ is unlikely.

[Figure (4): the graph of $\varphi$ on $\{1,\dots,10000\}$.]

To explain how to compute the values of the $\varphi$ function, we introduce the following:

Definition 16. (Multiplicative function) We say that a function $f\colon\mathbb{N}\to\mathbb{C}$ is multiplicative if $f(1)=1$ and for every coprime $k,l\in\mathbb{N}$ it is $f(k\cdot l)=f(k)\cdot f(l)$.

There is a related notion of an additive function $g$, which satisfies $g(1)=0$ and $g(k\cdot l)=g(k)+g(l)$ for all coprime naturals $k,l$.

Exercise 6. Show that if $f\colon\mathbb{N}\to\mathbb{R}$ is additive then $\mathbb{N}\ni x\mapsto2^{f(x)}\in\mathbb{R}$ is multiplicative.

Multiplicative (additive) functions are very common in Number Theory. Here is the first example of such a function.

Property 5. If $k,l,n\in\mathbb{Z}$ and $(k,l)=1$ then $(k\cdot l,n)=(k,n)\cdot(l,n)$. In particular, the function $\mathbb{N}\ni x\mapsto(x,n)\in\mathbb{N}$ is multiplicative.

Another example of a multiplicative function you might already guess:

Theorem 10. (On the totient function) The Euler function $\varphi$ is multiplicative.

How does the above theorem help with calculating $\varphi(n)$? In order to understand this, we recall the following fact which should be more or less familiar to you.

Theorem 11. (Fundamental theorem of arithmetic) Every natural number $n\ge2$ can be expressed in the form
$$n=p_1^{\alpha_1}\cdots p_r^{\alpha_r},\qquad(5)$$
where $p_1,\dots,p_r\in\mathbb{P}$ are pairwise different prime numbers and $\alpha_1,\dots,\alpha_r\in\mathbb{N}$. Moreover, representation (5) is unique up to the order of the factors $p_1^{\alpha_1},\dots,p_r^{\alpha_r}$. In particular, if we assume that $p_1<\dots<p_r$ then (5) is unique and is called the canonical representation of the number $n$.

(The phrase „pairwise different” is to be understood as „all of them different from each other”.)

Example. $4356000=2^5\cdot3^2\cdot5^3\cdot11^2$. This is the canonical representation of 4356000.
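Trial division gives a simple (but, for large $n$, hopelessly slow) way to find the canonical representation. A Python sketch with a hypothetical function name; it returns the pairs $(p_i,\alpha_i)$ with $p_1<\dots<p_r$.

```python
def canonical_representation(n):
    """Return [(p_1, a_1), ..., (p_r, a_r)] with p_1 < ... < p_r and
    n = p_1**a_1 * ... * p_r**a_r (Theorem 11), found by trial division.

    Only practical for small n: up to about sqrt(n) candidates are tried.
    """
    if n < 2:
        raise ValueError("n must be a natural number >= 2")
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            a = 0
            while n % p == 0:       # divide out p completely
                n //= p
                a += 1
            factors.append((p, a))
        p += 1
    if n > 1:                       # what remains is a single prime
        factors.append((n, 1))
    return factors


print(canonical_representation(4356000))   # [(2, 5), (3, 2), (5, 3), (11, 2)]
```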

Using Theorem 11 we may recast the definition of being coprime as follows:

Property 6. Two natural numbers are coprime iff no prime appears in the canonical representations of both of them. In particular, powers of different prime numbers are always coprime.

Example. By the above property we have e.g. $(2^5,3^2\cdot5^3\cdot11^2)=1$, $(3^2,5^3\cdot11^2)=1$, $(5^3,11^2)=1$. Hence, Theorem 10 justifies the following calculations:
$$\varphi(4356000)=\varphi(2^5)\cdot\varphi(3^2\cdot5^3\cdot11^2)=\varphi(2^5)\cdot\varphi(3^2)\cdot\varphi(5^3\cdot11^2)=\varphi(2^5)\cdot\varphi(3^2)\cdot\varphi(5^3)\cdot\varphi(11^2).$$

It should be clear to you that the above calculations are not accidental. Namely we have:

Corollary 3. If $f$ is a multiplicative function and $n=p_1^{\alpha_1}\cdots p_r^{\alpha_r}$ is a decomposition of a natural number $n$ into pairwise different prime factors then
$$f(n)=f(p_1^{\alpha_1})\cdots f(p_r^{\alpha_r}).$$

Once we have Corollary 3, the only missing information to be able to compute (at least in principle) the values of the function $\varphi$ is how to compute $\varphi(p^\alpha)$, where $p\in\mathbb{P}$ and $\alpha\in\mathbb{N}$. Right after stating Definition 15 we noticed that $\varphi(p)=p-1$, and the reason for that is just the fact that all the elements of the set $\mathbb{Z}_p\setminus\{0\}$ are coprime to $p$. This in turn is a consequence of Property 6: clearly the prime $p$ does not appear in the canonical representation of any natural number $n$ less than $p$, so these numbers $n$ and $p$ are coprime. Similarly, in order to compute $\varphi(p^\alpha)$ for a prime $p$ and $\alpha\in\mathbb{N}$, we must count the numbers from the set $\mathbb{Z}_{p^\alpha}$ that are coprime to $p^\alpha$. But by Property 6 again, this reduces to counting those numbers from $\mathbb{Z}_{p^\alpha}$ that do not contain $p$ in their canonical representations or, what amounts to the same thing, those that are not divisible by the prime $p$. Now, it is easy to see that all the numbers of the set $\mathbb{Z}_{p^\alpha}$ that are divisible by $p$ are those of the form $k\cdot p$, $k=0,1,\dots,p^{\alpha-1}-1$, so there are exactly $p^{\alpha-1}$ such numbers. All the other elements of the set $\mathbb{Z}_{p^\alpha}$ are not divisible by $p$, hence they are coprime to $p^\alpha$, hence they belong to $G_{p^\alpha}$, hence there are exactly $p^\alpha-p^{\alpha-1}$ such numbers. We can sum up these observations:

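Corollary 4. If $p\in\mathbb{P}$ and $\alpha\in\mathbb{N}$ then $G_{p^\alpha}=\mathbb{Z}_{p^\alpha}\setminus\{k\cdot p: k\in\mathbb{Z}_{p^{\alpha-1}}\}$ and $\varphi(p^\alpha)=p^\alpha-p^{\alpha-1}$.

Combining Corollaries 3 and 4 we finally get:

Theorem 12. (A formula for $\varphi$) Let $n$ be a natural number and let $n=p_1^{\alpha_1}\cdots p_r^{\alpha_r}$ be a decomposition of $n$ into pairwise different prime factors. Then
$$\varphi(n)=\left(p_1^{\alpha_1}-p_1^{\alpha_1-1}\right)\cdots\left(p_r^{\alpha_r}-p_r^{\alpha_r-1}\right)=n\left(1-\frac{1}{p_1}\right)\cdots\left(1-\frac{1}{p_r}\right).$$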

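Example.

1. We have $4356000=2^5\cdot3^2\cdot5^3\cdot11^2$, so
$$\varphi(4356000)=4356000\cdot\left(1-\tfrac12\right)\left(1-\tfrac13\right)\left(1-\tfrac15\right)\left(1-\tfrac1{11}\right)=4356000\cdot\tfrac12\cdot\tfrac23\cdot\tfrac45\cdot\tfrac{10}{11}=4356000\cdot\tfrac{8}{33}=1056000.$$

2. We have $10000=2^4\cdot5^4$, so $\varphi(10000)=10000\cdot\left(1-\frac12\right)\left(1-\frac15\right)=4000$ (cf. figure (4)).

3. We have $8687=7\cdot17\cdot73$, so $\varphi(8687)=(7-1)(17-1)(73-1)=6\cdot16\cdot72=6912$.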
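Theorem 12 translates into a short Python sketch (our own code). It factors $n$ by trial division and applies each factor $1-\frac1p$ exactly, as `result // p * (p - 1)`, so no floating-point arithmetic is involved; `phi_by_definition` counts $G_n$ directly (Definition 15) and serves as a cross-check.

```python
from math import gcd


def phi(n):
    """Euler's totient via Theorem 12: phi(n) = n * prod over primes p | n of (1 - 1/p)."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result = result // p * (p - 1)   # exact: p still divides result here
        p += 1
    if m > 1:                                # one prime factor larger than sqrt(m) remains
        result = result // m * (m - 1)
    return result


def phi_by_definition(n):
    """phi(n) = card G_n for n >= 2 and phi(1) = 1 (Definition 15)."""
    if n == 1:
        return 1
    return sum(1 for k in range(n) if gcd(k, n) == 1)


print(phi(4356000), phi(10000), phi(8687))   # 1056000 4000 6912
```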


Commentary. The formula for $\varphi(n)$ given in Theorem 12 depends heavily on the possibility of factorization of the number $n$ into its prime factors. Up to now, there is no known algorithm that can do this efficiently. The best proven complexity bound is for the so-called quadratic sieve algorithm and is $O\left(e^{C_n(\ln n\,\ln\ln n)^{1/2}}\right)$, where $C_n$ tends to $1$ as $n\to\infty$. This is really slow. Since many modern cryptographic methods depend in an essential way on the practical infeasibility of factoring numbers, there was a contest called the RSA Factoring Challenge, active from 1991 to 2007, in which RSA Laboratories offered money prizes for factoring large numbers. This acted as a method of verifying what key lengths should be considered safe given the current state of knowledge of factorizing algorithms and available computing power. Before the project was canceled, the largest challenge that had been solved was RSA-640, i.e. the number to be factored consisted of 640 bits. It was done in 2005 and it was awarded $20 000. After 2007 some bigger challenges were also factored (without prizes), for example RSA-704 (a number having 212 decimal digits). For the remaining challenges see http://en.wikipedia.org/wiki/RSA_Factoring_Challenge. It is interesting to note that there is a quantum computer factoring algorithm by P. Shor which works in polynomial time, i.e. fast enough to compromise all cryptographic systems based on the problem of factorization. However, it is uncertain whether sufficiently powerful quantum computers will be built soon, so this algorithm poses no real danger to the world's security ;-) in the near future. Still, it is possible that an efficient factoring algorithm for classical (non-quantum) computers exists. Finding such an algorithm means fame.

13 Fermat’s little theorem and its generalizations

As you have probably already noticed, calculations involving congruences are much simpler to perform than the „usual ones”, especially when we are interested in the remainder of some (possibly quite complex) expression modulo an integer which itself is not too big. For instance, it turns out that raising an integer to a (large) power in $\mathbb{Z}_n$ can be done rapidly by the so-called repeated-squaring algorithm (see [Sho08, p. 65]). Moreover, there are some special exponents for which the computation of the power is trivial. To describe such a situation, we first note the following, somewhat unexpected, fact:

Property 7. If $p$ is a prime number, then for every $a,b\in\mathbb{Z}$ it holds

$$(a+b)^p\equiv a^p+b^p \pmod p. \tag{6}$$

In particular, if $a,b\in\mathbb{Z}_p$ then

$$(a+b)^p=a^p+b^p \quad\text{in } \mathbb{Z}_p$$

(in the ring $\mathbb{Z}_p$ the symbol $x^p$ means $\underbrace{x\cdot x\cdots x}_{p\text{ factors}}$).

The reason behind the above formulas is quite simple. As you may already know (at least in the special cases of second and third powers) there is something called the Binomial theorem: $(a+b)^n=\sum_{i=0}^{n}\binom{n}{i}a^{i}b^{n-i}$, where $\binom{n}{i}=\frac{n!}{i!\,(n-i)!}$ is the binomial coefficient. Now, for $i\in\{1,\ldots,n-1\}$ the integer

$$\binom{n}{i}=\frac{n\,(n-1)\cdots(n-i+1)}{i!}$$

has its numerator divisible by $n$ and the factors in its denominator are less than $n$. This means that if $n$ is a prime, the factor $n$ in the numerator does not cancel out with any of the factors in the denominator (in fact $n$ is coprime to them), hence $n\mid\binom{n}{i}$ and $\sum_{i=1}^{n-1}\binom{n}{i}a^{i}b^{n-i}\equiv 0 \pmod n$. But this gives $(a+b)^n\equiv a^n+b^n \pmod n$ if $n$ is a prime number, so (6) holds.
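The divisibility argument above is easy to test numerically. The following minimal Python sketch (standard library only; `math.comb` needs Python 3.8+; the helper names are ours) checks that $n$ divides all the middle binomial coefficients and that $(a+b)^n\equiv a^n+b^n \pmod n$ holds for small $a,b$:

```python
from math import comb

def middle_binomials_divisible(n: int) -> bool:
    """True iff n divides every binomial coefficient C(n, i) with 0 < i < n."""
    return all(comb(n, i) % n == 0 for i in range(1, n))

def freshman_dream_holds(n: int, bound: int = 20) -> bool:
    """Check (a + b)^n ≡ a^n + b^n (mod n) for all 0 <= a, b < bound."""
    return all(
        pow(a + b, n, n) == (pow(a, n, n) + pow(b, n, n)) % n
        for a in range(bound)
        for b in range(bound)
    )

for n in (5, 7, 4, 6):
    print(n, middle_binomials_divisible(n), freshman_dream_holds(n))
```

For the primes 5 and 7 both checks succeed, while for 4 and 6 both fail (e.g. $\binom{4}{2}=6$ is not divisible by 4, and $(1+1)^4=16\equiv 0\not\equiv 2 \pmod 4$).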

Examples.

1. We have $(a-b)^3=a^3-3a^2b+3ab^2-b^3\equiv a^3-b^3 \pmod 3$ for any $a,b\in\mathbb{Z}$.

2. More specifically, on one hand $(10-6)^3=4^3=64$ and on the other $10^3-6^3=1000-216=784$. Clearly, $64\equiv 784 \pmod 3$.

The above observation can be exploited even more. Fix a prime number $p$. Obviously, we have $1^p\equiv 1 \pmod p$, but thanks to Property 7 we also have $2^p=(1+1)^p\equiv 1^p+1^p\equiv 2 \pmod p$, hence $3^p=(2+1)^p\equiv 2^p+1^p\equiv 3 \pmod p$ and „so on”. This „so on” means that we notice a rule which allows us to use symbols instead of specific numbers and essentially repeat the above calculations, but in a more abstract way. Namely, let’s say we know that $n^p\equiv n \pmod p$ for some number $n\in\mathbb{N}$. Then we also have

$$(n+1)^p\equiv n^p+1^p\equiv n+1 \pmod p,$$

where the first congruence follows from (6) and the second one from the assumption $n^p\equiv n \pmod p$. Thus, we have found that $n^p\equiv n \pmod p \Rightarrow (n+1)^p\equiv n+1 \pmod p$ for any $n\in\mathbb{N}$. This means we justified the following chain of implications:

$$1^p\equiv 1 \;\Rightarrow\; 2^p\equiv 2 \;\Rightarrow\; 3^p\equiv 3 \;\Rightarrow\;\cdots \pmod p.$$

Since the very first antecedent in this chain is true, we may infer that all its consequents are also true. In other words, we have used mathematical induction and we have proved the essential part of the following:

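Theorem 13. (Fermat’s little theorem) Let $p$ be a prime number.

1. If $k\in\mathbb{Z}$ then $k^p\equiv k \pmod p$.

2. If $k\in\mathbb{Z}$ and $p\nmid k$ then $k^{p-1}\equiv 1 \pmod p$.

3. If $k\in\mathbb{Z}_p$ then $k^p=k$ in $\mathbb{Z}_p$.

4. If $k\in G_p$ then $k^{p-1}=1$ in $G_p$.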

Items 2 and 4 of the above theorem are direct consequences of Corollary 1 items 1 and 3, respectively: e.g. the congruence $k^p\equiv k \pmod p$ can be multiplied sidewise by an element $l\in\mathbb{Z}$ such that $lk\equiv 1 \pmod p$ (which exists since $p\nmid k$) and this gives item 2.

Examples.

1. $234^{28}\equiv 1 \pmod{29}$ since 29 is a prime and $29\nmid 234$.

2. $12^{17}=12$ in $\mathbb{Z}_{17}$ since 17 is a prime.

Fermat’s theorem can be formulated in a somewhat more general form (to see this, it is enough to use the „sidewise multiplication trick”).

Theorem 14. (Second version of Fermat’s little theorem) Let $p$ be a prime. If $k\in\mathbb{Z}$ and $m,n\in\mathbb{N}$ satisfy $m\equiv n \pmod{p-1}$ then $k^m\equiv k^n \pmod p$. In particular, if $k\in\mathbb{Z}_p$ then $k^m=k^n$ in $\mathbb{Z}_p$.

Example. Since $543\equiv 183\equiv 3 \pmod{36}$, we have $234^{543}\equiv 234^{3}\equiv 12^{3}=12^{2}\cdot 12\equiv 33\cdot 12=36\cdot 11\equiv -11\equiv 26 \pmod{37}$.
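Repeated squaring, mentioned at the beginning of this section, and Theorem 14 combine naturally. The function `power_mod` below is our own illustrative implementation of the repeated-squaring idea (Python’s built-in three-argument `pow` does the same job); it reproduces the computation from the example above, once directly and once after reducing the exponent modulo $p-1=36$:

```python
def power_mod(base: int, exponent: int, modulus: int) -> int:
    """Compute base**exponent mod modulus by repeated squaring (exponent >= 0)."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                      # current binary digit of the exponent is 1
            result = (result * base) % modulus
        base = (base * base) % modulus        # square for the next binary digit
        exponent >>= 1
    return result

p = 37
print(power_mod(234, 543, p))                 # direct repeated squaring
print(power_mod(234, 543 % (p - 1), p))       # Theorem 14: 543 ≡ 3 (mod 36)
```

Both lines print the same residue; the second one needs only an exponent of 3.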

What about moduli which are not prime? First note an example: $3^{4-1}=27\equiv 3\not\equiv 1 \pmod 4$, so item 2 of Theorem 13 does not hold for the non-prime modulus $4$. This means that, for a modulus $n$, we need in general a different exponent than just $n-1$ (if such an exponent exists at all). It turns out that there is a more general way of finding such exponents, and this method comes from group theory. Namely, we have:

Theorem 15. (Lagrange theorem) If $G$ is a finite group with $n$ elements and $H$ is its subgroup (this means that $H\subseteq G$ as a set and $H$ is a group with respect to the same operation restricted to $H$) then $\operatorname{card} H\mid\operatorname{card} G$.

Example. The set $H=\{0,2,4\}$ is a subgroup of $\mathbb{Z}_6$ and $\operatorname{card} H=3\mid 6=\operatorname{card}\mathbb{Z}_6$.

At this point, it may not be obvious how Lagrange theorem can help us find the exponents that we are searching for. The missing ingredient is the following:

Property 8. If $G$ is a finite group then for every $a\in G$ the set $\langle a\rangle=\{a^k: k\in\mathbb{N}\}$ is its subgroup. Denote $m=\operatorname{card}\langle a\rangle$. Then $a^m=e$, where $e$ is the identity of the group $G$. Moreover, for every $l\in\mathbb{N}$ such that $m\mid l$, we have $a^l=e$.

The group $\langle a\rangle$ of Property 8 is said to be generated by $a$.

Warning. If a group $G$ is in additive notation then $\langle a\rangle=\{k\cdot a: k\in\mathbb{N}\}$, where $k\cdot a=\underbrace{a+\cdots+a}_{k\text{ addends}}$; e.g. the subgroup $\{0,2,4\}$ of $\mathbb{Z}_6$ is equal to $\langle 2\rangle$ (check!).

Let us see how the above two facts provide a natural generalization of Fermat’s little theorem. Fix $n\in\mathbb{N}$, $n\geq 2$, and let $k\in\mathbb{Z}_n$ be a number coprime to $n$. This means that $k\in G_n$ (see page 27 for the definition of $G_n$). Since $G_n$ is a group, by Lagrange theorem we have that $\operatorname{card}\langle k\rangle\mid\operatorname{card}G_n$. But we know that $\operatorname{card}G_n=\varphi(n)$, where $\varphi$ is the totient function. Using Property 8 we get (in the group $G_n$)

$$k^{\varphi(n)}=\left(k^{\operatorname{card}\langle k\rangle}\right)^{\varphi(n)/\operatorname{card}\langle k\rangle}=1^{\varphi(n)/\operatorname{card}\langle k\rangle}=1.$$

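Hence, we have proved the classical:

Theorem 16. (Euler’s theorem) Let $n\in\mathbb{N}$, $n\geq 2$.

1. If $k\in G_n$ then $k^{\varphi(n)}=1$ in $G_n$.

2. If $k\in\mathbb{Z}$ and $\gcd(k,n)=1$ then $k^{\varphi(n)}\equiv 1 \pmod n$.

3. If $k\in G_n$ and $a,b\in\mathbb{N}$ satisfy $a\equiv b \pmod{\varphi(n)}$ then $k^a=k^b$ in $G_n$.

4. If $k\in\mathbb{Z}$, $\gcd(k,n)=1$, and $a,b\in\mathbb{N}$ satisfy $a\equiv b \pmod{\varphi(n)}$ then $k^a\equiv k^b \pmod n$.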

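Example. $34567^{233}\equiv 7^{233} \pmod{10}$ and since $\gcd(7,10)=1$ we can use the fact that $\varphi(10)=4$ to get $233\equiv 1 \pmod 4$ and hence $7^{233}\equiv 7^{1}=7 \pmod{10}$. So the last decimal digit of $34567^{233}$ is 7.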
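A small Python sketch of Euler’s theorem: `phi` (our helper name) computes $\varphi(n)$ by trial-division factorization, which is fine only for small $n$; the last line repeats the computation from the example, reducing the exponent modulo $\varphi(10)$ before exponentiating:

```python
from math import gcd

def phi(n: int) -> int:
    """Euler's totient by trial-division factorization (fine for small n)."""
    result, m, d = n, n, 2
    while d * d <= m:
        if m % d == 0:
            while m % d == 0:
                m //= d
            result -= result // d          # multiply result by (1 - 1/d)
        d += 1
    if m > 1:                              # a prime factor > sqrt(n) remains
        result -= result // m
    return result

n = 10
print(phi(n))                              # card G_10
# Euler's theorem: k^phi(n) ≡ 1 (mod n) for every k coprime to n
print(all(pow(k, phi(n), n) == 1 for k in range(1, n) if gcd(k, n) == 1))
# last digit of 34567^233, with the exponent reduced modulo phi(10)
print(pow(34567 % n, 233 % phi(n), n), pow(34567, 233, n))
```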

Exercise 7. In an interactive Maple session issue the following commands:

Experiment with different values of the parameters.

Commentary. In Euler’s theorem, the exponent $\varphi(n)$ is a number which works for all choices of elements $k\in G_n$, but it does not have to be the smallest such exponent. For instance, it is easy to check that for every $k\in G_8=\{1,3,5,7\}$ it holds $k^2=1$ in $G_8$, while $\varphi(8)=4$. The least universal exponent for $G_n$ is given by the so-called Carmichael function $\lambda(n)$. For a specific element of $G_n$ a much smaller exponent may suffice, e.g. one always has $(n-1)^2=1$ in any $G_n$. This smallest exponent for an element $k\in G_n$ is called its multiplicative order $\operatorname{ord}_n k$. By Property 8, $\operatorname{ord}_n k=\operatorname{card}\langle k\rangle$ (in $G_n$). From Lagrange theorem we know that always $\operatorname{ord}_n k\mid\varphi(n)$. This does not help much, because it turns out that finding $\operatorname{ord}_n k$ is in principle as hard as factoring integers (first we should find $\varphi(n)$).
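For small moduli the notions from the commentary can be explored by brute force. In the sketch below (helper names are ours) `order` computes $\operatorname{ord}_n k$ directly, and the least universal exponent is taken as the least common multiple of all element orders, which is the value of the Carmichael function $\lambda(n)$. This is practical only for small $n$, in line with the remark that computing orders is hard in general:

```python
from math import gcd

def units(n: int):
    """Elements of G_n, i.e. residues in Z_n coprime to n."""
    return [k for k in range(1, n) if gcd(k, n) == 1]

def order(k: int, n: int) -> int:
    """Multiplicative order of k in G_n, found by brute force."""
    e, x = 1, k % n
    while x != 1:
        x = (x * k) % n
        e += 1
    return e

def least_universal_exponent(n: int) -> int:
    """Smallest e >= 1 with k^e = 1 for all k in G_n: the lcm of all orders."""
    e = 1
    for k in units(n):
        o = order(k, n)
        e = e * o // gcd(e, o)
    return e

print(units(8), [order(k, 8) for k in units(8)], least_universal_exponent(8))
```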

Having prepared all the necessary tools, we may illustrate the theory with a practical application:

The RSA (Rivest, Shamir, and Adleman) cryptographic system. This is an example of a public key cryptosystem. To use it, Bob must generate a pair of keys – a public and a private one. The public key is available in a secure repository (a public key server) and it may be used by Alice to encrypt her message to Bob. After receiving the encrypted message, Bob may use his private key to decrypt it, obtaining the original message.

Here is how Bob generates his keys. He should:

1. Generate two (large) random primes $p$ and $q$, $p\neq q$ (this can be done efficiently).

2. Compute $n=pq$ (the number $\operatorname{len}(n)$ is the key length) and $\varphi(n)=(p-1)(q-1)$.

3. Choose $e\in G_{\varphi(n)}$ ($e$ does not have to be very big).

4. Compute $d=e^{-1}$ in the group $G_{\varphi(n)}$.

Then: the public key is the pair $(n,e)$ and the private key is the pair $(n,d)$.

In order to encrypt her message to Bob, Alice should:

1. Split her message into chunks $m_1,m_2,\ldots$ of length less than $\operatorname{len}(n)$.

2. For each chunk $m_i$ (recast to an integer) compute $c_i=m_i^{e}$ in $\mathbb{Z}_n$.

Then: the sequence $c_1,c_2,\ldots$ is the encrypted message.

In order to decrypt the message from Alice, Bob should:

1. For each chunk $c_i$ compute $m_i'=c_i^{d}$ in $\mathbb{Z}_n$.

2. Combine the chunks $m_i'$ into one message $m'$ (reversing Alice’s splitting method).

Then: $m_i'=m_i$, and $m'=m$ is Alice’s original message.

Proof of correctness. To see the general idea of how the above method works, assume that a chunk $m_i$ is such that $\gcd(m_i,n)=1$, i.e. $m_i\in G_n$ (since for sure $m_i\in\mathbb{Z}_n$). Then we have

$$m_i'=c_i^{d}=\left(m_i^{e}\right)^{d}=m_i^{ed} \quad\text{(in } G_n\text{)}.$$

But $ed=1$ in $G_{\varphi(n)}$, so $ed\equiv 1 \pmod{\varphi(n)}$, and by Euler’s theorem item 3 we get $m_i^{ed}=m_i^{1}=m_i$ in $G_n$. This means that $m_i'=m_i$ and Bob correctly decrypts all the chunks of Alice’s message (the case of $\gcd(m_i,n)\neq 1$ must be treated in another way, but still $m_i'=m_i$).
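To make the scheme concrete, here is a toy run in Python 3.8+ (which supports `pow(e, -1, m)` for modular inverses). The primes $p=61$, $q=53$ and the exponent $e=17$ are illustrative choices of ours, far too small for real use, and none of the padding or other safeguards used in practice are included. The chunk $61$ is deliberately not coprime to $n$, to show that it still decrypts correctly:

```python
from math import gcd

# Toy parameters (far too small for real use): two distinct primes.
p, q = 61, 53
n = p * q                      # modulus, part of the public key
phi_n = (p - 1) * (q - 1)      # phi(n) = (p - 1)(q - 1), kept secret
e = 17                         # public exponent; must be a unit modulo phi(n)
assert gcd(e, phi_n) == 1
d = pow(e, -1, phi_n)          # private exponent: e^{-1} in G_phi(n)

def encrypt(chunks, e, n):
    """c_i = m_i^e in Z_n for every chunk."""
    return [pow(m, e, n) for m in chunks]

def decrypt(chunks, d, n):
    """m_i' = c_i^d in Z_n for every chunk."""
    return [pow(c, d, n) for c in chunks]

message = [65, 1000, 3232, 0, 61]    # chunks already recast to integers < n
cipher = encrypt(message, e, n)
print(d, cipher)
print(decrypt(cipher, d, n) == message)
```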

Since $G_{\varphi(n)}$ is an abelian group, the roles played by the numbers $d$ and $e$ can be interchanged ($m^{ed}=m^{de}$), which gives another application: Bob may publish a file F along with a so-called hash of F, where this hash is encrypted with Bob’s private key. Then this new piece of information may work as Bob’s digital signature D. Namely, anybody who has downloaded a file that is supposed to be the one published by Bob may compute the hash of this file and compare it against the value of the signature D „decrypted” using Bob’s public key. If those two values do not match, then either the downloaded file is a fake (and probably does not come from Bob) or there was some network error during the transmission.
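The signature idea can be sketched with a toy key. Here we assume SHA-256 (from Python’s `hashlib`) as the hash function and simply reduce the digest modulo the tiny $n$; this reduction, the key $p=61$, $q=53$, $e=17$, and the helper names `toy_hash`, `sign`, `verify` are simplifications of ours for illustration only, not a secure signature scheme:

```python
import hashlib

# Toy key (illustrative only): n = 61 * 53, e = 17, d = e^{-1} mod phi(n).
p, q, e = 61, 53, 17
n, phi_n = p * q, (p - 1) * (q - 1)
d = pow(e, -1, phi_n)

def toy_hash(data: bytes, n: int) -> int:
    """SHA-256 digest reduced modulo n (a simplification for illustration)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data: bytes) -> int:
    return pow(toy_hash(data, n), d, n)              # "encrypt" the hash with d

def verify(data: bytes, signature: int) -> bool:
    return pow(signature, e, n) == toy_hash(data, n)  # "decrypt" with e

file_contents = b"contents of the file F"
D = sign(file_contents)
print(verify(file_contents, D))
print(verify(b"tampered contents", D))   # almost always False with this tiny n
```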

A safety remark: you should see that in order to be able to „break the system”, it is enough to find Bob’s private key, which in turn amounts to finding the number $d=e^{-1}\in G_{\varphi(n)}$, i.e. $d$ satisfying $ed\equiv 1 \pmod{\varphi(n)}$. But in order to do it, first of all you must compute the modulus $\varphi(n)$ somehow. And you know only $n$ (part of the public key). If you can factor $n$ into the (secret) prime factors $p$ and $q$ then clearly you can compute $\varphi(n)=(p-1)(q-1)$. Otherwise, how to compute $\varphi(n)$?...

14 Chinese remainder theorem

In the previous sections we have learned how to solve congruences of the type $ax\equiv b \pmod n$ and how to apply modular arithmetic to perform certain calculations rapidly. However, up to now there may have seemed to be no connection with the usual calculations in the ring $\mathbb{Z}$. In this section, we indicate such a connection, which may be traced back to the 5th century.

Our goal is to solve the system of congruences

$$\begin{cases} x\equiv a_1 \pmod{n_1}\\ \qquad\vdots\\ x\equiv a_r \pmod{n_r}\end{cases} \tag{7}$$

for $x$, in the integers. Here $a_1,\ldots,a_r\in\mathbb{Z}$ are arbitrary. However, it turns out that, in general, in order for the system (7) to have solutions, we must impose some conditions on the natural numbers $n_1,\ldots,n_r$, as shown by the following:



Example. Consider the system $\begin{cases}x\equiv 0 \pmod 2\\ x\equiv 3 \pmod 4.\end{cases}$ One sees easily that all the solutions to the first congruence are the numbers of the form $2k$, where $k\in\mathbb{Z}$, i.e. $\ldots,-4,-2,0,2,4,\ldots$, while the solutions to the second congruence are just the numbers $4l+3$, $l\in\mathbb{Z}$, i.e. $\ldots,-5,-1,3,7,11,\ldots$ In particular, the first series of numbers consists of even integers and the second series – of odd integers. Hence, there is no common solution $x\in\mathbb{Z}$ to this system.

What sort of assumptions should then be imposed on the numbers $n_1,\ldots,n_r$ in order to guarantee the existence of some solutions to (7)? To find a possible such condition, let us consider a two-congruence system

$$\begin{cases}x\equiv a \pmod m\\ x\equiv b \pmod n.\end{cases}$$

This can be rewritten in an equivalent form as follows:

$$\begin{cases}x=km+a\\ x=ln+b,\end{cases}$$

where $k,l\in\mathbb{Z}$ are some integers to be found. If this system has some solution $x,k,l\in\mathbb{Z}$ then upon subtracting its two equations we get that $k$ and $l$ must satisfy

$$km-ln=b-a. \tag{8}$$

But now recall that $a$ and $b$ are arbitrary integers. This means that on the right-hand side of this equation there can stand virtually any integer; in particular, it may happen that $b-a=1$. Thanks to Bezout’s identity (Theorem 8) we know that the equation $km-ln=1$ has a solution in integers iff $\gcd(m,n)=1$, that is iff $m$ and $n$ are coprime.

On the other hand, under this assumption equation (8) is solvable for every right-hand side (it is enough to multiply a solution of $km-ln=1$ by $b-a$), and we have $x=a+km=b+ln$ whenever the integers $k$ and $l$ satisfy equation (8). This means that this value of $x$ also satisfies the system of congruences that we wished to solve. Moreover, it is not hard to notice that all other solutions to this system are congruent to the given one modulo $mn$. Thus, we are able to give all the integer solutions to the system (7) in the case of two congruences.




Example. Consider the system $\begin{cases}x\equiv 2 \pmod 3\\ x\equiv 11 \pmod{13}.\end{cases}$ Equation (8) becomes $3k-13l=9$. We easily find that $(k,l)=(-10,-3)$ is a solution to this equation. By the above reasoning, $x=2+3\cdot(-10)+39t=-28+39t$, where $t\in\mathbb{Z}$. The solution in $\mathbb{Z}_{39}$ is $11$.
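The two-congruence method translates directly into code: Bezout coefficients from the extended Euclidean algorithm give a solution of (8), and then $x=a+km$. A minimal sketch (helper names are ours) that reproduces the example:

```python
def extended_gcd(a: int, b: int):
    """Return (g, u, v) with g = gcd(a, b) = u*a + v*b (Bezout's identity)."""
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    return g, v, u - (a // b) * v

def solve_two(a: int, m: int, b: int, n: int):
    """Solve x ≡ a (mod m), x ≡ b (mod n) for coprime m, n.

    Finds k, l with k*m - l*n = b - a (equation (8)) and returns
    (x, m*n); all solutions are then x + t*m*n, t in Z.
    """
    g, u, v = extended_gcd(m, n)        # u*m + v*n = 1
    if g != 1:
        raise ValueError("moduli are not coprime")
    k, l = u * (b - a), -v * (b - a)    # k*m - l*n = b - a
    assert k * m - l * n == b - a
    x = a + k * m
    return x % (m * n), m * n

print(solve_two(2, 3, 11, 13))
```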

In principle, the above method is the idea behind a general algorithm for solving the system (7). One thing we see at once: if this system (of $r$ congruences) has a solution $x$, then $x$ is obviously a solution of any two congruences of this system, too. Hence, if we want (7) to be solvable for every choice of $a_1,\ldots,a_r$, then for any $i\neq j$ it should hold $\gcd(n_i,n_j)=1$, or in other words – the numbers $n_1,\ldots,n_r$ should be pairwise coprime. It turns out that this condition is enough for the system (7) to have solutions:

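Theorem 17. (Chinese remainder theorem) Let $a_1,\ldots,a_r\in\mathbb{Z}$ and let $n_1,\ldots,n_r\in\mathbb{N}$ be natural numbers greater than $1$. If $n_1,\ldots,n_r$ are pairwise coprime then the system

$$\begin{cases} x\equiv a_1 \pmod{n_1}\\ \qquad\vdots\\ x\equiv a_r \pmod{n_r}\end{cases}$$

has solutions in $\mathbb{Z}$. These solutions are given by the formula

$$x=a_1s_1\frac{n_1\cdots n_r}{n_1}+\cdots+a_rs_r\frac{n_1\cdots n_r}{n_r}+t\,n_1\cdots n_r, \tag{9}$$

where $t\in\mathbb{Z}$ and the $s_j\in\mathbb{Z}$ are any numbers satisfying $s_j\frac{n_1\cdots n_r}{n_j}\equiv 1 \pmod{n_j}$, for $j=1,\ldots,r$. In particular, there exists exactly one solution $x\in\mathbb{Z}_{n_1\cdots n_r}$ of this system.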

It is not hard to see why Theorem 17 is true. Namely, every number $a_js_j\frac{n_1\cdots n_r}{n_j}$ is congruent to $a_j$ modulo $n_j$ because of the relation satisfied by $s_j$. But since $n_j\mid\frac{n_1\cdots n_r}{n_k}$ for $k\neq j$, we get $a_ks_k\frac{n_1\cdots n_r}{n_k}\equiv 0 \pmod{n_j}$ for $k\neq j$. This means that $x$ defined by formula (9) satisfies $x\equiv a_j \pmod{n_j}$ for $j=1,\ldots,r$, so this is a solution to our system of congruences. Now, let $x'\in\mathbb{Z}$ be another solution of this system. This implies that $x'\equiv x \pmod{n_j}$ for every $j=1,\ldots,r$. In other words, $n_j\mid x'-x$ for all $j\in\{1,\ldots,r\}$. Since $n_1,\ldots,n_r$ are pairwise coprime, this means that also $n_1\cdots n_r\mid x'-x$, so $x$ and $x'$ differ by an integer multiple of $n_1\cdots n_r$. Hence there exists a unique solution of our system of congruences in $\mathbb{Z}_{n_1\cdots n_r}$ and this solution is equal to $x \bmod n_1\cdots n_r$.
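Formula (9) can be implemented almost verbatim. In this sketch (Python 3.8+ for `math.prod` and `pow(..., -1, ...)`; the function name `crt` is ours) the numbers $s_j$ are obtained as modular inverses of $\frac{n_1\cdots n_r}{n_j}$ modulo $n_j$:

```python
from math import gcd, prod

def crt(residues, moduli):
    """Unique x in Z_{n_1...n_r} with x ≡ a_j (mod n_j), following formula (9).

    Assumes the moduli are pairwise coprime (checked below).
    """
    for i in range(len(moduli)):
        for j in range(i + 1, len(moduli)):
            if gcd(moduli[i], moduli[j]) != 1:
                raise ValueError("moduli must be pairwise coprime")
    N = prod(moduli)                   # n_1 * ... * n_r
    x = 0
    for a_j, n_j in zip(residues, moduli):
        N_j = N // n_j                 # (n_1 ... n_r) / n_j
        s_j = pow(N_j, -1, n_j)        # s_j * N_j ≡ 1 (mod n_j)
        x += a_j * s_j * N_j
    return x % N

print(crt([2, 11], [3, 13]))
```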

Example. Consider the system

$$\begin{cases} x\equiv a_1 \pmod 5\\ x\equiv a_2 \pmod 9\\ x\equiv a_3 \pmod 2.\end{cases}$$

We must solve the congruences $18s_1\equiv 1 \pmod 5$, $10s_2\equiv 1 \pmod 9$ and $45s_3\equiv 1 \pmod 2$ for $s_1$, $s_2$, $s_3$ (in each case only one solution $s_i\in\mathbb{Z}$ is enough). Using the methods of Section 11 we find e.g. $s_1=2$, $s_2=1$ and $s_3=1$. According to formula (9), all the solutions to our system are of the form $x=2\cdot 18\,a_1+10\,a_2+45\,a_3+90t$, where $t\in\mathbb{Z}$. The unique solution in $\mathbb{Z}_{90}$ is thus $(36a_1+10a_2+45a_3) \bmod 90$.

At the beginning of this section we mentioned that the Chinese remainder theorem may work as a connection between modular arithmetic and the usual one. The main idea behind such an application of Theorem 17 is the following. Suppose you want to compute the value of some expression $E$ involving big integers. It is a common situation that you can predict some upper bound for the result. For our discussion, let us assume that this bound is $M$; more precisely, assume that the result is an integer lying in the interval $[0,M]$. You may choose some sequence of (small) pairwise coprime numbers $n_1,\ldots,n_r$ such that $n_1\cdots n_r>M$, compute $E_j=E \bmod n_j$ for each $j\in\{1,\ldots,r\}$ and then use the Chinese remainder theorem to recover the unique integer $x\in\mathbb{Z}_{n_1\cdots n_r}$ satisfying the system of congruences $x\equiv E_j \pmod{n_j}$. Since $n_1\cdots n_r>M$, the only possibility is that $x=E$. In other words, you may compute the value of $E$ by performing most of the calculations modulo the small numbers $n_j$ (which is very fast), and only at the end do you produce the final (possibly big) result $x=E$, working modulo $n_1\cdots n_r$. While this may seem a lot of work, in practice this approach (when applicable) is usually much more efficient than a direct computation in $\mathbb{Z}$, provided that the numbers involved are big enough. Moreover, if you have a computer with a multi-core processor, you may split the calculations of the $E_j$ among several threads, which speeds things up even more. Another advantage is memory usage – it may be much smaller for this method than for the direct one. Let us see by example how the above method works (this example also shows that for small numbers it is more efficient to use a direct computation):

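Example. Let us say we want to compute the product $k=13\cdot 63$. Clearly, the result will not be bigger than $10^3$. We may choose e.g. $n_1=4$, $n_2=5$, $n_3=7$, $n_4=9$. We have $n_1n_2n_3n_4=1260>10^3$ and $k\equiv 1\cdot 3=3 \pmod 4$, $k\equiv 3\cdot 3\equiv 4 \pmod 5$, $k\equiv 0 \pmod 7$, $k\equiv 0 \pmod 9$. So we must solve the system

$$\begin{cases} x\equiv 3 \pmod 4\\ x\equiv 4 \pmod 5\\ x\equiv 0 \pmod 7\\ x\equiv 0 \pmod 9.\end{cases}$$

Because of formula (9), in this case (since $a_3=a_4=0$) there are only two numbers that should be found: $s_1$ with $s_1\cdot 5\cdot 7\cdot 9\equiv 1 \pmod 4$ and $s_2$ with $s_2\cdot 4\cdot 7\cdot 9\equiv 1 \pmod 5$, or equivalently $3s_1\equiv 1 \pmod 4$ and $2s_2\equiv 1 \pmod 5$. This gives e.g. $s_1=3$ and $s_2=3$. Hence, $x=3\cdot 3\cdot 315+4\cdot 3\cdot 252+1260t=5859+1260t$, $t\in\mathbb{Z}$. Since $5859 \bmod 1260=819$, this is the value of $k$.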
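The multi-modular strategy described above can be sketched for a product of integers. We assume that the true result lies in $[0,n_1\cdots n_r)$ and that the moduli are pairwise coprime; the reductions use only small numbers and the recombination is formula (9). The helper name is ours:

```python
from math import prod

def product_via_crt(factors, moduli):
    """Compute prod(factors), assuming 0 <= result < prod(moduli).

    The moduli are assumed pairwise coprime; all multiplications are done
    modulo the small n_j, and the result is recovered with formula (9).
    """
    N = prod(moduli)
    x = 0
    for n_j in moduli:
        E_j = 1
        for f in factors:              # E_j = E mod n_j, using small numbers only
            E_j = (E_j * f) % n_j
        N_j = N // n_j
        x += E_j * pow(N_j, -1, n_j) * N_j
    return x % N

print(product_via_crt([13, 63], [4, 5, 7, 9]))
```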

Commentary. The method of performing calculations with integers by first falling back on modular arithmetic and then applying the Chinese remainder theorem is widely used. Some example applications of this method include so-called threshold secret sharing (in cryptography) and also faster decryption in the RSA cryptosystem (typically about three times faster). Also, it can be used to multiply big matrices or polynomials (with integer coefficients). There is another method of speeding up calculations, called rational reconstruction, which, loosely speaking, allows one to use modular arithmetic first and then recover the result, which can be a rational number (see [Sho08, Section 4.6]).
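The speed-up of RSA decryption mentioned above comes from computing $c^d$ modulo $p$ and modulo $q$ separately (with exponents reduced by Fermat’s little theorem) and recombining the two results with the Chinese remainder theorem. The sketch below uses a toy key of our choosing ($p=61$, $q=53$, $e=17$); the recombination step is the commonly used Garner formula. It only illustrates correctness and makes no claim about timings:

```python
# Toy key (illustrative values only): n = p*q with p = 61, q = 53, e = 17.
p, q, e = 61, 53, 17
n, phi_n = p * q, (p - 1) * (q - 1)
d = pow(e, -1, phi_n)

def decrypt_plain(c: int) -> int:
    return pow(c, d, n)

def decrypt_crt(c: int) -> int:
    """Decrypt by working modulo p and q separately, then recombining."""
    m_p = pow(c, d % (p - 1), p)      # exponent reduced by Fermat's little theorem
    m_q = pow(c, d % (q - 1), q)
    q_inv = pow(q, -1, p)             # q^{-1} mod p
    h = (q_inv * (m_p - m_q)) % p     # Garner's recombination step
    return m_q + h * q                # ≡ m_q (mod q) and ≡ m_p (mod p)

c = pow(65, e, n)
print(decrypt_plain(c), decrypt_crt(c))
```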


It is worth noting that the assumption that the numbers $n_1,\ldots,n_r$ are pairwise coprime (see Theorem 17) is not in general necessary for the system (7) to have solutions. It may be proved that solutions exist iff $a_i\equiv a_j \pmod{\gcd(n_i,n_j)}$ for all $i,j\in\{1,\ldots,r\}$. Moreover, in such a case there is exactly one solution of the system in $\mathbb{Z}_{\operatorname{lcm}(n_1,\ldots,n_r)}$, where $\operatorname{lcm}(n_1,\ldots,n_r)$ denotes the least common multiple of the numbers $n_1,\ldots,n_r$.
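The general criterion can be checked constructively by merging the congruences one at a time. In the sketch below (helper name ours), each step solves $x+Lt\equiv a \pmod n$ for $t$, which is possible exactly when $\gcd(L,n)\mid a-x$; the final modulus is $\operatorname{lcm}(n_1,\ldots,n_r)$:

```python
from math import gcd

def crt_general(residues, moduli):
    """Solve x ≡ a_j (mod n_j) for arbitrary moduli by merging pairwise.

    Returns (x, L) with L = lcm(n_1, ..., n_r) and x the unique solution
    in Z_L, or None if the compatibility condition fails.
    """
    x, L = 0, 1                                  # x ≡ 0 (mod 1): empty system
    for a, n in zip(residues, moduli):
        g = gcd(L, n)
        if (a - x) % g != 0:
            return None                          # a_i ≢ a_j (mod gcd) somewhere
        # Solve x + L*t ≡ a (mod n), i.e. (L/g)*t ≡ (a - x)/g (mod n/g).
        t = ((a - x) // g) * pow(L // g, -1, n // g) % (n // g)
        x, L = x + L * t, L * (n // g)           # L * (n // g) = lcm(L, n)
        x %= L
    return x, L

print(crt_general([1, 3], [4, 6]), crt_general([0, 3], [2, 4]))
```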

15 The prime numbers

The most important and difficult problems in Number Theory are connected with prime numbers. They are also important from the point of view of applications, e.g. in cryptography. For instance, in order to generate the keys in the RSA cryptographic system, you must be able to find (possibly very big) prime numbers. Thus, the first natural question is whether this is always possible. The answer to this question was provided already by Euclid (c. 300 BC):

Theorem 18. (Euclid theorem) There are infinitely many prime numbers.

Proof. To the contrary, assume that there are only finitely many primes, e.g. $p_1,\ldots,p_n$. We may consider the number $N=p_1\cdots p_n+1$. Clearly, $N\equiv 1 \pmod{p_j}$ for every $j=1,\ldots,n$, which means that $N$ is not divisible by any prime. This is not possible (why?).
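The construction from the proof can be tried out on the first six primes. Note that $N$ itself need not be prime; the proof only needs that its prime factors are new. A short sketch using trial division (helper name ours):

```python
def smallest_prime_factor(N: int) -> int:
    """Smallest prime dividing N >= 2 (trial division)."""
    d = 2
    while d * d <= N:
        if N % d == 0:
            return d
        d += 1
    return N

primes = [2, 3, 5, 7, 11, 13]
N = 1
for p in primes:
    N *= p
N += 1                                   # N = p_1 * ... * p_n + 1
q = smallest_prime_factor(N)
print(N, q, q in primes)
```

Here $N=30031=59\cdot 509$ is composite, but its prime factors do not appear in the list.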

While the above theorem is good news, it does not provide us with any direct way of finding big prime numbers. A fascinating property of primes is that their distribution among the naturals looks like a random one, and nobody has found any practical formula producing only prime numbers (not to mention a formula for, let’s say, the $k$-th prime number). So, a natural approach to generating big primes would be just to test subsequent numbers starting from some big enough $N$ and hope that soon enough in the sequence $N, N+1, N+2,\ldots$ there appears a prime number.

Question 1. Is this a reasonable strategy?

Some bad news:

Fact. There are arbitrarily long intervals of natural numbers without prime numbers.
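One standard way to see why the Fact holds (it is stated here without proof) is the following factorial construction, sketched for a prescribed length $m\geq 1$:

```latex
For $m \ge 1$ consider the $m$ consecutive integers
\[
  (m+1)! + 2,\quad (m+1)! + 3,\quad \ldots,\quad (m+1)! + (m+1).
\]
For $2 \le j \le m+1$ we have $j \mid (m+1)!$, hence
\[
  j \mid (m+1)! + j \quad\text{and}\quad 1 < j < (m+1)! + j,
\]
so each of these $m$ numbers is composite.
```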

There is also good news:

Theorem 19. (Bertrand–Chebyshev theorem) For every integer $n\geq 2$ there exists a prime number $p$ such that

$$n<p<2n.$$

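Example. The primes $p$ with $n<p<2n$ for some small values of $n$:

   n | p (n < p < 2n)  | 2n
   2 | 3               |  4
   3 | 5               |  6
   4 | 5, 7            |  8
   5 | 7               | 10
   6 | 7, 11           | 12
   7 | 11, 13          | 14
   8 | 11, 13          | 16
   9 | 11, 13, 17      | 18
  10 | 11, 13, 17, 19  | 20
  20 | 23, 29, 31, 37  | 40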
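The table above can be regenerated, and Theorem 19 spot-checked for many more values of $n$, with a few lines of Python using trial division (adequate only for small numbers; helper names are ours):

```python
def is_prime(k: int) -> bool:
    """Trial-division primality check, adequate for small k."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def bertrand_primes(n: int) -> list:
    """All primes p with n < p < 2n."""
    return [p for p in range(n + 1, 2 * n) if is_prime(p)]

for n in [2, 3, 4, 5, 6, 7, 8, 9, 10, 20]:
    print(n, bertrand_primes(n), 2 * n)
```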

Theorem 19 gives us some positive information, but it turns out one can do much better. In order to state the next result, we introduce the following important definition:

Definition 17. (π function) Let $x\in\mathbb{R}$. Then $\pi(x)$ denotes the number of primes less than or equal to $x$. The function $\pi$ is called the prime-counting function.

Example. $\pi(10)=4$, $\pi(11)=5$, $\pi(10^6)=78498$.
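Values of $\pi(x)$ such as those above are usually computed with the sieve of Eratosthenes. A compact sketch (Python 3.8+ for `math.isqrt`; the function name is ours):

```python
from math import isqrt

def prime_pi(x: int) -> int:
    """Number of primes <= x, via the sieve of Eratosthenes."""
    if x < 2:
        return 0
    is_prime = bytearray([1]) * (x + 1)
    is_prime[0] = is_prime[1] = 0
    for d in range(2, isqrt(x) + 1):
        if is_prime[d]:
            # cross out the multiples of d, starting from d*d
            is_prime[d * d :: d] = bytearray(len(range(d * d, x + 1, d)))
    return sum(is_prime)

print(prime_pi(10), prime_pi(11), prime_pi(10 ** 6))   # 4 5 78498
```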

A famous theorem about the asymptotic distribution of primes is the following:

Theorem 20. (Prime Number Theorem)

$$\lim_{x\to\infty}\frac{\pi(x)}{x/\ln x}=1.$$

This theorem says that for big $x$ there are „roughly” $\frac{x}{\ln x}$ prime numbers less than $x$. The assertion of the theorem can be written in a somewhat more suggestive way as follows:

$$\pi(x)=\frac{x}{\ln x}+o\!\left(\frac{x}{\ln x}\right)\quad\text{as } x\to\infty,$$

where the little-$o$ notation is just a way of saying that the values hidden by the symbol $o\!\left(\frac{x}{\ln x}\right)$, divided by $\frac{x}{\ln x}$, tend to $0$ (in this case as $x\to\infty$). A simple corollary of Theorem 20 is the following improvement of the Bertrand–Chebyshev theorem:

Corollary 5. For every (small) real $\varepsilon>0$ there exists $N\in\mathbb{N}$ such that for all $x\geq N$ there is a prime number in the interval $(x,(1+\varepsilon)x)$.

This corollary does not say anything about the value of the number $N$, but there are also some specific results in this direction. For instance, there is always a prime between $x$ and $x+\frac{x}{25\ln^2 x}$ if $x\geq 396738$. Results of this kind show that it probably should not take „too long” to find a prime number starting from some value of $x$ and following the strategy described on page 38. So the answer to Question 1 is yes. There are also much more „optimistic” predictions concerning the distribution of primes in some specific intervals, and these predictions are supported by heavy empirical evidence. One such conjecture is Legendre’s conjecture, which states that there is a prime number between $n^2$ and $(n+1)^2$ for every $n\in\mathbb{N}$. This conjecture is still unsolved. Of the same status is Cramér’s conjecture, which states that if $p$ is a prime number then in the interval $(p,p+O(\ln^2 p)]$ there is also another prime.

Sometimes it is also important to be able to generate big prime numbers of a special form. For instance, in the RSA system some additional restrictions may be imposed on the primes $p$ and $q$ (see page 34). This is connected with the notion of a strong prime, which in turn is a countermeasure against existing factoring algorithms that work faster than usual in some special situations. An example of such an algorithm is Pollard’s $p-1$ algorithm. In order for this method to be useless for factoring $n=pq$, one should ensure that the prime number $p$ is of the form $p=kr+1$, where the number $r$ is also a big prime (and similarly for the prime number $q$). Thus, the question is: do primes of such a form exist, and how many of them are there? The answer to this question may be viewed as a generalization of Theorem 20:

Theorem 21. (Dirichlet’s theorem) If two positive integers $k$ and $l$ are coprime then there are infinitely many prime numbers in the sequence

$$k,\; k+l,\; k+2l,\;\ldots,$$

that is, there are infinitely many prime number solutions to the congruence $x\equiv k \pmod l$. More precisely, if $\pi_{k,l}(x)$ denotes the number of these primes less than or equal to $x$ that are congruent to $k$ modulo $l$, then

$$\lim_{x\to\infty}\frac{\pi_{k,l}(x)}{\dfrac{x}{\varphi(l)\ln x}}=1$$

or equivalently

$$\pi_{k,l}(x)=\frac{x}{\varphi(l)\ln x}+o\!\left(\frac{x}{\ln x}\right)\quad\text{as } x\to\infty.$$

Note that for $k=l=1$ it holds $\pi_{k,l}(x)=\pi(x)$ and $\varphi(l)=1$, and so Dirichlet’s theorem is indeed a generalization of Theorem 20. Moreover, this theorem guarantees that if you take a (big) prime number $r$ and consider the sequence $1, r+1, 2r+1,\ldots$, you will for sure find therein a prime $p=kr+1$ for some $k\in\mathbb{N}$. If you generate only primes $p$ and $q$ of this sort for the RSA algorithm, you are safe from the attack described above.
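Dirichlet’s theorem turns the search for primes of the form $p=kr+1$ into a simple loop over $k$. The sketch below uses trial division, so it is only suitable for small $r$ (a real implementation would use a probabilistic primality test and a big prime $r$); the helper names are ours:

```python
def is_prime(m: int) -> bool:
    """Trial-division primality check (adequate only for small m)."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def prime_with_large_factor(r: int, k_start: int = 1) -> tuple:
    """Return (k, p) with p = k*r + 1 prime and k >= k_start.

    For a prime r, Dirichlet's theorem (gcd(1, r) = 1) guarantees that
    the loop terminates; the resulting p has r as a divisor of p - 1.
    """
    k = k_start
    while not is_prime(k * r + 1):
        k += 1
    return k, k * r + 1

print(prime_with_large_factor(1009))
```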

Example. In the sequence $\{8n+1: n\in\mathbb{N}\}$ there have to be infinitely many primes (by Dirichlet’s theorem). The first few primes in this set are:

17, 41, 73, 89, 97, 113, 137, 193, 233, 241, 257, 281, 313, 337, 353, 401, 409, 433, 449, 457, 521.

In particular, $\pi_{1,8}(521)=21$. On the other hand, $\frac{x}{\varphi(l)\ln x}$ evaluated at $l=8$ and $x=521$ gives the value 20.820844, so in this case the prediction of Theorem 21 is almost exact. As you can see, the distribution of prime numbers, although chaotic, is also subject to some restrictions, and there are many observations concerning the counting function $\pi$. Perhaps the most famous of all unsolved problems in mathematics is the Riemann Hypothesis, and this problem can also be stated in terms of the function $\pi$.
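The count in the example can be reproduced with a sieve. The sketch below (helper name ours) lists the primes $p\leq 521$ with $p\equiv 1 \pmod 8$ and evaluates the estimate $\frac{x}{\varphi(l)\ln x}$, with $\varphi(8)=4$ hard-coded:

```python
from math import log

def primes_upto(x: int) -> list:
    """All primes <= x (sieve of Eratosthenes)."""
    flags = [True] * (x + 1)
    flags[0:2] = [False, False]
    for d in range(2, int(x ** 0.5) + 1):
        if flags[d]:
            for m in range(d * d, x + 1, d):
                flags[m] = False
    return [p for p, f in enumerate(flags) if f]

x, k, l = 521, 1, 8
phi_l = 4                                   # phi(8) = 4
hits = [p for p in primes_upto(x) if p % l == k]
print(len(hits), hits[:5])
print(x / (phi_l * log(x)))                # the estimate from Theorem 21
```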

Millennium Problem (Riemann Hypothesis 1859). Prove that

$$\pi(x)=\operatorname{Li}(x)+O\!\left(\sqrt{x}\,\ln x\right),$$

where the Eulerian logarithmic integral $\operatorname{Li}$ is given by the formula $\operatorname{Li}(x)=\int_2^x\frac{dt}{\ln t}$.

The Riemann Hypothesis postulates that you may estimate the number of primes $\pi(x)$ by the number $\operatorname{Li}(x)$ (which can be easily computed), and by doing so you make a mistake which is $O(\sqrt{x}\ln x)$, so a relatively small mistake. This hypothesis turns out to be extremely important in mathematics, as many other results in Number Theory are known to follow from it. That is why it has been chosen as one of the Millennium Problems and as such is worth $ 1 000 000. The prize is offered by the Clay Mathematics Institute (see http://www.claymath.org/millennium-problems/millennium-prize-problems).

Although we answered Question 1 in the affirmative, one more piece of information seems to be crucial in order to be able to generate big primes. Namely, since there are no formulas for prime numbers, how can we check whether the number under consideration is a prime or not? Of course, for a given $N\in\mathbb{N}$ you could employ the naïve method of successive divisions of $N$ by naturals less than $N$ (or not exceeding $\sqrt{N}$), but this approach is totally useless for big $N$. Notice also that this method would produce a (partial) factorization of $N$ as a by-product (for a composite $N$), which is both unnecessary and – as has been said already – widely believed to be impossible to accomplish in a reasonable time. So we must ask:

Question 2. Is primality testing fast?

The above question is about the existence of an efficient algorithm. In order to find such an algorithm, it is necessary to observe some special features of prime numbers which can be used for the testing procedure. One such observation is Fermat’s little theorem. Another one is:

Theorem 22. (Wilson’s theorem) For any $n\in\mathbb{N}$, $n\geq 2$,

$$(n-1)!\equiv -1 \pmod n \iff n \text{ is a prime number}.$$

Example.

For $n=8$ we have $7!=5040\equiv 0\not\equiv -1 \pmod 8$, so $8$ is not a prime.

For $n=11$ we have $10!=3628800\equiv 10\equiv -1 \pmod{11}$, so $11$ is a prime.

Although interesting, Wilson’s theorem is not of practical importance for primality testing (it would result in a very slow algorithm). Still, this theorem shows that primality can be characterized in quite different terms. In practice, the problem of primality testing is solved by probabilistic algorithms, e.g. the so-called Miller–Rabin test. Such tests are based on some simple observations concerning prime numbers and give you the correct answer only with some positive probability. The Miller–Rabin test is based on some simple corollaries of Fermat’s little theorem. If it returns „composite”, the number is indeed composite, but if it returns „prime”, you cannot be totally sure that this is indeed the case. However, you may reduce the probability of error as much as you wish, and in practice this probabilistic test works very well. And what if you have to be sure that a given number is indeed a prime? For a long time there were no means of checking this fast enough (similarly as for the factorization problem). Only in 2002 did Agrawal, Kayal, and Saxena give the so-called AKS primality test, which is a deterministic polynomial-time algorithm and can be used to prove that a number is a prime. This algorithm is much slower than probabilistic tests (e.g. the running time of the Miller–Rabin test is essentially $O(\operatorname{len}(n)^3)$, while the fastest known version of the AKS test requires $O(\operatorname{len}(n)^{6+o(1)})$ binary operations, for a given integer $n$). Still, the existence of this algorithm is quite amusing and suggests giving a second thought to the possibility that a fast algorithm for the factorization problem exists, doesn’t it? Summing up, we have our answer to Question 2: Yes.

The search for primes. By Theorem 18 we know that there are infinitely many prime numbers. Since these numbers are so important in applications, and because people like to compete, there is a money prize for finding big prime numbers. Such a prize is offered by the Electronic Frontier Foundation (see https://www.eff.org/awards/coop). Currently, a prize of $ 150 000 is offered „to the first individual or group who discovers a prime number with at least 100 000 000 decimal digits”. The previous prize offered by the EFF was $ 100 000, and it was awarded in 2009 to the GIMPS project for finding a prime with at least 10 000 000 decimal digits. GIMPS (Great Internet Mersenne Prime Search) is a large-scale distributed computing project aimed at finding so-called Mersenne primes (these are primes of the form $2^p-1$, where $p$ is also a prime). As of 2020, 51 such primes were known, the largest one (found in 2018) having 24 862 048 decimal digits. Nobody knows if there are infinitely many Mersenne primes. You can join the GIMPS project if you wish. If your computer finds a new record 100 000 000-digit prime number using GIMPS software, you will have to split the prize with the GIMPS project; also, awards of about $ 3 000 are offered by GIMPS itself for finding a new Mersenne prime (see http://www.mersenne.org/legal/#mpa for details).

Unsolved problems. Besides the famous Riemann Hypothesis, there are tons of unsolved problems in Number Theory. Some of them can be found at http://en.wikipedia.org/wiki/Unsolved_problems_in_mathematics#Number_theory_.28general.29.
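The Miller–Rabin test discussed in this section can be sketched in a few lines; combined with the search $N, N+1, N+2,\ldots$ from Question 1 it gives a simple prime generator. This is our own illustrative implementation, using the standard error bound $4^{-\text{rounds}}$ for composite inputs and handling a few small primes by trial division first:

```python
import random

def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin test: False means composite for sure; True means
    'probably prime' (error probability at most 4**(-rounds))."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % small == 0:
            return n == small
    s, t = 0, n - 1
    while t % 2 == 0:                  # write n - 1 = 2^s * t with t odd
        s, t = s + 1, t // 2
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, t, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False               # a is a witness: n is certainly composite
    return True

def next_probable_prime(N: int) -> int:
    """First probable prime in N, N+1, N+2, ... (the strategy of Question 1)."""
    while not is_probable_prime(N):
        N += 1
    return N

print(next_probable_prime(10 ** 30))
```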

16 Systems of linear equations

16.1 From one to several equations

Looking back at the previous sections, the following reflection might be in order: we developed our new tools and ideas mostly to be able to solve some equations. In a very general meaning of the word, mathematics is about solving – a proof of a hypothetical proposition may be treated as a solution to an „equation” of the form: the assumptions $+$ an unknown reasoning $=$ the proposition. In this section we are going to learn how to solve another class of equations and systems of equations, so-called linear ones.

Let us begin with some examples.

Example. An equation of the form $ax+by=c$, where $a,b,c\in\mathbb{R}$ and $(a,b)\neq(0,0)$, defines a line in $\mathbb{R}^2$ (hence, this is a linear equation with infinitely many solutions in $\mathbb{R}^2$).

[Figure: the line in the $(x,y)$-plane]

Let us recall that in Section 11 we sought the integer solutions of such equations (with $a,b,c\in\mathbb{Z}$), which in turn corresponds to finding integer points lying on the corresponding line. In other words, if you had a big enough piece of grid paper, a long enough ruler and if you were accurate enough, you would be able to solve (in $\mathbb{Z}$) all equations of the form $ax+by=c$, where $a,b,c\in\mathbb{Z}$, with almost no calculations.

x y

It also makes sense to consider the equation from the above example

p p p

over Z , where is a prime number and (of course in this situation means p

p

and means p). Indeed, multiplication by in Z produces an equivalent equation

p x

y

x

p p p

p Z p

p which defines „a line” in . Clearly, the inverses

y

in the last formula are to be understood in Zp but other than that, this formula for is

x

in principle the same as the one over R which reads y . This suggests that

such equations could be considered more abstractly (namely over fields) and that there should be no major differences between solving them over R and over an abstract field K.


x y

Example. Consider the equation in Z Z Z . We easily find that

y

in Z . Hence and by the above observations, our equation is equivalent to

x x y x

. This means that becomes a function of . The table of

values of this function is as follows:

x .

y

x y

Graphically, the „line” in Z looks like this:

[Figure: the points of the „line” drawn in the $(x,y)$-grid]

In general, a linear equation may have many variables and may have coefficients in any field. Here is the formal definition:

Definition 18. (Linear equation) Let $K$ be a field. A linear equation in $n$ variables $x_1,\ldots,x_n$ over $K$ has the form $a_1x_1+\cdots+a_nx_n=b$, where $a_1,\ldots,a_n\in K$ are the coefficients and $b\in K$ is the constant term. The subset $S$ of $K^n$ built of all the solutions to a linear equation is called (if $\emptyset\neq S\neq K^n$):

a point (if $n=1$),

a line (if $n=2$),

a plane (if $n=3$),

a hyperplane (in general).

p

x y z x y z

Example. π is a linear equation in variables over R. This equation defines a plane in R:

If you intersect several hyperplanes you get either an empty set or a smaller „flat” set (for instance a line or a single point):

Examples of various configurations of planes in $\mathbb{R}^3$. Let’s say we are given two or three equations of the form $ax+by+cz=d$, where $a,b,c,d\in\mathbb{R}$, and let’s say we want to solve them simultaneously. This means that we want to solve a system of linear equations. The set of solutions of this system consists of exactly those points in $\mathbb{R}^3$ that lie in the intersection of all the planes defined by the equations of the system.

[Figure: configurations of planes in $\mathbb{R}^3$, panels a)–g)]

a) Two parallel planes – no points in common.

b) Two planes intersect – a line in common.

c) Parallel planes – no points in common to all three.

d) Two parallel planes and a third one intersecting these two – no points in common to all three.

e) Three planes intersect in pairs only – no points in common to all three.

f) Three planes intersect along a common line.

g) Three planes intersect – a unique point in common.

The formal definition of a system of linear equations is:

Definition 19. (A linear system) Let $K$ be a field. A system of linear equations over $K$ (briefly: a linear system) is a finite (non-empty) set of linear equations over $K$. Usually such a system is written in the following form:

$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n = b_1 \\ \quad\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n = b_m \end{cases}$$

where $a_{ij}, b_i \in K$. A solution (in $K^n$) of this system is any point $(\alpha_1, \dots, \alpha_n) \in K^n$ that satisfies all equations of this system (this means that the substitutions $x_1 = \alpha_1, \dots, x_n = \alpha_n$ lead to genuine equalities in $K$). A linear system is called consistent if it has at least one solution and inconsistent in the opposite case. Our objective is to learn how to solve linear systems. We begin with some simple ones...

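16.2 Triangular systems and back substitution

There are situations when a system is really easy to solve. A general class of such systems is the following:

Definition 20. (Triangular system) A linear system is in triangular form and is called a triangular system if it is of the form

$$\begin{cases} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \dots + a_{1n}x_n = b_1 \\ \qquad\;\; a_{22}x_2 + a_{23}x_3 + \dots + a_{2n}x_n = b_2 \\ \qquad\qquad\qquad\qquad \vdots \\ \qquad\qquad a_{n-1,n-1}x_{n-1} + a_{n-1,n}x_n = b_{n-1} \\ \qquad\qquad\qquad\qquad\quad a_{nn}x_n = b_n \end{cases}$$

where $a_{11}, a_{22}, \dots, a_{nn}$ are non-zero elements of the field of coefficients $K$.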

Note that a triangular system always has the same number of variables as the number of equations. Moreover, if we set $x_1 > x_2 > \dots > x_n$, then every equation of a triangular system has a different greatest variable $x_j$. This variable is called the leading variable of the equation. In other words, one can say that a system is triangular (up to permutation of the equations) if each of its variables is the leading variable of exactly one equation of the system. Usually, when no ordering of variables is mentioned, one means the lexical (= dictionary) ordering, or an ordering which is clear from the context.

Example of solving triangular systems using back substitution. Consider the system

w x y

x y z

y z

z

If we set $w > x > y > z$, then we see that every equation has a different leading variable (and in exactly this order); hence this is indeed a triangular system.

In order to solve the system, we start with the simplest equation, which is the one involving the smallest variable $z$ as the leading variable. From it we get the value of $z$. Now we

use the discovered value of z and substitute this value into the other equations. The next

simplest equation that we get in this way is the second from the bottom of the system:

y y

y

Now we also substitute $y$ into the other equations. The third one from the bottom

of the system becomes:

x x

x

y z

Lastly, we also substitute this value of $x$ into the first equation to get

11

w w

11 w

x y

11

wx y z

Summing up, our system has exactly one solution $(w, x, y, z)$, namely the one found above. This means that this system corresponds to case g) of the example on page 44 (although in $\mathbb{R}^4$ rather than $\mathbb{R}^3$).

Note also that we could consider this system over $\mathbb{Z}_p$, where $p \in \mathbb{P}$ and $p > 11$. In such a case, at each step we should solve our equations in this $\mathbb{Z}_p$. It is easy to see that for this particular system the solution would be the same, except that in the value of $w$ the division by $11$ would mean multiplication by $11_p^{-1}$, the multiplicative inverse of $11$ in $\mathbb{Z}_p$. For primes $p \le 11$, however, our system cannot be considered over such $\mathbb{Z}_p$ unless you reduce its coefficients modulo $p$. If you do so, then

various things may happen. For instance, if p then you still have a triangular system

w x y z p

with unique solution (check!) but for you do not have a

w x y z

triangular system any more. In Z the solution is , in Z there are

w x y z t t t

no solutions and in Z the solutions are of the form ,

t where Z (you will learn how to solve such systems later on).

The process of solving triangular systems using the method sketched in the above example is called back substitution. Note that in each case in which we had a triangular system, there was only one solution to such a system (see the example above). As it turns out, this is not accidental:

Property 9. Every triangular system of equations (over any field K) has exactly one solution in Kn. This solution can be found using back substitution.
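Back substitution is easy to mechanize. The following Python function is a minimal sketch, not taken from these notes: it assumes the system is given by an upper-triangular list of rows `A` (with non-zero diagonal entries modulo $p$, as required in Definition 20) and a list `b` of constant terms, and it works over $\mathbb{Z}_p$ for a prime $p$. The name `back_substitution_mod_p` is hypothetical, and the modular inverse relies on the three-argument `pow(a, -1, p)`, available since Python 3.8.

```python
def back_substitution_mod_p(A, b, p):
    """Solve the triangular system A x = b over Z_p (p prime).

    A is an n x n list of rows with A[i][j] == 0 for j < i and
    A[i][i] != 0 (mod p); b is the list of constant terms.
    """
    n = len(A)
    x = [0] * n
    # Start with the last equation, whose leading variable is the smallest one.
    for i in range(n - 1, -1, -1):
        # Move the already computed variables to the right-hand side.
        rhs = b[i]
        for j in range(i + 1, n):
            rhs -= A[i][j] * x[j]
        # Divide by the leading coefficient, i.e. multiply by its inverse in Z_p.
        x[i] = rhs * pow(A[i][i], -1, p) % p
    return x


# Usage: x + 2y = 1, 3y = 2 over Z_5.
print(back_substitution_mod_p([[1, 2], [0, 3]], [1, 2], 5))  # [3, 4]
```

Each step uses only the equation currently being solved and the values already found, which is exactly why a triangular system has precisely one solution.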

16.3 Matrix notation for a linear system

The idea behind solving a general system of linear equations is simple: we will try to gradually transform a given system in such a way that its solution set does not change, but the system itself gets simpler and simpler, up to a point where it is easy to read off the solutions (as was the case for triangular systems). In order to be able to do this, we must understand what transformations are allowed for a linear system of equations. Let us see this through an example:

Example. Consider the system

w x y z

E

w x y z E

10

w x y z

E

Here is what we can do:

w x y E

1. We can multiply the first equation by so that goes into

z

E E

10 . This transformation may be denoted by .

w E

2. Next, we can add this new equation to equation to get . This trans-

E E E

formation may be denoted by . 3. We can interchange the second and third rows. This transformation may be

E E denoted by .

After these transformations (applied one after another) our system becomes

z w x y

E

10

w x y z

E

w

E

E E E E E E

Now we can apply the transformations and to get

x y z E

10

x y z E

29

w

E

E E E E E

Using and we get

w

E

x y z E

29 (10)

z y E

14 14 64

This system is quite similar to a triangular one, but it is not triangular, because the variable $z$ is not a leading variable of any equation of this system (a more basic reason for this is that our system has fewer equations than variables, so no matter what you do, you cannot reduce it to a triangular form). To finish our work, we might apply the method of back substitution, but for this we need a triangular system. This is easy to achieve – we can look at the non-leading variable $z$ in the last equation of the system and treat it as a parameter (in our case varying over $\mathbb{R}$). So we introduce a new name, e.g. $s$, and we set $z = s$. At this point, we may treat $s$ as a (fixed but arbitrary) constant and rewrite our system in the following form:

w

x y s

29

s 14 y 64 14

In this way we have constructed a triangular system (in the variables $w$, $x$, $y$), which we can solve by back substitution. Applying this method, we get the following solutions (exactly one for each value of $s$):

w x y z s s s s , where R . 14

Exercise 8. Reduce the coefficients of the system from the example on page 45 modulo 5, and solve this reduced system over the field $\mathbb{Z}_5$.

The above example revealed the allowed transformations on linear systems and also showed that we cannot expect to be able to transform every given system to a triangular one. Before we proceed any further, it will be convenient for us to change our language. Namely, observe that in the examples worked out in this section little role was played by the variables themselves. They are just names, placeholders, and what matters most are the coefficients of the systems. This suggests that we should be able to dismiss variables from our notation, but in doing so we must nevertheless retain the structure of our system so as not to lose any crucial information. This simple idea leads to the notion of a matrix.

Definition 21. (Matrix) A matrix of shape $m \times n$ is a rectangular table of numbers (more generally: of elements from a field $K$ or from some other set) having $m$ rows and $n$ columns. The elements of a matrix are called its entries and the numbers $m$, $n$ are called its dimensions. In the context of matrices with coefficients in a field $K$, the elements of $K$ are often designated as scalars.

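Notation. The set of $m \times n$-matrices with coefficients in a set $S$ will be denoted by $S^{m\times n}$. For a matrix $A \in S^{m\times n}$ its entry in the $i$-th row and $j$-th column will be denoted by $A_{ij}$, and the matrix itself will be written as $A = (A_{ij})_{1 \le i \le m,\ 1 \le j \le n}$, or just $A = (A_{ij})$ if the dimensions are clear from the context. A graphical presentation of a matrix $A$ looks like this:

$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{pmatrix} \quad\text{or}\quad A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{bmatrix}.$$

Sometimes it is also convenient to denote the entries of a matrix $A$ by $a_{ij}$.

Once we have defined matrices, we may use them to encode systems of linear equations. Namely, with a system

$$\begin{cases} a_{11}x_1 + \dots + a_{1n}x_n = b_1 \\ \quad\vdots \\ a_{m1}x_1 + \dots + a_{mn}x_n = b_m \end{cases}$$

we may associate its coefficient matrix and its augmented matrix:

$$\underbrace{\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}}_{\text{coefficient matrix}} \qquad \underbrace{\left(\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array}\right)}_{\text{augmented matrix}}$$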

The augmented matrix contains all the necessary information about the system and we may effectively work with this matrix to find the solutions of our system. The three basic transformations of linear systems considered in the example on page 47 can be translated into appropriate transformations of rows of matrices associated to such systems:

Definition 22. (Elementary row operations on matrices) Let $K$ be a field. The three types of elementary row operations on matrices from $K^{m\times n}$ are:

1. Interchanging two different rows: $R_i \leftrightarrow R_j$.

2. Multiplication of a row by a nonzero scalar: $R_i := \lambda R_i$, where $\lambda \neq 0$.

3. Addition of a multiple of one row to another one: $R_j := R_j + \lambda R_i$, where $i \neq j$.

Two matrices are said to be row-equivalent when one can be obtained from the other by some number of (successively applied) elementary row operations. We will use the notation $A \sim B$ to indicate the fact that the matrices $A$ and $B$ are row-equivalent. It is not hard to observe what follows:

Property 10. Let $K$ be a field, $m, n \in \mathbb{N}$. For matrices in $K^{m\times n}$:

1. Every elementary row operation is reversible by an elementary row operation of the same type.

2. The relation of being row-equivalent is an equivalence relation in $K^{m\times n}$.
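The three elementary row operations and their reversibility (item 1 above) can be illustrated with a short Python sketch. The helper names below are hypothetical; matrices are assumed to be lists of rows with `Fraction` entries, so that we work over the field $\mathbb{Q}$, and each helper returns a new matrix instead of modifying its argument.

```python
from fractions import Fraction


def swap_rows(M, i, j):
    """Type 1: R_i <-> R_j."""
    M = [row[:] for row in M]
    M[i], M[j] = M[j], M[i]
    return M


def scale_row(M, i, lam):
    """Type 2: R_i := lam * R_i, allowed only for lam != 0."""
    if lam == 0:
        raise ValueError("the scalar must be non-zero")
    M = [row[:] for row in M]
    M[i] = [lam * a for a in M[i]]
    return M


def add_multiple(M, j, i, lam):
    """Type 3: R_j := R_j + lam * R_i, allowed only for i != j."""
    if i == j:
        raise ValueError("the two rows must be different")
    M = [row[:] for row in M]
    M[j] = [a + lam * c for a, c in zip(M[j], M[i])]
    return M


A = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
# Each operation is undone by an operation of the same type.
assert swap_rows(swap_rows(A, 0, 1), 0, 1) == A
assert scale_row(scale_row(A, 1, Fraction(5)), 1, Fraction(1, 5)) == A
assert add_multiple(add_multiple(A, 1, 0, Fraction(-3)), 1, 0, Fraction(3)) == A
```

The restriction $\lambda \neq 0$ in type 2 is exactly what makes the inverse operation (multiplication by $\lambda^{-1}$) available.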

Example. Let us rewrite the transformations performed on the system of the example

on page 47 in the language of elementary row operations on matrices. The system was:

w x y z

z x y

w 10

w x y z

The augmented matrix associated to this system is:

10

Now we shall perform row operations on this matrix mimicking the linear system trans-

formations applied in the aforementioned example:

10 10

10 10

R R R R R R R

1 1 1 2 2 2 3

10 10

R R R R R R

1 3 1 2 3 2

10 14 14 64

29 29 29

R R R R R

2 1 1 3 1 14 14 64

This last matrix is the augmented matrix of linear system (10), which illustrates the fact that working with matrices is equivalent to working with the systems themselves.

16.4 Row echelon form of a matrix

The matrix we produced as the final one during our calculations in the above example is of a very special form (similarly to the system (10) corresponding to this matrix). Such matrices are said to be in row echelon form (you can also say that the system (10) is in echelon form). The word „echelon” means a stair-like military formation (of ships, for example). The formal definition of such matrices allows the possibility that they contain zero rows (= rows all of whose entries are equal to $0$):

Definition 23. (Row echelon form) Let $K$ be a field and $A \in K^{m\times n}$. When a row of $A$ is non-zero, its first (from the left) non-zero entry is called the leading entry (of that row). The matrix $A$ is said to be in row echelon form (ref) if:

1. any zero rows of $A$ are below all non-zero rows of $A$,

2. for two neighbouring non-zero rows of $A$, the leading entry in the lower row is further to the right than the leading entry in the higher row.

If moreover:

3. every leading entry of $A$ is equal to $1$,

4. every column of $A$ containing a leading entry has all its other entries equal to $0$,

then the matrix $A$ is said to be in reduced row echelon form (rref).

By the above definition, a matrix in rref is also in ref.

Terminology. Sometimes the leading entries of a matrix are called pivots, their loca- tions in this matrix are called pivot positions and the rows (columns) containing a pivot position are called pivot rows (columns).

Example. Consider

.

The first matrix is in ref and the second one is in rref. Moreover, the first matrix can be transformed into the second one by a sequence of elementary row operations.
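The transformation to rref can be mechanized. The following Python function is a minimal sketch of our own (the name `rref` is hypothetical; `Fraction` keeps the arithmetic exact over $\mathbb{Q}$). It uses only the three elementary row operations: it searches column by column for a pivot, moves it up, scales it to $1$ and clears the rest of its column. The notes do not prescribe a particular pivot choice; this one simply takes the first non-zero entry.

```python
from fractions import Fraction


def rref(A):
    """Return the reduced row echelon form of A (a list of rows), computed over Q."""
    M = [[Fraction(a) for a in row] for row in A]
    m = len(M)
    n = len(M[0]) if m else 0
    r = 0  # row in which the next leading entry will be placed
    for c in range(n):
        if r == m:
            break
        # Look for a non-zero entry in column c, at or below row r.
        pivot = next((i for i in range(r, m) if M[i][c] != 0), None)
        if pivot is None:
            continue  # this column will not contain a leading entry
        M[r], M[pivot] = M[pivot], M[r]          # R_r <-> R_pivot
        lead = M[r][c]
        M[r] = [a / lead for a in M[r]]          # R_r := (1/lead) R_r
        for i in range(m):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]  # R_i := R_i - f R_r
        r += 1
    return M


print(rref([[1, 2, 3], [2, 4, 7]]) == [[1, 2, 0], [0, 0, 1]])  # True
```

Row-reducing a row-equivalent matrix (for instance, the same matrix with its two rows interchanged) gives the same result, in line with the uniqueness statement of Theorem 23.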

The significance of echelon forms is justified by the following fact:


Theorem 23. (Reduction to echelon forms) Let $K$ be a field. Every matrix $A \in K^{m\times n}$ is row-equivalent to a matrix in rref, denoted by $\mathrm{RREF}(A)$ and called the reduced row echelon form of the matrix $A$. Moreover, the matrix $\mathrm{RREF}(A)$ is unique (it does not depend on the sequence of elementary row operations used to transform the matrix $A$ into $\mathrm{RREF}(A)$). More precisely, if $A \sim B$, then $\mathrm{RREF}(A) = \mathrm{RREF}(B)$.

Even more importantly, when an augmented matrix is in echelon form, the system corresponding to this matrix is easy to solve:

Example. Consider the matrix

. (11)

This matrix is in ref. In order to solve the system it represents, you may proceed in two ways.

I. Write down this system right away. Using variables $v, w, x, y, z$ ordered lexically, we get

v w x y z

x y z

.

z

Now we may use a variant of the back substitution method to solve this system (cf. the trick used in the example on page 47). Namely, since $w$ and $y$ are not leading variables of the above system, we treat them as parameters, $w = s$ and $y = t$, and rewrite the system in the following form:

v x z s t

w s

s t

x z t

, where R.

y t

z

At this point we have a system in triangular form with respect to our ordering of the variables, and we may solve it by back substitution (do this yourself!). Since two parameters are necessary in order to describe the set of solutions of this system, we conclude that the solutions form a plane in $\mathbb{R}^5$.

II. Transform matrix (11) into its rref. In this case

1

R R R

2 3 2

R R

2 2

2

R R R

1 3 1

37

.

R R R

1 2 1

w s y t

This last matrix is already in rref. Now you may either proceed as before (apply method I to this matrix) or treat the columns that do not contain leading entries as corresponding to parameters ($w = s$, $y = t$, let's say) and continue as shown:

37 37

s t s t

t t

and then .

The system corresponding to the last matrix is trivial to solve with respect to $v$, $x$ and $z$ (but you have to remember that $w = s$, $y = t$ is a part of the solution).

The two methods of solving linear systems, outlined in the above example, can be loosely summarized as follows:

Gaussian elimination. In order to solve a linear system you should:

1. write down the augmented matrix of the system,
2. transform this matrix into a new matrix in ref,
3. translate the new matrix into the linear system it represents,
4. treat all non-leading variables as parameters (introduce new names for them, $x_j = s_j$) and move them to the right-hand side of the system,
5. solve the triangular system you have arrived at using back substitution (remember to take into account the equations $x_j = s_j$ defining the parameters).

Gauss-Jordan elimination. In order to solve a linear system you should:

1. write down the augmented matrix of the system,
2. transform this matrix into its rref, and then either
3a. repeat steps 3.–5. of Gaussian elimination, or
3b. multiply each column $C_j$ not containing a leading entry (but not the last column!) by a new variable $s_j$ treated as a parameter ($x_j = s_j$): $C_j := s_j C_j$,
4b. subtract the columns with parameters from the last column, zeroing them afterwards (the last column becomes the last column minus $\sum_j C_j$, and then $C_j := 0$),
5b. read off the solutions from this matrix (remember to include the appropriate equations $x_j = s_j$ defining the parameters).

Warning. In both above methods, the names of the new parameters (variables) must all be different from each other.

Commentary. Another variation of the Gauss-Jordan method would be to also add new rows to the matrix from step 4b, namely those corresponding to the equations $x_j = s_j$. In this way, you get a triangular system with all the necessary information in step 5b. Note also that in steps 3b–4b of the Gauss-Jordan method you work with columns. This is something new, but such operations will reappear later on, in another context.
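Gauss-Jordan elimination, followed by reading off the solutions in the spirit of steps 3b–5b, can be sketched in Python as follows. This is our own illustration over $\mathbb{Q}$ (exact `Fraction` arithmetic), not code from the notes; `solve_system` is a hypothetical name. Every non-leading variable becomes a parameter, and the solution set is returned as one particular solution plus one direction vector per parameter.

```python
from fractions import Fraction


def rref(A):
    """Reduced row echelon form over Q (A is a non-empty list of rows)."""
    M = [[Fraction(a) for a in row] for row in A]
    m, n, r = len(M), len(M[0]), 0
    for c in range(n):
        if r == m:
            break
        pivot = next((i for i in range(r, m) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        lead = M[r][c]
        M[r] = [a / lead for a in M[r]]
        for i in range(m):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return M


def solve_system(aug):
    """Solve the system whose augmented matrix is `aug` (last column = constants).

    Returns None for an inconsistent system; otherwise (p, dirs) such that the
    solutions are exactly p + s_1*d_1 + ... + s_k*d_k, one parameter per
    non-leading variable.
    """
    R = rref(aug)
    n = len(aug[0]) - 1                      # number of variables
    pivots = []                              # (row, column) of each leading entry
    for i, row in enumerate(R):
        lead = next((j for j, a in enumerate(row) if a != 0), None)
        if lead == n:                        # row (0 ... 0 | 1): no solutions
            return None
        if lead is not None:
            pivots.append((i, lead))
    pivot_cols = {c for _, c in pivots}
    free = [j for j in range(n) if j not in pivot_cols]
    p = [Fraction(0)] * n
    for i, c in pivots:
        p[c] = R[i][n]                       # all parameters set to 0
    dirs = []
    for f in free:                           # x_f = s, the other parameters 0
        d = [Fraction(0)] * n
        d[f] = Fraction(1)
        for i, c in pivots:
            d[c] = -R[i][f]                  # parameter column moved to the right-hand side
        dirs.append(d)
    return p, dirs


# x + y + z = 3, x - y = 1 over Q: solutions (2 - s/2, 1 - s/2, s), s in Q.
p, dirs = solve_system([[1, 1, 1, 3], [1, -1, 0, 1]])
print(p, dirs)
```

Returning a particular solution together with direction vectors is just one convenient encoding of the parametric description used in the notes; it does not add any information beyond the equations $x_j = s_j$.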

17 The algebra of matrices

While we defined matrices just in order to simplify our notation, it turns out that matrices are interesting in their own right. Namely, one may define natural (as it will turn out later) operations on matrices of matching shapes:

Definition 24. (Operations on matrices) Let $R$ be a ring, not necessarily commutative.

1. Let $A = (a_{ij}),\ B = (b_{ij}) \in R^{m\times n}$. The sum of matrices $A$ and $B$ is a matrix $A + B$ of size $m \times n$ given by the formula
$$A + B := (a_{ij} + b_{ij})_{1 \le i \le m,\ 1 \le j \le n}.$$

2. Let $A = (a_{ij}) \in R^{m\times n}$, $B = (b_{kl}) \in R^{n\times p}$. The product of matrices $A$ and $B$ is a matrix $A \cdot B$ of size $m \times p$ given by the formula
$$A \cdot B := (c_{il})_{1 \le i \le m,\ 1 \le l \le p},$$
where
$$c_{il} := a_{i1}b_{1l} + a_{i2}b_{2l} + \dots + a_{in}b_{nl}, \qquad \text{for } 1 \le i \le m,\ 1 \le l \le p.$$

3. Let $A = (a_{ij}) \in R^{m\times n}$ and $\lambda \in R$. The (left) scalar multiple $\lambda A$ of $A$ by $\lambda$ is a matrix of size $m \times n$ given by the formula
$$\lambda A := (\lambda a_{ij}).$$
Similarly one may define the right scalar multiple $A\lambda$ of $A$ by $\lambda$:
$$A\lambda := (a_{ij}\lambda).$$

Note that two matrices can be multiplied only if the number of columns of the first matrix is the same as the number of rows of the second matrix. Moreover, the scalar multiplication does not commute (that is, in general $\lambda A \neq A\lambda$) unless the ring $R$ is itself commutative.
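Definition 24 translates almost literally into code. Below is a minimal Python sketch (the function names are hypothetical) with matrices stored as lists of rows; it assumes the entries support `+` and `*` and that `0` is a valid starting value for a sum, which holds for integers, `Fraction`s and floats. The final lines show that $AB \neq BA$ can happen already for $2 \times 2$ integer matrices.

```python
def mat_add(A, B):
    """Entrywise sum of two m x n matrices (item 1)."""
    if len(A) != len(B) or any(len(r) != len(s) for r, s in zip(A, B)):
        raise ValueError("matrices must have the same shape")
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]


def mat_mul(A, B):
    """Product of an m x n matrix A and an n x p matrix B (item 2).

    Entry (i, l) is c_il = a_i1*b_1l + ... + a_in*b_nl; the order of the
    factors is kept, so this also works for non-commutative entries.
    """
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    p = len(B[0])
    return [[sum(A[i][k] * B[k][l] for k in range(n)) for l in range(p)]
            for i in range(len(A))]


def scalar_mul(lam, A):
    """Left scalar multiple lam*A (item 3)."""
    return [[lam * a for a in row] for row in A]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(mat_mul(A, B))  # [[2, 1], [4, 3]]
print(mat_mul(B, A))  # [[3, 4], [1, 2]]
```

Multiplying by $B$ from the right swaps the columns of $A$, while multiplying from the left swaps its rows, which is why the two products differ.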

Example.

1. Let

A B

and .

Then

A B

.

2. Let

A B

and .


Then

A B

.

15

Moreover,

B A

.

18

3. Let

A

and .

Then

A

.

15 BA

The above example shows in particular that even if both products $AB$ and $BA$ are defined, in general $AB \neq BA$ (cf. item 2). One might guess that this is because the above two products are of different sizes. But when are the sizes of $AB$ and $BA$ the same? Well, if $A \in R^{m\times n}$ and $B \in R^{n\times p}$, then $AB \in R^{m\times p}$. Since $BA$ should also be defined, first of all we must have $p = m$. But then $AB \in R^{m\times m}$ and $BA \in R^{n\times n}$. If these sizes are to be the same, we must also have $n = m$. This means that $A \in R^{m\times m}$ and $B \in R^{m\times m}$.

Such matrices are called square; more precisely, a matrix $A$ is called square of order $n$ if it is of size $n \times n$. The set of all square matrices of order $n$ with entries in a ring $R$ will be denoted by $M_n(R)$. Note that if $A, B \in M_n(R)$, then also $AB \in M_n(R)$, so the multiplication of square matrices of a given order $n$ is an inner operation in $M_n(R)$. This is nice, but does $AB$ equal $BA$ for square matrices? The answer is no:

Example. Let

B

A and .


Then

BA

AB and .

Hence, $AB \neq BA$.

You may wonder what sort of properties hold for the matrix operations $+$ and $\cdot$. The following theorem provides a partial answer to such a question.

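Theorem 24. (General properties of matrix operations) Let $R$ be a commutative ring. Then:

1. $(R^{m\times n}, +)$ is an abelian group. The identity of this group is the zero matrix $O \in R^{m\times n}$ defined as
$$O := \begin{pmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix} \quad (m \text{ rows},\ n \text{ columns}).$$
The inverse element of a matrix $A \in R^{m\times n}$ is the matrix $-A := (-a_{ij})$; when $R$ has a unity $1$, this is the matrix $(-1)A$, where $-1 \in R$ is the inverse element (with respect to $+$) of the identity element $1$.

2. The operation $\cdot$ is associative in the following sense: if $A \in R^{m\times n}$, $B \in R^{n\times p}$, $C \in R^{p\times r}$ (that is, if the sizes of the matrices match), then
$$(A \cdot B) \cdot C = A \cdot (B \cdot C).$$

3. The operation $\cdot$ is distributive over $+$ in the following sense: if $A \in R^{m\times n}$ and $B, C \in R^{n\times p}$, then
$$A \cdot (B + C) = A \cdot B + A \cdot C,$$
and also, if $A, B \in R^{m\times n}$ and $C \in R^{n\times p}$, then
$$(A + B) \cdot C = A \cdot C + B \cdot C.$$

4. Scalar multiplication is compatible with matrix multiplication in the following sense: if $A \in R^{m\times n}$, $B \in R^{n\times p}$ and $\lambda \in R$, then
$$\lambda(A \cdot B) = (\lambda A) \cdot B = A \cdot (\lambda B) = (A\lambda) \cdot B = A \cdot (B\lambda) = (A \cdot B)\lambda.$$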

Theorem 24 exhibits interesting properties of the matrix operations. While all of them are straightforward to check, the verification of associativity of matrix multiplication is somewhat tedious (but try to do it!). As was the case in earlier sections, such an associative property means that we can safely skip all the brackets when multiplying matrices (also for the scalar multiplication). Similarly, we agree (as usual) that $\cdot$ has a greater priority than $+$, so that we will skip even more brackets in expressions in which both $+$ and $\cdot$ are involved, e.g. $AB + AC$ means $(A \cdot B) + (A \cdot C)$. However, as we have seen already, we cannot interchange the order of the factors when multiplying matrices.

Example of associativity of matrix multiplication. Let

B C

A , and .

We have

AB C

and

BC

A .

Hence, $(AB)C = A(BC)$.

Theorem 24 can be strengthened for square matrices:

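Theorem 25. (Algebra of square matrices) Let $R$ be a commutative ring with unity. Then $(M_n(R), +, \cdot)$ is a ring with unity. Its multiplicative identity is the identity matrix $I_n \in M_n(R)$ given by
$$I_n := \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \quad (n \text{ rows},\ n \text{ columns}).$$
Moreover, condition 4 of Theorem 24 holds, which may be expressed by saying that $M_n(R)$ is a (unitary associative non-commutative) algebra over $R$.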

In the particular case of $n = 1$ the ring $M_1(R)$ is just $R$. This means that if $R$ is a field, then $M_1(R)$ is also a field. Hence, a natural question is: is $M_n(K)$ a (non-commutative) field if $n > 1$ and $K$ is a field? The answer is no:

M

Example. Consider the matrix A R with real entries defined as

A .

Clearly, $A \neq O$. On the other hand, for a suitable non-zero matrix $B$ with real entries of the same size we have $BA = O$.

This means that $A$ is a zero-divisor (see page 15 for the definition). By Property 3, $A$ is not invertible in this matrix ring, although $A \neq O$. Hence, this ring of real square matrices is not a (skew) field.

18 Which matrices are invertible?

Let us begin by setting some commonly used terminology. An invertible matrix (see Definition 9 item 1 for the general notion of an invertible element of a ring) is often called non-singular and a matrix that is not invertible is called non-invertible or singular.

Definition 25. (The set of non-singular matrices) Let $K$ be a field. We define
$$GL(n, K) := \{A \in M_n(K) : A \text{ is invertible}\},$$
the set of non-singular matrices. The set $GL(n, K)$ will be called the general linear group (of degree $n$ over $K$).

It is easy to see that if $A, B \in GL(n, K)$, then $A^{-1} \in GL(n, K)$ and $AB \in GL(n, K)$. Indeed, this reduces to noting the familiar identities $(A^{-1})^{-1} = A$ and $(AB)^{-1} = B^{-1}A^{-1}$. Hence, we get:

Theorem 26. (The structure of GL(n, K)) $(GL(n, K), \cdot)$ is a group.

In principle, the answer to the question in the title of the section is simple: similarly to the case of the rings $\mathbb{Z}_n$ (cf. Commentary on page 23), the invertible elements of $M_n(K)$ are exactly the non-zero-divisors of this ring. To see why this is indeed the case, we first interpret the elementary row operations on matrices in a new way.

Definition 26. (Elementary matrix) Let $K$ be a field. A matrix $A \in M_n(K)$ is called an elementary matrix if it is the result of a single elementary row operation performed on the identity matrix $I_n$. More specifically:

– $A$ is an elementary matrix of the first kind if it is the result of the elementary operation $R_i \leftrightarrow R_j$ on $I_n$; we will write $A = E_{ij}$.

– $A$ is an elementary matrix of the second kind if it is the result of the elementary operation $R_i := \lambda R_i$ on $I_n$ (here $\lambda \neq 0$); we will write $A = E_i(\lambda)$.

– $A$ is an elementary matrix of the third kind if it is the result of the elementary operation $R_j := R_j + \lambda R_i$ on $I_n$; we will write $A = E_{ij}(\lambda)$.

Remark. The distinction between elementary matrices of the first, second and third kinds is merely a matter of terminological convenience. Thus, depending on the author's preferences, the enumeration of these kinds may vary.

The general form of elementary matrices can be represented graphically as follows (only the $i$th and $j$th rows differ from the corresponding rows of $I_n$; all entries not shown are $0$ off the diagonal and $1$ on it):

First kind:
$$E_{ij} = \begin{pmatrix} 1 & & & & & & \\ & \ddots & & & & & \\ & & 0 & \cdots & 1 & & \\ & & \vdots & \ddots & \vdots & & \\ & & 1 & \cdots & 0 & & \\ & & & & & \ddots & \\ & & & & & & 1 \end{pmatrix} \begin{matrix} \\ \\ \leftarrow i\text{th row} \\ \\ \leftarrow j\text{th row} \\ \\ \\ \end{matrix}$$

Third kind:
$$E_{ij}(\lambda) = \begin{pmatrix} 1 & & & & & & \\ & \ddots & & & & & \\ & & 1 & \cdots & 0 & & \\ & & \vdots & \ddots & \vdots & & \\ & & \lambda & \cdots & 1 & & \\ & & & & & \ddots & \\ & & & & & & 1 \end{pmatrix} \begin{matrix} \\ \\ \leftarrow i\text{th row} \\ \\ \leftarrow j\text{th row} \\ \\ \\ \end{matrix}$$

Second kind:
$$E_i(\lambda) = \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & \lambda & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix} \begin{matrix} \\ \\ \leftarrow i\text{th row} \\ \\ \\ \end{matrix}$$

Example. Consider the set M C . The elementary matrix of the first kind corre-

R R

sponding to the elementary operation is

E

,

The elementary matrix of the second kind corresponding to the elementary operation

R R

is

E

and the elementary matrix of the third kind corresponding to the elementary operation

i R R R

is

E i

.

i

The first simple observation concerning elementary matrices is the following:

Property 11. Every elementary matrix is non-singular. Its inverse is an elementary matrix of the same kind. More specifically,
$$E_{ij}^{-1} = E_{ij}, \qquad E_i(\lambda)^{-1} = E_i(\lambda^{-1}), \qquad E_{ij}(\lambda)^{-1} = E_{ij}(-\lambda).$$

The above property is not accidental, because it is just a restatement of an analogous property of elementary row operations on matrices (cf. Property 10 item 1). This is a consequence of:

Theorem 27. (Elementary operations and elementary matrices) Let $K$ be a field and let $A \in K^{m\times n}$. The matrix that is the result of applying an elementary row operation to the matrix $A$ is equal to the product $EA$, where $E \in M_m(K)$ is the elementary matrix corresponding to this elementary row operation.
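Theorem 27 can be checked on a small example in Python. The sketch below (helper names hypothetical, entries in $\mathbb{Q}$ via `Fraction`) builds each elementary matrix by applying a row operation to the identity matrix, as in Definition 26, and asserts that multiplying by it from the left has the same effect as performing the row operation itself; it also checks one instance of Property 11.

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    return [[sum((A[i][k] * B[k][l] for k in range(len(B))), Fraction(0))
             for l in range(len(B[0]))] for i in range(len(A))]


def swap_rows(M, i, j):              # R_i <-> R_j
    M = [row[:] for row in M]
    M[i], M[j] = M[j], M[i]
    return M


def scale_row(M, i, lam):            # R_i := lam * R_i  (lam != 0)
    M = [row[:] for row in M]
    M[i] = [lam * a for a in M[i]]
    return M


def add_multiple(M, j, i, lam):      # R_j := R_j + lam * R_i  (i != j)
    M = [row[:] for row in M]
    M[j] = [a + lam * c for a, c in zip(M[j], M[i])]
    return M


A = [[Fraction(x) for x in row] for row in [[1, 2, 3], [4, 5, 6]]]
operations = [(swap_rows, (0, 1)), (scale_row, (1, Fraction(7))),
              (add_multiple, (1, 0, Fraction(-4)))]
for op, args in operations:
    E = op(identity(2), *args)                # the elementary matrix of this operation
    assert mat_mul(E, A) == op(A, *args)      # E A equals the row-operated A

# Property 11 for the third kind: the inverse uses -lambda instead of lambda.
E = add_multiple(identity(2), 1, 0, Fraction(3))
E_inv = add_multiple(identity(2), 1, 0, Fraction(-3))
assert mat_mul(E, E_inv) == identity(2)
```

Since $E$ has $m$ rows and columns, it always acts on the rows of $A$, whatever the number $n$ of columns of $A$ is.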

Example. Consider the matrix

A

.


R R R A

If we want to perform the row operation on $A$, we may do this by

computing the following product:

E A

.

Using Theorem 27 and Theorem 23 we immediately conclude:

Corollary 6. Let $K$ be a field and let $A \in K^{m\times n}$. Then there exist elementary matrices $E_1, \dots, E_r \in M_m(K)$ such that
$$\mathrm{RREF}(A) = E_r \cdots E_2 E_1 A. \qquad (12)$$

Let us observe that equality (12) tells us something useful about the invertibility of the matrix $A$. Namely, if $A \in M_n(K)$, then it may happen that $\mathrm{RREF}(A) = I_n$. In such a case, putting $B := E_r \cdots E_1$, we have that $B \in M_n(K)$ and $BA = I_n$. At this point we may exploit Property 11. Namely, multiplying this last equality by $E_r^{-1}$ from the left, we get $E_{r-1} \cdots E_1 A = E_r^{-1}$. Now, multiplying this equality by $E_r$ from the right, we get $E_{r-1} \cdots E_1 A E_r = I_n$. Similarly, we find that $E_{r-2} \cdots E_1 A E_r E_{r-1} = I_n$. Continuing this process, we finally arrive at $A E_r \cdots E_1 = I_n$, that is $AB = I_n$. This means that $B = A^{-1}$.

On the other hand, if $\mathrm{RREF}(A) \neq I_n$, then the last row of $\mathrm{RREF}(A)$ has to be the zero one (a square matrix in rref with a leading entry in every row is necessarily $I_n$). But this implies that for any matrix $C \in M_n(K)$ having non-zero entries in the last column only (so that the other columns are zero) it holds $C \cdot \mathrm{RREF}(A) = O$ (each row of this product is a multiple of the last row of $\mathrm{RREF}(A)$) or, in other words, $C E_r \cdots E_1 A = O$. Clearly, we may choose such a $C$ different from $O$. Since by Property 11 the matrices $E_1, \dots, E_r$ are invertible, we infer that also $D := C E_r \cdots E_1 \neq O$. Summing up, we have $D \neq O$ and $DA = O$, which means that $A$ is a (right) zero-divisor and as such is non-invertible (see Property 3). Thus, basically we have proved:

Theorem 28. (First characterization of invertible matrices) Let $K$ be a field. A matrix $A \in M_n(K)$ is invertible iff $\mathrm{RREF}(A) = I_n$, and so iff $A$ is a product of elementary matrices. Moreover, for $B \in M_n(K)$: $B = A^{-1}$ iff $BA = I_n$ iff $AB = I_n$.
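The argument preceding Theorem 28 is constructive, and a Python sketch can follow it step by step. Assuming square matrices with rational entries, `reduce_with_transform` (a hypothetical name) row-reduces $[A \mid I_n]$ while choosing pivots only in the $A$-part, so that the right half records the product $P = E_r \cdots E_1$ with $PA = \mathrm{RREF}(A)$ as in (12). If $\mathrm{RREF}(A) = I_n$, then $P = A^{-1}$; otherwise the matrix $D = CP$, where $C$ has a single $1$ in the bottom-right corner, is a non-zero matrix with $DA = O$.

```python
from fractions import Fraction


def reduce_with_transform(A):
    """Row-reduce [A | I_n], choosing pivots only in the A-part.

    Returns (R, P) with R = RREF(A) and P the product of the elementary
    matrices used, so that P * A == R.
    """
    n = len(A)
    M = [[Fraction(a) for a in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    r = 0
    for c in range(n):
        pivot = next((i for i in range(r, n) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        lead = M[r][c]
        M[r] = [a / lead for a in M[r]]
        for i in range(n):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return [row[:n] for row in M], [row[n:] for row in M]


def classify(A):
    """Return ("invertible", A^{-1}) or ("zero-divisor", D) with D != O and D A = O."""
    n = len(A)
    R, P = reduce_with_transform(A)
    if R == [[int(i == j) for j in range(n)] for i in range(n)]:
        return "invertible", P            # P A = I_n, hence P = A^{-1}
    # The last row of R is zero. With C = (single 1 at position (n, n)),
    # D = C P keeps only the last row of P, and D A = C R = O.
    D = [[Fraction(0)] * n for _ in range(n - 1)] + [P[-1]]
    return "zero-divisor", D


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


kind, D = classify([[1, 2], [2, 4]])
print(kind, mat_mul(D, [[1, 2], [2, 4]]) == [[0, 0], [0, 0]])  # zero-divisor True
```

The matrix $D$ is non-zero because $P$ is invertible, so its last row cannot vanish; this mirrors the use of Property 11 in the proof.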

In order to see that a matrix $A \in M_n(K)$ which is a right zero-divisor is also a left zero-divisor, we will exploit the following tool:

Definition 27. (Transpose of a matrix) If $A = (a_{ij})_{1 \le i \le m,\ 1 \le j \le n}$ is a matrix of size $m \times n$, then its transpose is the matrix $A^T$ of size $n \times m$ defined by $(A^T)_{ji} := a_{ij}$ for $1 \le i \le m$, $1 \le j \le n$.

In other words, to transpose a matrix means to interchange its rows with its columns, or to flip this matrix about the imaginary „diagonal line” emanating from the upper left corner of the matrix.

Example. Let

B C A

.

Then

T

A C B

.

The basic properties of the transpose operation are:

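Property 12. Let $R$ be a commutative ring with unity. Then:

1. $\forall A \in R^{m\times n}$: $(A^T)^T = A$ (in words: the operation ${}^T$ is an involution = self-inverse),

2. $\forall A, B \in R^{m\times n}$: $(A + B)^T = A^T + B^T$,

3. $\forall A \in R^{m\times n},\ c \in R$: $(cA)^T = cA^T$,

4. $\forall A \in R^{m\times n},\ B \in R^{n\times p}$: $(AB)^T = B^T A^T$,

5. $O^T = O$, $I_n^T = I_n$,

6. for every invertible $A \in M_n(R)$, the matrix $A^T$ is invertible and $(A^T)^{-1} = (A^{-1})^T$.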

A M n R

All of the above facts are straightforward to check. We remark only that item 6 is a consequence of items 4 and 5, because $A^T (A^{-1})^T = (A^{-1}A)^T = I_n^T = I_n$ and similarly $(A^{-1})^T A^T = (AA^{-1})^T = I_n^T = I_n$, which means that the inverse matrix to $A^T$ is equal to $(A^{-1})^T$, and this is what is asserted in item 6.
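Transposition is a one-liner in Python. The sketch below (with hypothetical helper names and integer entries, i.e. $R = \mathbb{Z}$) also checks items 1 and 4 of Property 12 on a concrete pair of matrices; note the reversed order of the factors in $(AB)^T = B^T A^T$.

```python
def transpose(A):
    """A^T: row j of A^T is column j of A, i.e. transpose(A)[j][i] == A[i][j]."""
    return [list(column) for column in zip(*A)]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][l] for k in range(len(B))) for l in range(len(B[0]))]
            for i in range(len(A))]


A = [[1, 2, 3], [4, 5, 6]]          # 2 x 3
B = [[1, 0], [2, 1], [0, 3]]        # 3 x 2
print(transpose(A))                  # [[1, 4], [2, 5], [3, 6]]
assert transpose(transpose(A)) == A                                   # item 1
assert transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A))  # item 4
```

Writing the product as $A^T B^T$ instead would not even be defined in general (here it would multiply a $3 \times 2$ matrix by a $2 \times 3$ one and give a $3 \times 3$ result instead of $2 \times 2$).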

Assume now that we are given a right zero-divisor matrix $A \in M_n(K)$. Hence, $A$ is non-invertible (see Property 3). This means that $A^T$ is non-invertible as well (otherwise, by item 6 of the above property, we would have in particular that $((A^T)^T)^{-1}$ exists, which would mean that $A^{-1}$ also exists). By Theorem 28 and the analysis on page 59, we know that $A^T$ must be a right zero-divisor; so $CA^T = O$ for some non-zero $C \in M_n(K)$. By Property 12 (items 1, 4 and 5) we thus get $AC^T = (CA^T)^T = O^T = O$. Clearly, $C^T$ is also non-zero and hence $A$ is a left zero-divisor in $M_n(K)$.

Similarly, if $A \in M_n(K)$ is a left zero-divisor, then $A$ is non-invertible, so $\mathrm{RREF}(A) \neq I_n$ and hence $A$ is a right zero-divisor.

We may sum up these observations:

Corollary 7. Let $K$ be a field and $A \in M_n(K)$. The following conditions are equivalent:

i. $A$ is non-invertible,

ii. $\mathrm{RREF}(A) \neq I_n$,

iii. $A$ is a left zero-divisor in $M_n(K)$,

iv. $A$ is a right zero-divisor in $M_n(K)$.

Using the above information, we can indicate a method of finding the inverse matrix $A^{-1}$ of a matrix $A \in GL(n, K)$. Namely, by Theorem 28, $A \in GL(n, K)$ iff
$$E_r \cdots E_2 E_1 A = I_n$$
for some elementary matrices $E_1, \dots, E_r \in M_n(K)$. Moreover, then $A^{-1} = E_r \cdots E_1$. But this means that also $A^{-1} = E_r \cdots E_1 I_n$. By Theorem 27 the elementary matrices $E_j$ correspond to elementary row operations. Summing up these observations, we get:

Theorem 29. (First method of finding the inverse matrix) Let $K$ be a field and $A \in M_n(K)$. Then the matrix $A$ is invertible iff there exists a sequence of elementary row operations transforming $A$ into $I_n$. Moreover, if this happens, then the same sequence of elementary row operations performed on the identity matrix $I_n$ produces the matrix $A^{-1}$.

Note that if $A \sim I_n$, then necessarily $\mathrm{RREF}(A) = I_n$, because $I_n$ is in rref (cf. Theorem 23). Hence, the above suggests using Gauss-Jordan elimination to find $A^{-1}$ (if it exists). A convenient method to accomplish this goal is to adjoin the matrix $I_n$ to the matrix $A$ and consider the matrix of the form $[A \mid I_n]$. If, using Gauss-Jordan elimination, you manage to transform the $A$-part of this matrix to $I_n$, then the whole matrix will be of the form $[I_n \mid A^{-1}]$. If you find that $\mathrm{RREF}(A) \neq I_n$, you may conclude that the matrix $A$ is singular.
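The method with the adjoined identity matrix can be sketched in Python as follows. This is an illustration over $\mathbb{Q}$ under our own conventions (lists of rows, exact `Fraction` arithmetic, hypothetical function name `inverse`), not a verbatim transcription of the notes: if some column of the $A$-part has no usable pivot, then $\mathrm{RREF}(A) \neq I_n$ and the function reports that $A$ is singular by returning `None`.

```python
from fractions import Fraction


def inverse(A):
    """Gauss-Jordan on [A | I_n]; return A^{-1} (over Q) or None if A is singular."""
    n = len(A)
    # Build the n x 2n matrix [A | I_n].
    M = [[Fraction(a) for a in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        # A pivot is needed in column c at or below the diagonal; without it RREF(A) != I_n.
        pivot = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pivot is None:
            return None
        M[c], M[pivot] = M[pivot], M[c]
        lead = M[c][c]
        M[c] = [a / lead for a in M[c]]
        for i in range(n):
            if i != c and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    # Now M = [I_n | A^{-1}]; return the right half.
    return [row[n:] for row in M]


print(inverse([[1, 2], [3, 4]]))  # rows (-2, 1) and (3/2, -1/2), as Fraction objects
print(inverse([[1, 2], [2, 4]]))  # None
```

By Theorem 28 (second part), checking $A \cdot A^{-1} = I_n$ for the returned matrix is enough to confirm the result.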

M

Example. Let A Q be of the form

A .

We adjoin I to this matrix and perform the Gauss-Jordan elimination (actually only its

reduction-to-rref part) on it:

A

In

1

R R R R R

2 2 1 2 2

R R

1 1

2

1

R R

2 2

5

10

A

In .

1

R R R

1 2 1 2

If our calculations are correct, we have found that

10

A .

We may verify our result:

A A

I .


Thanks to Theorem 28 (second part), we do not have to check the other equality, because it must hold automatically. Thus, we have indeed found the matrix $A^{-1}$.

19 Invertibility and determinants

In the previous section we found a characterization of non-singular matrices and we indicated a first method of finding inverse matrices. In this section we will show how to achieve these goals in a different, „formula-based” way. The basic tool that we need for these purposes is called the determinant. Below, we define this concept recursively. This recursive definition also requires some convenient auxiliary notions:

Definition 28. (Determinant of a matrix, cofactors and minors) Let $R$ be a commutative ring with unity and let $A = [a_{ij}] \in M_n(R)$. The determinant of the matrix $A$, denoted by $\det A$ or $|A|$ or $|a_{ij}|$, is an element of $R$ defined as follows:

1. If $n = 1$, that is if $A = [a_{11}]$, then $\det A := a_{11}$.

2. If $n \geq 2$ then $\det A := a_{11}C_{11} + a_{21}C_{21} + \cdots + a_{n1}C_{n1}$, where $C_{ij}$ is called the cofactor of the entry $a_{ij}$ and is defined as $C_{ij} := (-1)^{i+j} M_{ij}$, and $M_{ij}$ is called the minor of the entry $a_{ij}$ and is defined as the determinant of the $(n-1) \times (n-1)$ matrix obtained by deleting the $i$th row and the $j$th column from the matrix $A$.
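Definition 28 translates almost word for word into a recursive function. This is a minimal sketch, assuming a square matrix stored as a list of rows whose entries are Python numbers (standing in for elements of a commutative ring with unity); the name `det` is our own choice, and indices in the code are 0-based.

```python
def det(A):
    """Determinant following Definition 28: expansion along the first column."""
    n = len(A)
    if n == 1:
        return A[0][0]                      # item 1: det [a_11] = a_11
    total = 0
    for i in range(n):                      # 0-based row i corresponds to row i+1
        # Minor M_{i+1,1}: delete row i and the first column.
        minor = [row[1:] for k, row in enumerate(A) if k != i]
        cofactor = (-1) ** i * det(minor)   # (-1)^{(i+1)+1} = (-1)^i
        total += A[i][0] * cofactor
    return total
```

Since an $n \times n$ determinant calls $n$ determinants of size $n-1$, this literal version performs on the order of $n!$ multiplications; it is fine for small examples, and the speed-ups discussed below avoid this cost.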

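Example.

i. It is easy to check that $\det I_n = 1$ and $\det O = 0$.

ii. Let $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \in M_2(R)$, where $R$ is a commutative ring with unity. Deleting the first column together with the first (respectively, the second) row, we get

$$M_{11} = \det [a_{22}] = a_{22} \quad \text{and} \quad M_{21} = \det [a_{12}] = a_{12}.$$

This gives $C_{11} = M_{11} = a_{22}$ and $C_{21} = -M_{21} = -a_{12}$. Summing up:

$$\det A = a_{11}C_{11} + a_{21}C_{21} = a_{11}a_{22} - a_{12}a_{21}. \qquad (13)$$

iii. Let $A = \ldots$ be a $3 \times 3$ matrix. We have

$$|A| = a_{11}C_{11} + a_{21}C_{21} + a_{31}C_{31} = \ldots,$$

where each of the three $2 \times 2$ minors is computed using formula (13).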

The example above shows how to compute determinants: we reduce, step by step, the size of the determinant(s) to compute, up to the point where formula (13) can be applied. While such a procedure always works (by the very definition of determinant), various speed-ups are often possible. We will see this shortly, but first let us note the following:

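Theorem 30. (Laplace expansion) Let $R$ be a commutative ring with unity and $A = [a_{kl}] \in M_n(R)$. Then for every $i, j \in \{1, \ldots, n\}$ it holds

$$\det A = a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj} \qquad (j\text{th column expansion})$$

and

$$\det A = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in} \qquad (i\text{th row expansion}),$$

where $C_{kl}$ is the cofactor of the entry $a_{kl}$.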
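Theorem 30 can be checked experimentally: expanding along any row or any column should give the same value. The following is a minimal sketch, assuming matrices as lists of rows and 0-based indices; the helper names `minor_matrix` and `det_laplace` are hypothetical. Nested minors are always expanded along their first column, as in Definition 28.

```python
def minor_matrix(A, i, j):
    """Matrix obtained from A by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det_laplace(A, row=None, col=0):
    """det A by Laplace expansion along the given row, or else along column `col`."""
    n = len(A)
    if n == 1:
        return A[0][0]
    if row is not None:
        # i-th row expansion: sum of a_{il} C_{il}, with C_{il} = (-1)^{i+l} M_{il}
        return sum(A[row][l] * (-1) ** (row + l) * det_laplace(minor_matrix(A, row, l))
                   for l in range(n))
    # j-th column expansion: sum of a_{kj} C_{kj}
    return sum(A[k][col] * (-1) ** (k + col) * det_laplace(minor_matrix(A, k, col))
               for k in range(n))
```

For $A = [[2, 0, 1], [1, 3, 2], [1, 1, 2]]$ every call `det_laplace(A, row=i)` and `det_laplace(A, col=j)` yields $6$, and so does the determinant of the transpose `[list(r) for r in zip(*A)]`, in line with Corollary 8 below.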

Example. Let us consider the matrix $A$ from example iii) on page 62. We may compute its determinant using e.g. the 3rd row expansion as follows:

$$|A| = a_{31}C_{31} + a_{32}C_{32} + a_{33}C_{33} = \ldots,$$

again evaluating the $2 \times 2$ minors by formula (13). As you can see, the result is the same as before.

An easy consequence of the possibility of using rows instead of columns in determinant computation is:

Corollary 8. Let $R$ be a commutative ring with unity and $A \in M_n(R)$. Then $\det A^T = \det A$. In words: the determinants of a matrix and of its transpose are the same.

A computation speed-up is possible in the following special situations:

Property 13. Let $K$ be a field and $A \in M_n(K)$. Then:

1. if $A$ contains a zero column (row) then $\det A = 0$,

2. if $A$ contains two identical columns (rows) then $\det A = 0$,

3. if a column (resp. row) of $A$ is a multiple of another column (resp. row) of $A$ then $\det A = 0$.

Item 1 above is an immediate consequence of Theorem 30 (expand along the zero column or row). Items 2 and 3, in turn, follow from a more general fact. Namely, determinants behave nicely with respect to the elementary row (and also column) operations on matrices:

Theorem 31. (Determinants and elementary operations) Let $K$ be a field and let $A \in M_n(K)$.

1. If $B$ is obtained from $A$ by interchanging two rows (columns) of $A$, then $\det B = -\det A$.

2. If $B$ is obtained from $A$ by multiplying a row (a column) of $A$ by some $\lambda \in K$ (here $\lambda$ can be zero), then $\det B = \lambda \det A$.

3. If $B$ is obtained from $A$ by adding a multiple of a row (resp. column) of $A$ to another row (resp. column) of $A$, then $\det B = \det A$.

The reason why we may also perform elementary column operations in the above theorem comes from an easy application of Corollary 8. Hint: an elementary row operation applied to the matrix $A^T$ leads to a matrix which, when transposed, gives the matrix that is the result of applying the corresponding elementary column operation to the matrix $A$.
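Theorem 31 is what makes determinants cheap to compute in practice: reduce $A$ to an upper triangular matrix, keeping track of row interchanges (item 1), while adding multiples of rows costs nothing (item 3). The sketch below additionally uses the standard fact that the determinant of an upper triangular matrix is the product of its diagonal entries (repeated first-column expansion, Definition 28). It is a minimal sketch over $\mathbb{Q}$ with exact `Fraction` arithmetic; the name `det_by_elimination` is hypothetical.

```python
from fractions import Fraction

def det_by_elimination(A):
    """det A over Q via row reduction to upper triangular form (Theorem 31)."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    sign = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            # The remaining lower-right block has a zero first column, so det A = 0.
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign                    # item 1: a row interchange flips the sign
        for r in range(col + 1, n):
            c = M[r][col] / M[col][col]
            # item 3: subtracting a multiple of another row leaves det unchanged
            M[r] = [x - c * y for x, y in zip(M[r], M[col])]
    result = Fraction(sign)
    for i in range(n):
        result *= M[i][i]                   # product of the diagonal entries
    return result
```

This needs roughly $n^3$ arithmetic operations instead of the roughly $n!$ of the literal recursive definition.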

Example. Using Theorem 31, with column operations on the columns $C_1$, $C_2$, $C_3$ of a $3 \times 3$ matrix $A$, we may calculate as follows:

$$|A| = \ldots.$$

Using item 2 of Theorem 31 (once for each of the $n$ rows of $A$) we immediately conclude:

Property 14. Let $K$ be a field, $A \in M_n(K)$ and $\lambda \in K$. Then $\det(\lambda A) = \lambda^n \det A$.

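Theorem 31 gives more information than can be seen at first sight. Namely, when applied to elementary matrices (cf. Definition 26) it says that $\det E = -\det I_n = -1$ if $E$ corresponds to interchanging two rows, $\det E = \lambda \det I_n = \lambda$ if $E$ corresponds to multiplying a row by $\lambda$, and $\det E = \det I_n = 1$ if $E$ corresponds to adding a multiple of one row to another. Combining these formulas with Theorem 27, we may restate (the „row part” of) Theorem 31 in the following simple form: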

Corollary 9. Let $K$ be a field and let $A \in M_n(K)$. For every elementary matrix $E \in M_n(K)$ it holds $\det(EA) = \det E \cdot \det A$.
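Corollary 9 can be tested on the three kinds of elementary matrices. The following is a minimal sketch over $\mathbb{Q}$; the constructors `swap_matrix`, `scale_matrix` and `add_matrix` are hypothetical names for the elementary matrices of $R_i \leftrightarrow R_j$, $R_i \leftarrow \lambda R_i$ and $R_i \leftarrow R_i + \lambda R_j$ (0-based rows), and need not match the notation of Definition 26.

```python
from fractions import Fraction

def det(A):
    """Determinant by first-column cofactor expansion (Definition 28)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det([r[1:] for k, r in enumerate(A) if k != i])
               for i in range(len(A)))

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def swap_matrix(n, i, j):
    """Elementary matrix of R_i <-> R_j."""
    E = identity(n)
    E[i], E[j] = E[j], E[i]
    return E

def scale_matrix(n, i, lam):
    """Elementary matrix of R_i <- lam * R_i."""
    E = identity(n)
    E[i][i] = Fraction(lam)
    return E

def add_matrix(n, i, j, lam):
    """Elementary matrix of R_i <- R_i + lam * R_j, where i != j."""
    E = identity(n)
    E[i][j] = Fraction(lam)
    return E

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]
```

With any $A \in M_3(\mathbb{Q})$, each of the three kinds of $E$ should satisfy `det(matmul(E, A)) == det(E) * det(A)`, where `det(E)` equals $-1$, $\lambda$ and $1$, respectively.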

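A further consequence of Corollary 9 is that if $E_r \cdots E_1 A = \mathrm{RREF}(A)$, where $E_1, \ldots, E_r$ are elementary matrices, then $\det \mathrm{RREF}(A) = \lambda \det A$ for some $\lambda \in K \setminus \{0\}$ (namely $\lambda = \det E_r \cdots \det E_1$). Hence $\det A \neq 0$ iff $\det \mathrm{RREF}(A) \neq 0$. By the analysis on page 59 we know that either $\mathrm{RREF}(A) = I_n$ or $\mathrm{RREF}(A)$ has a zero row. In the first case we get $\det \mathrm{RREF}(A) = 1$ and in the second case $\det \mathrm{RREF}(A) = 0$. But by Theorem 28 we also know that $\mathrm{RREF}(A) = I_n$ iff $A$ is invertible. Hence: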

Theorem 32. (Second characterization of invertible matrices) Let $K$ be a field and $A \in M_n(K)$. Then $A$ is non-singular iff $\det A \neq 0$.

Remark. The last theorem is the reason why invertible matrices are also called non-singular.


Now let us use the information provided by Corollary 7. Let $A, B \in M_n(K)$. Assume that $A$ or $B$ is singular. This means that $A$ or $B$ is a zero-divisor. Hence also $AB$ is a zero-divisor, so $AB$ is singular. By Theorem 32 we infer that if $\det A = 0$ or $\det B = 0$ then also $\det(AB) = 0$. In other words, in this case $\det(AB) = \det A \cdot \det B$. On the other hand, if both $A$ and $B$ are non-singular then $A = E_r \cdots E_1$ and $B = F_s \cdots F_1$ for some elementary matrices $E_i$, $F_j$. Successively applying Corollary 9 to these expressions for $A$ and $B$ we get

$$\begin{aligned}
\det(AB) &= \det(E_r \cdots E_1 F_s \cdots F_1) = \det E_r \cdots \det E_1 \det F_s \cdots \det F_1 \\
&= \det(E_r \cdots E_1) \det(F_s \cdots F_1) = \det A \cdot \det B.
\end{aligned}$$
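The multiplicativity just derived can be sanity-checked on random matrices. This is only a numerical experiment, not a proof; it is a minimal sketch assuming integer entries (integers lie in the field $\mathbb{Q}$, so the formula applies), with the hypothetical helper `check_cauchy` and a fixed seed for reproducibility.

```python
import random

def det(A):
    """Determinant by first-column cofactor expansion (Definition 28)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det([r[1:] for k, r in enumerate(A) if k != i])
               for i in range(len(A)))

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def check_cauchy(trials=200, seed=0):
    """Test det(AB) == det(A) * det(B) on random integer matrices of size 1..4."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 4)
        A = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
        B = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
        if det(matmul(A, B)) != det(A) * det(B):
            return False
    return True
```

Because all arithmetic is done on Python integers, the comparison is exact and no rounding issues arise.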

Thus we have proved (cf. also Property 2 item 5):

Theorem 33. (Cauchy formula) Let $K$ be a field and $A, B \in M_n(K)$. Then

$$\det(AB) = \det A \cdot \det B.$$

In particular,

1. $\det\colon GL_n(K) \to K^* = K \setminus \{0\}$ is a homomorphism of groups,

2. if $A \in GL_n(K)$ then $\det A^{-1} = \dfrac{1}{\det A}$.

Thanks to Theorem 32 we already know that the concept of determinant is connected to invertibility of matrices. It turns out that this connection is even deeper. To explain this, we need:

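Definition 29. (Adjoint of a matrix) Let $R$ be a commutative ring with unity and $A \in M_n(R)$. The adjoint (or adjugate) $\operatorname{adj} A$ of the matrix $A$ is given by the formula

$$\operatorname{adj} A := \begin{bmatrix} C_{11} & \cdots & C_{1n} \\ \vdots & \ddots & \vdots \\ C_{n1} & \cdots & C_{nn} \end{bmatrix}^T,$$

where $C_{ij}$ is the cofactor of the entry $a_{ij}$ of the matrix $A$. Now we can state: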

Theorem 34. (Second method of finding the inverse matrix) Let $R$ be a commutative ring with unity and $A \in M_n(R)$. Then

$$A \cdot \operatorname{adj} A = \operatorname{adj} A \cdot A = (\det A)\, I_n.$$

In particular, if $\det A$ is invertible in the ring $R$ then $A$ is invertible in $M_n(R)$ and

$$A^{-1} = \frac{1}{\det A} \operatorname{adj} A. \qquad (14)$$

Note that for a matrix $A \in M_n(K)$, where $K$ is a field, the condition of $\det A$ being invertible in $K$ is equivalent to $\det A \neq 0$, so formula (14) holds for all $A \in GL_n(K)$.
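Definition 29 and formula (14) can be sketched as follows. This is a minimal sketch over $\mathbb{Q}$; the names `adjugate` and `inverse_by_adjugate` are hypothetical. Note the transpose: entry $(i, j)$ of $\operatorname{adj} A$ is the cofactor $C_{ji}$. For $n = 1$ we use the usual convention $\operatorname{adj}[a] = [1]$, which makes $A \cdot \operatorname{adj} A = (\det A) I_1$ hold.

```python
from fractions import Fraction

def det(A):
    """Determinant by first-column cofactor expansion (Definition 28)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det([r[1:] for k, r in enumerate(A) if k != i])
               for i in range(len(A)))

def adjugate(A):
    """adj A: the transpose of the matrix of cofactors [C_ij] (0-based indices)."""
    n = len(A)
    if n == 1:
        return [[1]]
    def cofactor(i, j):
        minor = [r[:j] + r[j + 1:] for k, r in enumerate(A) if k != i]
        return (-1) ** (i + j) * det(minor)
    # Entry (i, j) of adj A is the cofactor C_ji.
    return [[cofactor(j, i) for j in range(n)] for i in range(n)]

def inverse_by_adjugate(A):
    """Formula (14) over Q; returns None when det A = 0 (A singular, Theorem 32)."""
    d = det(A)
    if d == 0:
        return None
    return [[Fraction(x) / d for x in row] for row in adjugate(A)]
```

For $A = [[2, 0, 1], [1, 3, 2], [1, 1, 2]]$ one gets $\det A = 6$ and $\operatorname{adj} A = [[4, 1, -3], [0, 3, -3], [-2, -2, 6]]$, and multiplying $A$ by `inverse_by_adjugate(A)` gives $I_3$.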

Example. Consider a $3 \times 3$ matrix

$$A = \ldots.$$

We compute its nine cofactors $C_{11}, C_{12}, \ldots, C_{33}$. Hence,

$$\operatorname{adj} A = [C_{ij}]^T = \ldots.$$

From the example on page 64 we know that $\det A = \ldots$. Using Theorem 34, we thus get

$$A^{-1} = \frac{1}{\det A} \operatorname{adj} A = \ldots.$$

Verification:

$$A A^{-1} = \ldots = I_3.$$

20 Linear systems revisited

The theory of matrices that we have developed so far may in turn be applied to linear systems. First, let us take a look at the following matrix equation:

$$AX = B, \qquad (15)$$

where $A \in M_n(R)$, $X \in R^{n \times m}$, $B \in R^{n \times m}$ and $R$ is a commutative ring with unity. If the matrix $A$ is invertible in the ring $M_n(R)$, then we can multiply equation (15) by $A^{-1}$ from the left to get

$$X = A^{-1} B.$$

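In particular, let us take $m = 1$. This means that $A = [a_{ij}]_{1 \le i, j \le n}$, $X = [x_1, \ldots, x_n]^T$ and $B = [b_1, \ldots, b_n]^T$, and so equation (15) looks as follows:

$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}. \qquad (16)$$

Computing the product gives the equation

$$\begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ \vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix},$$

which holds iff

$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1 \\ \quad \vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n = b_n \end{cases}. \qquad (17)$$

This means that matrix equation (16) is equivalent to linear system (17) in the sense that $X = [x_1, \ldots, x_n]^T$ is a solution of (16) iff $(x_1, \ldots, x_n)$ is a solution of (17). Hence, the above analysis gives: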

Theorem 35. (A linear system as matrix equation) Let $R$ be a commutative ring with unity. Consider system (17). Let $A$ be the coefficient matrix of this system. Then (17) can be written as

$$A X = b,$$

where $X = [x_1, \ldots, x_n]^T$ and $b = [b_1, \ldots, b_n]^T$ is the column of constants. If $A$ is invertible in $M_n(R)$ then the linear system (17) has a unique solution, which is equal to

$$X = A^{-1} b.$$
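Theorem 35 is stated over an arbitrary commutative ring with unity, and the ring matters. Over $R = \mathbb{Z}$, by Theorem 34, if $\det A \in \{1, -1\}$ (the invertible elements of $\mathbb{Z}$) then $A^{-1} = (\det A)^{-1} \operatorname{adj} A$ has integer entries, so $X = A^{-1} b$ is an integer solution. The following is a minimal sketch of this case; the function name `solve_over_Z` is hypothetical, and it deliberately refuses matrices whose determinant is not $\pm 1$ (such as $\operatorname{diag}(2, 1)$, which is invertible over $\mathbb{Q}$ but for which this method does not apply over $\mathbb{Z}$).

```python
def det(A):
    """Determinant by first-column cofactor expansion (Definition 28)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det([r[1:] for k, r in enumerate(A) if k != i])
               for i in range(len(A)))

def adjugate(A):
    """adj A: entry (i, j) is the cofactor C_ji (Definition 29)."""
    n = len(A)
    if n == 1:
        return [[1]]
    def cofactor(i, j):
        return (-1) ** (i + j) * det([r[:j] + r[j + 1:] for k, r in enumerate(A) if k != i])
    return [[cofactor(j, i) for j in range(n)] for i in range(n)]

def solve_over_Z(A, b):
    """Solve A X = b over Z when det A is a unit of Z, via X = A^{-1} b."""
    d = det(A)
    if d not in (1, -1):
        raise ValueError("det A is not a unit of Z")
    inv = [[d * x for x in row] for row in adjugate(A)]   # 1/d == d when d = +-1
    return [sum(a * bk for a, bk in zip(row, b)) for row in inv]
```

For example, the system $2x_1 + x_2 = 3$, $x_1 + x_2 = 2$ has $\det A = 1$, and `solve_over_Z([[2, 1], [1, 1]], [3, 2])` returns `[1, 1]`.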

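It turns out that there is also another way of solving equation (16) or, what amounts to the same thing, of solving system (17). Namely, let us multiply the equation

$$A \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} \qquad (18)$$

by $\operatorname{adj} A$ from the left. By Theorem 34 we thus get

$$\det A \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \operatorname{adj} A \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}.$$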

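Let us take a look at the right-hand side of this equation. This is the matrix whose $i$th row is equal to $C_{1i}b_1 + C_{2i}b_2 + \cdots + C_{ni}b_n$. But this is, according to Theorem 30, equal to the $i$th column expansion of

$$\det \underbrace{\begin{bmatrix} a_{11} & \cdots & a_{1,i-1} & b_1 & a_{1,i+1} & \cdots & a_{1n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & \cdots & a_{n,i-1} & b_n & a_{n,i+1} & \cdots & a_{nn} \end{bmatrix}}_{=:A_i}$$

(replacing the $i$th column of $A$ by the constants does not change the cofactors $C_{1i}, \ldots, C_{ni}$, since these do not involve the $i$th column). In other words, for every $i \in \{1, \ldots, n\}$ we get

$$x_i \det A = \det A_i.$$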


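Now, assume we are working over a field $K$. Then we have two cases. If $\det A = 0$ and the system has a solution, then $\det A_i = x_i \det A = 0$ for every $i \in \{1, \ldots, n\}$; thus the vanishing of all $\det A_i$ is a necessary condition for the solvability of equation (18) in this case. If $\det A \neq 0$ then $x_i = \dfrac{\det A_i}{\det A}$ for every $i \in \{1, \ldots, n\}$. Summing up, we have: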

Theorem 36. (Cramer’s rule) Let $K$ be a field. Consider system (17). Let $A$ be the coefficient matrix of this system. Denote by $A_i$ the matrix constructed from $A$ by replacing its $i$th column with the column of constants $b$ of system (17).

1. If $\det A \neq 0$ then (17) has the unique solution $(x_1, \ldots, x_n) \in K^n$, where
$$x_i = \frac{\det A_i}{\det A} \quad \text{for } i \in \{1, \ldots, n\}.$$

2. If $\det A = 0$ and $\det A_j \neq 0$ for some $j \in \{1, \ldots, n\}$ then (17) is inconsistent.

3. If $\det A = 0$ and $\det A_i = 0$ for every $i \in \{1, \ldots, n\}$ then (17) is either inconsistent or has more than one solution (infinitely many if $K$ is infinite).

(A careful reader might notice that in the discussion preceding Theorem 36 we have not actually justified why in item 3 above there cannot be a unique solution to the system; try to think it over!)
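The three cases of Theorem 36 can be sketched as a small decision procedure. This is a minimal sketch over $\mathbb{Q}$; the function name `cramer` and its string labels are hypothetical. In the third case it only reports that Cramer's rule cannot decide, since, as the examples below show, both outcomes are possible.

```python
from fractions import Fraction

def det(A):
    """Determinant by first-column cofactor expansion (Definition 28)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det([r[1:] for k, r in enumerate(A) if k != i])
               for i in range(len(A)))

def cramer(A, b):
    """Apply Theorem 36 to A x = b over Q.

    Returns ("unique", [x_1, ..., x_n]) for item 1, ("inconsistent", None) for
    item 2, and ("undecided", None) for item 3.
    """
    n = len(A)
    d = det(A)
    # A_i: replace the i-th column of A by the column of constants b.
    dets = [det([row[:i] + [b[k]] + row[i + 1:] for k, row in enumerate(A)])
            for i in range(n)]
    if d != 0:
        return "unique", [Fraction(di) / d for di in dets]
    if any(di != 0 for di in dets):
        return "inconsistent", None
    return "undecided", None
```

For instance, $2x_1 + x_2 = 3$, $x_1 + x_2 = 2$ gives `("unique", [1, 1])`; $x_1 + x_2 = 1$, $x_1 + x_2 = 2$ gives `"inconsistent"` because $\det A_1 = -1 \neq 0$; and $x_1 + x_2 = 1$, $2x_1 + 2x_2 = 2$ falls under item 3.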

Example.

1. Consider a system of three linear equations in the unknowns $x$, $y$, $z$:
$$\ldots$$
Written as a matrix equation, this system takes the form
$$A \begin{bmatrix} x \\ y \\ z \end{bmatrix} = b.$$
We have $\det A = \ldots \neq 0$, so we know at this point that our system has a unique solution. To find this solution, we compute $\det A_1 = \ldots$, $\det A_2 = \ldots$ and $\det A_3 = \ldots$. According to Theorem 36, this gives
$$x = \frac{\det A_1}{\det A}, \qquad y = \frac{\det A_2}{\det A}, \qquad z = \frac{\det A_3}{\det A}.$$
It is easy to check that this is indeed the solution of our system.

2. Consider a system of three linear equations in $x$, $y$, $z$ whose coefficient matrix is
$$A = \ldots.$$
Since the first and the third rows of this matrix are identical, by Property 13 item 2 we infer that $\det A = 0$. On the other hand, one checks easily that $\det A_j \neq 0$ for some $j \in \{1, 2, 3\}$, which by Theorem 36 item 2 means that this system is inconsistent.

3. Consider a system of three linear equations in $x$, $y$, $z$ whose coefficient matrix is
$$A = \ldots.$$
Here, again by Property 13 item 2, we infer that $\det A = 0$. Similarly, $\det A_i = 0$ for $i \in \{1, 2, 3\}$, because in every $A_i$ one of its columns is a multiple of another one (cf. Property 13 item 3). Clearly, in this case the system is inconsistent, because by subtracting e.g. its first two equations we get a false equality of the form $0 = c$ with $c \neq 0$.

4. Consider another system of three linear equations in $x$, $y$, $z$. As above, its coefficient matrix is
$$A = \ldots.$$
We have $\det A = 0$ and $\det A_i = 0$ for $i \in \{1, 2, 3\}$ (same reasons as in the previous example). But in this case we have infinitely many solutions, because the system reduces to only one equation in $x$, $y$, $z$, which describes a plane in $\mathbb{R}^3$.

Examples 3 and 4 show that in general one cannot strengthen item 3 of Theorem 36 (both cases may happen). There are, however, some special cases in which more can be said. The first one is:

Property 15. Using the same notation as in Theorem 36, assume that system (17) has only two variables $x_1$, $x_2$ (and two equations as well), and that $A$ is not the zero matrix. If $\det A = 0$ and $\det A_1 = \det A_2 = 0$ then (17) has more than one solution (infinitely many if $K$ is infinite).

For the second one, we need the following definition:

Definition 30. (Homogeneous system) Let $R$ be a commutative ring with unity. A linear system over $R$ is called homogeneous if its column of constants is zero, that is if the system has the form

$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = 0 \\ \quad \vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n = 0 \end{cases}.$$

Obviously, a homogeneous system always has a solution (the trivial one, that is $x_1 = \cdots = x_n = 0$). Hence:

Corollary 10. Using the same notation as in Theorem 36, assume that system (17) is homogeneous. If $\det A = 0$ and $\det A_i = 0$ for every $i \in \{1, \ldots, n\}$, then (17) has more than one solution (infinitely many if $K$ is infinite).

More detailed information about the number of solutions of a linear system is provided by the so-called Rouché-Capelli theorem. In order to state this result, we need the notion of the rank of a matrix:

Definition 31. (Rank) Let $K$ be a field and $A \in K^{m \times n}$. The rank of $A$, denoted by $\operatorname{rank} A$, is defined as the number of non-zero rows in any row echelon form of $A$.

One can show that the above definition is indeed correct (that is, that the number of non-zero rows is the same for every ref form of $A$). But even more can be said:

Property 16. Let $K$ be a field and $A \in K^{m \times n}$. Then $\operatorname{rank} A \le \min(m, n)$. Moreover, $\operatorname{rank} A^T = \operatorname{rank} A$.
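Definition 31 suggests computing the rank by bringing $A$ to a row echelon form and counting the non-zero rows. The following is a minimal sketch over $\mathbb{Q}$ with exact `Fraction` arithmetic; the function name `rank` is our own choice. Columns without a pivot are simply skipped, which is why the number of pivots never exceeds $\min(m, n)$.

```python
from fractions import Fraction

def rank(A):
    """Number of non-zero rows in a row echelon form of A (Definition 31), over Q."""
    M = [[Fraction(x) for x in row] for row in A]
    m = len(M)
    n = len(M[0]) if M else 0
    r = 0                                   # index of the next pivot row
    for col in range(n):
        pivot = next((i for i in range(r, m) if M[i][col] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, m):           # eliminate below the pivot
            c = M[i][col] / M[r][col]
            M[i] = [x - c * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == m:
            break
    return r                                # rows r, ..., m-1 are now zero
```

For $A = [[1, 2], [3, 4], [5, 6]]$ both `rank(A)` and `rank([list(c) for c in zip(*A)])` give $2$, as Property 16 predicts.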

We have:

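Theorem 37. (Rouché-Capelli) Let $K$ be a field. Consider the system

$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1 \\ \quad \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m \end{cases}, \qquad (\ast)$$

where $a_{ij}, b_i \in K$. Then system $(\ast)$ has a solution in $K^n$ iff the ranks of the coefficient matrix $A$ and the augmented matrix $[A \mid b]$ are the same; in symbols: iff $\operatorname{rank} A = \operatorname{rank}[A \mid b]$. Moreover, if $(\ast)$ does have solutions, then the dimension of the set of solutions (an affine subspace of $K^n$) is equal to $n - \operatorname{rank} A$. In particular, $(\ast)$ has a unique solution iff $n = \operatorname{rank} A = \operatorname{rank}[A \mid b]$. TBC...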
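Theorem 37 turns into a simple test once ranks can be computed. The following is a minimal sketch over $\mathbb{Q}$; the names `rank` and `classify` are hypothetical, and `classify` returns the dimension $n - \operatorname{rank} A$ of the solution set when the system is consistent (dimension $0$ meaning a unique solution).

```python
from fractions import Fraction

def rank(A):
    """Number of non-zero rows in a row echelon form of A (Definition 31), over Q."""
    M = [[Fraction(x) for x in row] for row in A]
    m, n, r = len(M), len(M[0]), 0
    for col in range(n):
        pivot = next((i for i in range(r, m) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, m):
            c = M[i][col] / M[r][col]
            M[i] = [x - c * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == m:
            break
    return r

def classify(A, b):
    """Rouche-Capelli: compare rank A with rank [A | b]."""
    n = len(A[0])                                      # number of unknowns
    rA = rank(A)
    rAb = rank([row + [bi] for row, bi in zip(A, b)])  # augmented matrix [A | b]
    if rA != rAb:
        return "inconsistent", None
    return "consistent", n - rA                        # dimension of the solution set
```

For example, $x_1 + x_2 = 1$, $2x_1 + 2x_2 = 2$ gives `("consistent", 1)` (a line of solutions), whereas $x_1 + x_2 = 1$, $x_1 + x_2 = 2$ gives `("inconsistent", None)`.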
