

1. Course Instructional Materials / Enactment

1.1 Teaching / Instructional Methods and Aids (Content Delivery)

The following instructional methods have been employed during the course:

a) Lecture method: The primary mode of instruction was lectures that introduced the students to new concepts, models and their applications. The students were challenged to think about how and why the concepts, models and proofs were developed, and were guided to emulate the thought process of the researcher that led to the particular concept, model or proof technique. The lecture material was designed to be stimulating and thought provoking, and the instructor combined lectures with questions to involve students in the learning process and to check their comprehension.

Instructional aids used (Content Delivery)

Chalk or marker board.

b) Individualized learning: Written assignments help in organizing knowledge, absorbing facts and preparing for examinations. This method emphasizes individual learner work and supports both the teaching and learning processes.

c) Group-learning techniques: • Case study • Quiz • Assignments

Instructional aids used: chalk or marker board.


1.2 Lecture Notes

UNIT-I Mathematical Logic

Statements and notations: A proposition (or statement) is a declarative sentence that is either true or false, but not both. For instance, the following are propositions: “Paris is in France” (true), “London is in Denmark” (false), “2 < 4” (true), “4 = 7” (false).

However, the following are not propositions: “what is your name?” (this is a question), “do your homework” (this is a command), “this sentence is false” (neither true nor false), “x is an even number” (it depends on what x represents), “Socrates” (it is not even a sentence). The truth or falsehood of a proposition is called its truth value.

Connectives: Connectives are used for making compound propositions. The main ones are the following (p and q represent given propositions):

Name          Represented   Meaning
Negation      ¬p            “not p”
Conjunction   p ∧ q         “p and q”
Disjunction   p ∨ q         “p or q (or both)”
Exclusive or  p ⊕ q         “either p or q, but not both”
Implication   p → q         “if p then q”
Biconditional p ↔ q         “p if and only if q”

Truth Tables:

Logical identity

Logical identity is an operation on one logical value, typically the value of a proposition that produces a value of true if its operand is true and a value of false if its operand is false.

The truth table for the logical identity operator is as follows:

Logical Identity

p p

T T

F F

Logical negation

Logical negation is an operation on one logical value, typically the value of a proposition that produces a value of true if its operand is false and a value of false if its operand is true.

The truth table for NOT p (also written as ¬p or ~p) is as follows:

Logical Negation

p ¬p

T F

F T

Binary operations

Truth table for all binary logical operators

Here is a truth table giving definitions of all 16 of the possible truth functions of 2 binary variables (P,Q are thus boolean variables):

P Q 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

T T F F F F F F F F T T T T T T T T

T F F F F F T T T T F F F F T T T T

F T F F T T F F T T F F T T F F T T

F F F T F T F T F T F T F T F T F T

where T = true and F = false.

Key:

0, false, Contradiction

1, NOR, Logical NOR

2, Converse nonimplication

3, ¬p, Negation

4, Material nonimplication

5, ¬q, Negation

6, XOR, Exclusive disjunction

7, NAND, Logical NAND

8, AND, Logical conjunction

9, XNOR, If and only if, Logical biconditional

10, q, Projection function

11, if/then, Logical implication

12, p, Projection function

13, then/if, Converse implication

14, OR, Logical disjunction

15, true, Tautology
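The key above can be regenerated mechanically. The following Python sketch (an illustration, not part of the original notes) prints the output column of each of the 16 truth functions, assuming the row order TT, TF, FT, FF used in the table:

# Function number k has as outputs the bits of k: bit 3 gives the TT row,
# bit 2 the TF row, bit 1 the FT row, and bit 0 the FF row.
for k in range(16):
    outputs = [(k >> (3 - i)) & 1 for i in range(4)]
    column = " ".join("T" if b else "F" for b in outputs)
    print(f"{k:2d}: {column}")

For instance, function 8 prints T F F F, the AND column, and function 14 prints T T T F, the OR column.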

Logical operators can also be visualized using Venn diagrams.

Logical conjunction

Logical conjunction is an operation on two logical values, typically the values of two propositions, that produces a value of true if both of its operands are true.

The truth table for p AND q (also written as p ∧ q, p & q, or p q) is as follows:

Logical Conjunction

p q p ∧ q

T T T

T F F

F T F

F F F

In ordinary language terms, if both p and q are true, then the conjunction p ∧ q is true. For all other assignments of logical values to p and to q the conjunction p ∧ q is false.

It can also be said that if p is true, then p ∧ q has the same truth value as q; otherwise (when p is false) p ∧ q is false, the same value as p.

Logical disjunction

Logical disjunction is an operation on two logical values, typically the values of two propositions, that produces a value of true if at least one of its operands is true.

The truth table for p OR q (also written as p ∨ q, p || q, or p + q) is as follows:


Logical Disjunction

p q p ∨ q

T T T

T F T

F T T

F F F

Logical implication

Logical implication and the material conditional are both associated with an operation on two logical values, typically the values of two propositions, that produces a value of false just in the single case where the first operand is true and the second operand is false. The truth table associated with the material conditional if p then q (symbolized as p → q) and the logical implication p implies q (symbolized as p ⇒ q) is as follows:

Logical Implication

p q p → q

T T T

T F F

F T T

F F T

Logical equality

Logical equality (also known as the biconditional) is an operation on two logical values, typically the values of two propositions, that produces a value of true if both operands are false or both operands are true. The truth table for p XNOR q (also written as p ↔ q, p = q, or p ≡ q) is as follows:


Logical Equality

p q p ≡ q

T T T

T F F

F T F

F F T

Exclusive disjunction

Exclusive disjunction is an operation on two logical values, typically the values of two propositions, that produces a value of true if one but not both of its operands is true. The truth table for p XOR q (also written as p ⊕ q or p ≠ q) is as follows:

Exclusive Disjunction

p q p ⊕ q

T T F

T F T

F T T

F F F

Logical NAND

The logical NAND is an operation on two logical values, typically the values of two propositions, that produces a value of false if both of its operands are true. In other words, it produces a value of true if at least one of its operands is false. The truth table for p NAND q (also written as p ↑ q or p | q) is as follows:


Logical NAND

p q p ↑ q

T T F

T F T

F T T

F F T

It is frequently useful to express a logical operation as a compound operation, that is, as an operation that is built up or composed from other operations. Many such compositions are possible, depending on the operations that are taken as basic or "primitive" and the operations that are taken as composite or "derivative". In the case of logical NAND, it is clearly expressible as a compound of NOT and AND. The negation of a conjunction, ¬(p ∧ q), and the disjunction of negations, (¬p) ∨ (¬q), can be tabulated as follows:

p q p ∧ q ¬(p ∧ q) ¬p ¬q (¬p) ∨ (¬q)

T T T F F F F

T F F T F T T

F T F T T F T

F F F T T T T

Logical NOR

The logical NOR is an operation on two logical values, typically the values of two propositions, that produces a value of true if both of its operands are false. In other words, it produces a value of false if at least one of its operands is true. The NOR operator ↓ is also known as the Peirce arrow after its inventor, Charles Sanders Peirce, and is a sole sufficient operator: every other logical operator can be expressed using NOR alone.

The truth table for p NOR q (also written as p ↓ q or p ⊥ q) is as follows:


Logical NOR

p q p ↓ q

T T F

T F F

F T F

F F T

The negation of a disjunction ¬(p ∨ q), and the conjunction of negations (¬p) ∧ (¬q) can be tabulated as follows:

p q p ∨ q ¬(p ∨ q) ¬p ¬q (¬p) ∧ (¬q)

T T T F F F F

T F T F F T F

F T T F T F F

F F F T T T T

Inspection of the tabular derivations for NAND and NOR, under each assignment of logical values to the functional arguments p and q, produces the identical patterns of functional values for ¬(p ∧ q) as for (¬p) ∨ (¬q), and for ¬(p ∨ q) as for (¬p) ∧ (¬q). Thus the first and second expressions in each pair are logically equivalent, and may be substituted for each other in all contexts that pertain solely to their logical values.

This equivalence is one of De Morgan's laws.
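Because the laws are claims about all truth assignments, they can be checked exhaustively. A minimal Python sketch (an illustration, not from the original notes):

from itertools import product

for p, q in product([True, False], repeat=2):
    assert (not (p and q)) == ((not p) or (not q))  # ¬(p ∧ q) ⇔ ¬p ∨ ¬q
    assert (not (p or q)) == ((not p) and (not q))  # ¬(p ∨ q) ⇔ ¬p ∧ ¬q
print("Both De Morgan equivalences hold for all four assignments.")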

The truth value of a compound proposition depends only on the value of its components. Writing F for “false” and T for “true”, we can summarize the meaning of the connectives in the following way:

p q ¬p p ∧ q p ∨ q p ⊕ q p → q p ↔ q

T T F T T F T T

T F F F T T F F

F T T F T T T F

F F T F F F T T

Note that ∨ represents a non-exclusive or, i.e., p ∨ q is true when any of p, q is true and also when both are true. On the other hand ⊕ represents an exclusive or, i.e., p ⊕ q is true only when exactly one of p and q is true.
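The summary table can be reproduced with a few lines of Python (a sketch; != models ⊕, == models ↔, and (not p) or q models →):

from itertools import product

print("p q | ¬p p∧q p∨q p⊕q p→q p↔q")
for p, q in product([True, False], repeat=2):
    t = lambda b: "T" if b else "F"
    print(t(p), t(q), "|", t(not p), t(p and q), t(p or q),
          t(p != q), t((not p) or q), t(p == q))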

Well formed formulas (wff): Not every string of symbols represents a proposition of predicate logic. Those strings which produce a proposition when their symbols are interpreted must follow the rules given below; they are called wffs (well-formed formulas) of first order predicate logic. Rules for constructing wffs: A predicate name followed by a list of variables, such as P(x, y), where P is a predicate name and x and y are variables, is called an atomic formula.

A well formed formula of predicate calculus is obtained by using the following rules.
1. An atomic formula is a wff.
2. If A is a wff, then ¬A is also a wff.
3. If A and B are wffs, then (A ∨ B), (A ∧ B), (A → B) and (A ↔ B) are wffs.
4. If A is a wff and x is any variable, then (∀x)A and (∃x)A are wffs.
5. Only those formulas obtained by using (1) to (4) are wffs.
Since we will be concerned only with wffs, we shall use the term formulas for wffs. We shall follow the same conventions regarding the use of parentheses as was done in the case of statement formulas.

Wffs are constructed using the following rules:

1. True and False are wffs.
2. Each propositional constant (i.e. a specific proposition) and each propositional variable (i.e. a variable representing propositions) is a wff.
3. Each atomic formula (i.e. a specific predicate with variables) is a wff.
4. If A and B are wffs, then so are ¬A, (A ∧ B), (A ∨ B), (A → B), and (A ↔ B).
5. If x is a variable (representing objects of the universe of discourse), and A is a wff, then so are ∀x A and ∃x A.

For example, "The capital of Virginia is Richmond." is a specific proposition. Hence it is a wff by Rule 2. Let B be a predicate name representing "being blue" and let x be a variable. Then B(x) is an atomic formula meaning "x is blue". Thus it is a wff by Rule 3. above. By applying Rule 5. to B(x), xB(x) is a wff and so is xB(x). Then by applying Rule 4. to them x B(x) x B(x) is seen to be a wff. Similarly if R is a predicate name representing "being round". Then R(x) is an atomic formula. Hence it is a wff. By applying Rule 4 to B(x) and R(x), a wff B(x) R(x) is obtained. In this manner, larger and more complex wffs can be constructed following the rules given above. Note, however, that strings that can not be constructed by using those rules are not wffs.

For example, ∀x B(x)R(x) and B(∀x) are NOT wffs, nor are B(R(x)) and B(∃x R(x)).

More examples: To express the fact that Tom is taller than John, we can use the atomic formula taller(Tom, John), which is a wff. This wff can also be part of some compound statement, such as taller(Tom, John) ∨ taller(John, Tom), which is also a wff. If x is a variable representing people in the world, then taller(x, Tom), ∀x taller(x, Tom), ∃x taller(x, Tom), and ∀x ∃y taller(x, y) are all wffs, among others. However, taller(∀x, John) and taller(Tom Mary, Jim), for example, are NOT wffs.

Tautology, Contradiction, Contingency: A proposition is said to be a tautology if its truth value is T for any assignment of truth values to its components. Example: The proposition p ∨ ¬p is a tautology. A proposition is said to be a contradiction if its truth value is F for any assignment of truth values to its components. Example: The proposition p ∧ ¬p is a contradiction. A proposition that is neither a tautology nor a contradiction is called a contingency.

p ¬p p ∨ ¬p p ∧ ¬p

T F T F

F T T F

Equivalence and implication: We say that the statements r and s are logically equivalent if their truth tables are identical. For example, a truth table shows that p ↔ q is equivalent to (p → q) ∧ (q → p). It is easily shown that the statements r and s are equivalent if and only if r ↔ s is a tautology.
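Because the definitions quantify over every truth assignment, a brute-force classifier is straightforward. A hypothetical Python sketch (classify and nvars are names introduced here for illustration):

from itertools import product

def classify(f, nvars):
    values = [f(*v) for v in product([True, False], repeat=nvars)]
    if all(values):
        return "tautology"
    if not any(values):
        return "contradiction"
    return "contingency"

print(classify(lambda p: p or not p, 1))       # tautology
print(classify(lambda p: p and not p, 1))      # contradiction
print(classify(lambda p, q: (not p) or q, 2))  # contingency (p → q)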

Predicates; predicate logic: A predicate or propositional function is a statement containing variables. For instance “x + 2 = 7”, “X is American”, “x < y”, “p is a prime number” are predicates. The truth value of the predicate depends on the value assigned to its variables. For instance if we replace x with 1 in the predicate “x + 2 = 7” we obtain “1 + 2 = 7”, which is false, but if we replace it with 5 we get “5 + 2 = 7”, which is true. We represent a predicate by a letter followed by the variables enclosed between parentheses: P(x), Q(x, y), etc. An example for P(x) is a value of x for which P(x) is true. A counterexample is a value of x for which P(x) is false. So, 5 is an example for “x + 2 = 7”, while 1 is a counterexample. Each variable in a predicate is assumed to belong to a universe (or domain) of discourse; for instance in the predicate “n is an odd integer”, n represents an integer, so the universe of discourse of n is the set of all integers. In “X is American” we may assume that X is a human being, so in this case the universe of discourse is the set of all human beings.

3.1 Algorithms 213 Free & Bound variables:

Let's now turn to a rather important topic: the distinction between free variables and bound variables.

Have a look at a formula of the following shape (with predicates P, Q, R):

P(x) → ∀x (Q(x) ∧ ∀y R(y))

The first occurrence of x is free, whereas the second and third occurrences of x are bound, namely by the first occurrence of the quantifier ∀. The first and second occurrences of the variable y are also bound, namely by the second occurrence of the quantifier ∀.

Informally, the concept of a bound variable can be explained as follows. Recall that quantifications are generally of the form:

∀x φ or ∃x φ

where x may be any variable. Generally, all occurrences of this variable within the quantification are bound. But we have to distinguish two cases. Look at a formula such as the following to see why:

∀x (P(x) → ∃x Q(x))

1. x may occur within another, embedded, quantification ∀x ψ or ∃x ψ, such as the x in Q(x) in our example. Then we say that it is bound by the quantifier of this embedded quantification (and so on, if there is another embedded quantification over x within ψ).
2. Otherwise, we say that it is bound by the top-level quantifier (like all other occurrences of x in our example).

Here's a full formal simultaneous definition of free and bound:

1. Any occurrence of any variable is free in any atomic formula.
2. No occurrence of any variable is bound in any atomic formula.
3. If an occurrence of any variable is free in φ or in ψ, then that same occurrence is free in ¬φ, (φ ∧ ψ), (φ ∨ ψ), (φ → ψ), and (φ ↔ ψ).
4. If an occurrence of any variable is bound in φ or in ψ, then that same occurrence is bound in ¬φ, (φ ∧ ψ), (φ ∨ ψ), (φ → ψ), (φ ↔ ψ). Moreover, that same occurrence is bound in ∀y φ and ∃y φ as well, for any choice of variable y.
5. In any formula of the form ∀y φ or ∃y φ (where y can be any variable at all in this case) the occurrence of y that immediately follows the initial quantifier symbol is bound.
6. If an occurrence of a variable x is free in φ, then that same occurrence is free in ∀y φ and ∃y φ, for any variable y distinct from x. On the other hand, all occurrences of x that are free in φ are bound in ∀x φ and in ∃x φ.

If a formula contains no occurrences of free variables we call it a sentence.

Rules of inference:


The two rules of inference are called rules P and T.

Rule P: A premise may be introduced at any point in the derivation.

Rule T: A formula S may be introduced in a derivation if S is tautologically implied by any one or more of the preceding formulas in the derivation.

Before proceeding the actual process of derivation, some important list of implications and equivalences are given in the following tables.

Implications

I1 P ∧ Q ⇒ P (simplification)
I2 P ∧ Q ⇒ Q
I3 P ⇒ P ∨ Q (addition)
I4 Q ⇒ P ∨ Q
I5 ¬P ⇒ P → Q
I6 Q ⇒ P → Q
I7 ¬(P → Q) ⇒ P
I8 ¬(P → Q) ⇒ ¬Q
I9 P, Q ⇒ P ∧ Q (conjunction)
I10 ¬P, P ∨ Q ⇒ Q (disjunctive syllogism)
I11 P, P → Q ⇒ Q (modus ponens)
I12 ¬Q, P → Q ⇒ ¬P (modus tollens)
I13 P → Q, Q → R ⇒ P → R (hypothetical syllogism)
I14 P ∨ Q, P → R, Q → R ⇒ R (dilemma)

Equivalences

E1 ¬¬P ⇔ P
E2 P ∧ Q ⇔ Q ∧ P (commutative laws)
E3 P ∨ Q ⇔ Q ∨ P
E4 (P ∧ Q) ∧ R ⇔ P ∧ (Q ∧ R) (associative laws)
E5 (P ∨ Q) ∨ R ⇔ P ∨ (Q ∨ R)
E6 P ∧ (Q ∨ R) ⇔ (P ∧ Q) ∨ (P ∧ R) (distributive laws)
E7 P ∨ (Q ∧ R) ⇔ (P ∨ Q) ∧ (P ∨ R)
E8 ¬(P ∧ Q) ⇔ ¬P ∨ ¬Q (De Morgan's laws)
E9 ¬(P ∨ Q) ⇔ ¬P ∧ ¬Q
E10 P ∨ P ⇔ P
E11 P ∧ P ⇔ P
E12 R ∨ (P ∧ ¬P) ⇔ R
E13 R ∧ (P ∨ ¬P) ⇔ R
E14 R ∨ (P ∨ ¬P) ⇔ T
E15 R ∧ (P ∧ ¬P) ⇔ F
E16 P → Q ⇔ ¬P ∨ Q
E17 ¬(P → Q) ⇔ P ∧ ¬Q
E18 P → Q ⇔ ¬Q → ¬P
E19 P → (Q → R) ⇔ (P ∧ Q) → R
E20 ¬(P ↔ Q) ⇔ P ↔ ¬Q
E21 P ↔ Q ⇔ (P → Q) ∧ (Q → P)
E22 P ↔ Q ⇔ (P ∧ Q) ∨ (¬P ∧ ¬Q)

Example 1. Show that R is logically derived from P → Q, Q → R, and P.

Solution.
{1}       (1) P → Q   Rule P
{2}       (2) P       Rule P
{1, 2}    (3) Q       Rule T, (1), (2) and I11
{4}       (4) Q → R   Rule P
{1, 2, 4} (5) R       Rule T, (3), (4) and I11
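The derivation can be cross-checked semantically: R follows from the premises exactly when every truth assignment that makes all premises true also makes R true. A small Python sketch (illustrative only):

from itertools import product

implies = lambda a, b: (not a) or b

for P, Q, R in product([True, False], repeat=3):
    if implies(P, Q) and implies(Q, R) and P:
        assert R  # the conclusion holds whenever all premises hold
print("R is tautologically implied by P → Q, Q → R and P.")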

Example 2. Show that S ∨ R is tautologically implied by (P ∨ Q) ∧ (P → R) ∧ (Q → S).

Solution.
{1}       (1) P ∨ Q   Rule P
{1}       (2) ¬P → Q  T, (1), E1 and E16
{3}       (3) Q → S   P
{1, 3}    (4) ¬P → S  T, (2), (3), and I13
{1, 3}    (5) ¬S → P  T, (4), E18 and E1
{6}       (6) P → R   P
{1, 3, 6} (7) ¬S → R  T, (5), (6), and I13
{1, 3, 6} (8) S ∨ R   T, (7), E16 and E1

Example 3. Show that ¬Q, P → Q ⇒ ¬P.

Solution.
{1}    (1) P → Q    Rule P
{1}    (2) ¬Q → ¬P  T, (1) and E18
{3}    (3) ¬Q       P
{1, 3} (4) ¬P       T, (2), (3), and I11

Example 4. Prove that R ∧ (P ∨ Q) is a valid conclusion from the premises P ∨ Q, Q → R, P → M and ¬M.

Solution.
{1}          (1) P → M        P
{2}          (2) ¬M           P
{1, 2}       (3) ¬P           T, (1), (2), and I12
{4}          (4) P ∨ Q        P
{1, 2, 4}    (5) Q            T, (3), (4), and I10
{6}          (6) Q → R        P
{1, 2, 4, 6} (7) R            T, (5), (6) and I11
{1, 2, 4, 6} (8) R ∧ (P ∨ Q)  T, (4), (7), and I9

There is a third inference rule, known as rule CP or rule of conditional proof.

Rule CP: If we can derive S from R and a set of premises, then we can derive R → S from the set of premises alone.

Note.
1. Rule CP follows from the equivalence E19, which states that (P ∧ R) → S ⇔ P → (R → S).
2. Let P denote the conjunction of the set of premises and let R be any formula. The above equivalence states that if R is included as an additional premise and S is derived from P ∧ R, then R → S can be derived from the premises P alone.
3. Rule CP is also called the deduction theorem and is generally used if the conclusion is of the form R → S. In such cases, R is taken as an additional premise and S is derived from the given premises and R.

Example 5. Show that R → S can be derived from the premises P → (Q → S), ¬R ∨ P, and Q.

Solution.
{1}          (1) ¬R ∨ P       P
{2}          (2) R            P, assumed premise
{1, 2}       (3) P            T, (1), (2), and I10
{4}          (4) P → (Q → S)  P
{1, 2, 4}    (5) Q → S        T, (3), (4), and I11
{6}          (6) Q            P
{1, 2, 4, 6} (7) S            T, (5), (6), and I11
{1, 4, 6}    (8) R → S        CP

Example 6. Show that P → S can be derived from the premises ¬P ∨ Q, ¬Q ∨ R, and R → S.

Solution.

{1}          (1) ¬P ∨ Q  P
{2}          (2) P       P, assumed premise
{1, 2}       (3) Q       T, (1), (2) and I10
{4}          (4) ¬Q ∨ R  P
{1, 2, 4}    (5) R       T, (3), (4) and I10
{6}          (6) R → S   P
{1, 2, 4, 6} (7) S       T, (5), (6) and I11
{1, 4, 6}    (8) P → S   CP

Example 7. “If there was a ball game, then traveling was difficult. If they arrived on time, then traveling was not difficult. They arrived on time. Therefore, there was no ball game.” Show that these statements constitute a valid argument.


Solution. Let P: There was a ball game. Q: Traveling was difficult. R: They arrived on time.

Given premises are: P → Q, R → ¬Q and R. Conclusion: ¬P.

{1}       (1) P → Q   P
{2}       (2) R → ¬Q  P
{3}       (3) R       P
{2, 3}    (4) ¬Q      T, (2), (3), and I11
{1, 2, 3} (5) ¬P      T, (1), (4) and I12

Consistency of premises:

Consistency: A set of formulas H1, H2, …, Hm is said to be consistent if their conjunction has the truth value T for some assignment of truth values to the atomic variables appearing in H1, H2, …, Hm.

Inconsistency

If for every assignment of the truth values to the atomic variables, at least one of the formulas H1, H2, … Hm is false, so that their conjunction is identically false, then the formulas H1, H2, …, Hm are called inconsistent.

A set of formulas H1, H2, …, Hm is inconsistent if their conjunction implies a contradiction, that is, H1 ∧ H2 ∧ … ∧ Hm ⇒ R ∧ ¬R, where R is any formula. Note that R ∧ ¬R is a contradiction, and for the inconsistency of H1, H2, …, Hm it is necessary and sufficient that their conjunction imply such a formula.

Indirect method of proof: In order to show that a conclusion C follows logically from the premises H1, H2, …, Hm, we assume that C is false and consider ¬C as an additional premise. If the new set of premises is inconsistent, so that they imply a contradiction, then the assumption that ¬C is true does not hold simultaneously with H1 ∧ H2 ∧ … ∧ Hm being true. Therefore, C is true whenever H1 ∧ H2 ∧ … ∧ Hm is true. Thus, C follows logically from the premises H1, H2, …, Hm.
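Consistency is likewise a finite check: a set of formulas is consistent iff some truth assignment makes them all true. The Python sketch below (a hypothetical encoding, not part of the notes) tests the premises of Example 9 further down and reports them inconsistent:

from itertools import product

implies = lambda a, b: (not a) or b
premises = [
    lambda P, Q, R, S: implies(P, Q),      # P → Q
    lambda P, Q, R, S: implies(Q, S),      # Q → S
    lambda P, Q, R, S: implies(R, not S),  # R → ¬S
    lambda P, Q, R, S: P and R,            # P ∧ R
]

consistent = any(all(h(*v) for h in premises)
                 for v in product([True, False], repeat=4))
print("consistent" if consistent else "inconsistent")  # inconsistent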


Example 8. Show that ¬(P ∧ Q) follows from ¬P ∧ ¬Q.

Solution.

We introduce ¬¬(P ∧ Q) as an additional premise and show that this additional premise leads to a contradiction.
{1}    (1) ¬¬(P ∧ Q)  P, assumed premise
{1}    (2) P ∧ Q      T, (1) and E1
{1}    (3) P          T, (2) and I1
{4}    (4) ¬P ∧ ¬Q    P
{4}    (5) ¬P         T, (4) and I1
{1, 4} (6) P ∧ ¬P     T, (3), (5) and I9
Here (6) P ∧ ¬P is a contradiction. Thus the premises {1, 4}, viz. ¬¬(P ∧ Q) and ¬P ∧ ¬Q, lead to the contradiction P ∧ ¬P.

Example 9. Show that the following premises are inconsistent.
1. If Jack misses many classes through illness, then he fails high school.
2. If Jack fails high school, then he is uneducated.
3. If Jack reads a lot of books, then he is not uneducated.
4. Jack misses many classes through illness and reads a lot of books.

Solution.
P: Jack misses many classes.
Q: Jack fails high school.
R: Jack reads a lot of books.
S: Jack is uneducated.
The premises are P → Q, Q → S, R → ¬S and P ∧ R.
{1}          (1) P → Q                P
{2}          (2) Q → S                P
{1, 2}       (3) P → S                T, (1), (2) and I13
{4}          (4) R → ¬S               P
{4}          (5) S → ¬R               T, (4), and E18
{1, 2, 4}    (6) P → ¬R               T, (3), (5) and I13
{1, 2, 4}    (7) ¬P ∨ ¬R              T, (6) and E16
{1, 2, 4}    (8) ¬(P ∧ R)             T, (7) and E8
{9}          (9) P ∧ R                P
{1, 2, 4, 9} (10) (P ∧ R) ∧ ¬(P ∧ R)  T, (8), (9) and I9
The last formula is a contradiction, so the premises are inconsistent.

The rules above can be summed up in the following table. The "Tautology" column shows how to interpret the notation of a given rule.

Rule of inference          Tautology                          Name

p ⇒ p ∨ q                  p → (p ∨ q)                        Addition

p ∧ q ⇒ p                  (p ∧ q) → p                        Simplification

p, q ⇒ p ∧ q               ((p) ∧ (q)) → (p ∧ q)              Conjunction

p, p → q ⇒ q               (p ∧ (p → q)) → q                  Modus ponens

¬q, p → q ⇒ ¬p             (¬q ∧ (p → q)) → ¬p                Modus tollens

p → q, q → r ⇒ p → r       ((p → q) ∧ (q → r)) → (p → r)      Hypothetical syllogism

p ∨ q, ¬p ⇒ q              ((p ∨ q) ∧ ¬p) → q                 Disjunctive syllogism

p ∨ q, ¬p ∨ r ⇒ q ∨ r      ((p ∨ q) ∧ (¬p ∨ r)) → (q ∨ r)     Resolution

Example 1

Let us consider the following assumptions: "If it rains today, then we will not go on a canoe trip today. If we do not go on a canoe trip today, then we will go on a canoe trip tomorrow. Therefore (the mathematical symbol for "therefore" is ∴), if it rains today, we will go on a canoe trip tomorrow." To make use of the rules of inference in the above table we let p be the proposition "It rains today", q be "We will not go on a canoe trip today" and r be "We will go on a canoe trip tomorrow". Then this argument is of the form:

p → q
q → r
∴ p → r

which is hypothetical syllogism.

Example 2

Let us consider a more complex set of assumptions: "It is not sunny today and it is colder than yesterday", "We will go swimming only if it is sunny", "If we do not go swimming, then we will have a barbecue", and "If we will have a barbecue, then we will be home by sunset" lead to the conclusion "We will be home by sunset." Proof by rules of inference: Let p be the proposition "It is sunny today", q the proposition "It is colder than yesterday", r the proposition "We will go swimming", s the proposition "We will have a barbecue", and t the proposition "We will be home by sunset". Then the hypotheses become ¬p ∧ q, r → p, ¬r → s, and s → t. Using our intuition we conjecture that the conclusion might be t. Using the rules of inference table we can prove the conjecture easily:

Step          Reason

1. ¬p ∧ q     Hypothesis

2. ¬p         Simplification using Step 1

3. r → p      Hypothesis

4. ¬r         Modus tollens using Steps 2 and 3

5. ¬r → s     Hypothesis

6. s          Modus ponens using Steps 4 and 5

7. s → t      Hypothesis

8. t          Modus ponens using Steps 6 and 7

Proof by contradiction:

The "Proof by Contradiction" is also known as reductio ad absurdum, which is probably Latin for "reduce it to something absurd".

Here's the idea:

1. Assume that a given proposition is untrue. 2. Based on that assumption reach two conclusions that contradict each other.

This is based on a classical formal logic construction known as Modus Tollens: if P implies Q and Q is false, then P is false. In this case, Q is a proposition of the form (R and not R), which is always false. P is the negation of the fact that we are trying to prove, and if the negation is not true then the original proposition must have been true. If we are not "not stupid", then we are stupid.

Example:

Let's prove that there is no largest prime number (this is the idea of Euclid's original proof). Prime numbers are integers with no exact integer divisors except 1 and themselves.

1. To prove: "There is no largest prime number" by contradiction. 2. Assume: There is a largest prime number, call it p. 3. Consider the number N that is one larger than the product of all of the primes smaller than or equal to p. N=1*2*3*5*7*11...*p + 1. Is it prime? 4. N is at least as big as p+1 and so is larger than p and so, by Step 2, cannot be prime. 5. On the other hand, N has no prime factors between 1 and p because they would all leave a remainder of 1. It has no prime factors larger than p because Step 2 says that there are no primes larger than p. So N has no prime factors and therefore must itself be prime (see note below).

We have reached a contradiction (N is not prime by Step 4, and N is prime by Step 5) and therefore our original assumption that there is a largest prime must be false.

Note: The conclusion in Step 5 makes implicit use of one other important theorem: The Fundamental Theorem of Arithmetic: Every integer can be uniquely represented as the product of primes. So if N had a composite (i.e. non-prime) factor, that factor would itself have prime factors which would also be factors of N.
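The construction in the proof is easy to experiment with. A Python sketch (trial division, fine for small p; primes_up_to is a helper introduced here for illustration):

def primes_up_to(p):
    return [n for n in range(2, p + 1)
            if all(n % d != 0 for d in range(2, n))]

p = 13
N = 1
for q in primes_up_to(p):
    N *= q
N += 1  # N = 2*3*5*7*11*13 + 1 = 30031

print([N % q for q in primes_up_to(p)])  # every remainder is 1
# N itself need not be prime: 30031 = 59 * 509, but both factors exceed p.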

Automatic Theorem Proving:

Automatic Theorem Proving (ATP) deals with the development of computer programs that show that some statement (the conjecture) is a logical consequence of a set of statements (the axioms and hypotheses). ATP systems are used in a wide variety of domains. For example, a mathematician might prove the conjecture that groups of order two are commutative, from the axioms of group theory; a management consultant might formulate axioms that describe how organizations grow and interact, and from those axioms prove that organizational death rates decrease with age; a hardware developer might validate the design of a circuit by proving a conjecture that describes a circuit's performance, given axioms that describe the circuit itself; or a frustrated teenager might formulate the jumbled faces of a Rubik's cube as a conjecture and prove, from axioms that describe legal changes to the cube's configuration, that the cube can be rearranged to the solution state. All of these are tasks that can be performed by an ATP system, given an appropriate formulation of the problem as axioms, hypotheses, and a conjecture.

The language in which the conjecture, hypotheses, and axioms (generically known as formulae) are written is a logic, often classical 1st order logic, but possibly a non-classical logic and possibly a higher order logic. These languages allow a precise formal statement of the necessary information, which can then be manipulated by an ATP system. This formality is the underlying strength of ATP: there is no ambiguity in the statement of the problem, as is often the case when using a natural language such as English. Users have to describe the problem at hand precisely and accurately, and this process in itself can lead to a clearer understanding of the problem domain. This in turn allows the user to formulate their problem appropriately for submission to an ATP system.

The proofs produced by ATP systems describe how and why the conjecture follows from the axioms and hypotheses, in a manner that can be understood and agreed upon by everyone, even other computer programs. The proof output may not only be a convincing argument that the conjecture is a logical consequence of the axioms and hypotheses, it often also describes a process that may be implemented to solve some problem. For example, in the Rubik's cube example mentioned above, the proof would describe the sequence of moves that need to be made in order to solve the puzzle.

ATP systems are enormously powerful computer programs, capable of solving immensely difficult problems. Because of this extreme capability, their application and operation sometimes needs to be guided by an expert in the domain of application, in order to solve problems in a reasonable amount of time. Thus ATP systems, despite the name, are often used by domain experts in an interactive way. The interaction may be at a very detailed level, where the user guides the inferences made by the system, or at a much higher level where the user determines intermediate lemmas to be proved on the way to the proof of a conjecture. There is often a synergetic relationship between ATP system users and the systems themselves:

• The system needs a precise description of the problem written in some logical form, • the user is forced to think carefully about the problem in order to produce an appropriate formulation and hence acquires a deeper understanding of the problem, • the system attempts to solve the problem, • if successful the proof is a useful output, • if unsuccessful the user can provide guidance, or try to prove some intermediate result, or examine the formulae to ensure that the problem is correctly described, • and so the process iterates.

ATP is thus a technology very suited to situations where a clear thinking domain expert can interact with a powerful tool, to solve interesting and deep problems. Potential ATP users need not be concerned that they need to write an ATP system themselves; there are many ATP systems readily available for use.


UNIT-II Sets, Relations and Functions

RELATIONS Introduction

The elements of a set may be related to one another. For example, in the set of natural numbers there is the ‘less than’ relation between the elements. The elements of one set may also be related to the elements of another set.

Binary Relation

A binary relation between two sets A and B is a rule R which decides, for any element a of A and any element b of B, whether a is in relation R to b. If so, we write a R b. If a is not in relation R to b, then we shall write a /R b.

We can also consider a R b as the ordered pair (a, b) in which case we can define a binary relation from A to B as a subset of A X B. This subset is denoted by the relation R.

In general, any set of ordered pairs defines a binary relation.

For example, the relation of father to his child is F = {(a, b) / a is the father of b} In this relation F, the first member is the name of the father and the second is the name of the child. The definition of relation permits any set of ordered pairs to define a relation.


For example, the set S given by S = {(1, 2), (3, a), (b, a), (b, Joe)}. Definition: The domain D of a binary relation S is the set of all first elements of the ordered pairs in the relation, (i.e.) D(S) = {a / ∃b for which (a, b) Є S}. The range R of a binary relation S is the set of all second elements of the ordered pairs in the relation, (i.e.) R(S) = {b / ∃a for which (a, b) Є S}.

For example, for the relation S = {(1, 2), (3, a), (b, a), (b, Joe)}, D(S) = {1, 3, b} and R(S) = {2, a, Joe}. Let X and Y be any two sets. A subset of the Cartesian product X × Y defines a relation, say C. For any such relation C, we have D(C) ⊆ X and R(C) ⊆ Y, and the relation C is said to be from X to Y. If Y = X, then C is said to be a relation from X to X. In such a case, C is called a relation in X. Thus any relation in X is a subset of X × X. The set X × X is called the universal relation in X, while the empty set, which is also a subset of X × X, is called the void relation in X.

For example, let L denote the relation “less than or equal to” and D denote the relation “divides”, where x D y means “x divides y”. Both L and D are defined on the set {1, 2, 3, 4}. L = {(1, 1), (1, 2), (1, 3), (1, 4), (2, 2), (2, 3), (2, 4), (3, 3), (3, 4), (4, 4)} D = {(1, 1), (1, 2), (1, 3), (1, 4), (2, 2), (2, 4), (3, 3), (4, 4)} L ∩ D = {(1, 1), (1, 2), (1, 3), (1, 4), (2, 2), (2, 4), (3, 3), (4, 4)} = D

Properties of Binary Relations:


Definition: A binary relation R in a set X is reflexive if, for every x Є X, x R x, that is, (x, x) Є R; in symbols, R is reflexive in X ⇔ (∀x)(x Є X → x R x). For example

• The relation ≤ is reflexive in the set of real numbers.

• Set inclusion is reflexive in the family of all subsets of a universal set.

• The relation of equality of sets is also reflexive.

• The relation “is parallel to” is reflexive in the set of lines in a plane.

• The relation of similarity in the set of triangles in a plane is reflexive.

Definition: A relation R in a set X is symmetric if for every x and y in X, whenever x R y, then y R x. (i.e.) R is symmetric in X ⇔ (∀x)(∀y)(x Є X ∧ y Є X ∧ x R y → y R x). For example

• The relation of equality of sets is symmetric.

• The relation of similarity in the set of triangles in a plane is symmetric.

• The relation of being a sister is not symmetric in the set of all people.

• However, in the set of females it is symmetric.

Definition: A relation R in a set X is transitive if, for every x, y, and z in X, whenever x R y and y R z, then x R z. (i.e.) R is transitive in X ⇔ (∀x)(∀y)(∀z)(x Є X ∧ y Є X ∧ z Є X ∧ x R y ∧ y R z → x R z). For example

• The relations <, ≤, >, ≥ and = are transitive in the set of real numbers.

• The relations ⊆, ⊂, ⊇, ⊃ and equality are also transitive in the family of sets.

• The relation of similarity in the set of triangles in a plane is transitive.

Definition: A relation R in a set X is irreflexive if, for every x Є X, (x, x) ∉ R.

For example


• The relation < is irreflexive in the set of all real numbers.

• The relation proper inclusion is irreflexive in the set of all nonempty subsets of a universal set.

• Let X = {1, 2, 3} and S = {(1, 1), (1, 2), (3, 2), (2, 3), (3, 3)} is neither irreflexive nor reflexive.

Definition: A relation R in a set X is antisymmetric if, for every x and y in X, whenever x R y and y R x, then x = y. Symbolically, (∀x)(∀y)(x Є X ∧ y Є X ∧ x R y ∧ y R x → x = y).

For example

• The relations ≤, ≥ and = are antisymmetric.

• The relation ⊆ is antisymmetric in a family of sets.

• The relation “divides” is antisymmetric in the set of positive integers.

• Consider the relation “is a son of” on the male children in a family. Evidently the relation is neither symmetric, nor transitive, nor reflexive.

• The relation “is a divisor of” is reflexive and transitive but not symmetric on the set of natural numbers.

• Consider the set H of all human beings. Let R be the relation “is married to”. R is symmetric.

• Let I be the set of integers. R on I is defined as a R b if a − b is an even number. R is reflexive, symmetric and transitive.
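These properties are finite checks on a finite relation, so they translate directly into code. A Python sketch (illustrative only) testing the last example (a R b iff a − b is even) on a small window of the integers:

def is_reflexive(R, X):
    return all((x, x) in R for x in X)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_transitive(R):
    return all((x, w) in R
               for (x, y) in R for (z, w) in R if y == z)

X = range(-3, 4)
R = {(a, b) for a in X for b in X if (a - b) % 2 == 0}
print(is_reflexive(R, X), is_symmetric(R), is_transitive(R))  # True True True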

Equivalence Relation:

Definition:A relation R in a set A is called an equivalence relation if

▪ a R a for every a Є A, i.e. R is reflexive

▪ a R b => b R a for every a, b Є A i.e. R is symmetric

▪ a R b and b R c => a R c for every a, b, c Є A, i.e. R is transitive.

For example

• The relation equality of numbers on set of real numbers.

• The relation being parallel on a set of lines in a plane.

Problem 1: Let us consider the set T of triangles in a plane. Let us define a relation R in T as R = {(a, b) / a, b Є T and a is similar to b}.


We have to show that relation R is an equivalence relation Solution :

• A triangle a is similar to itself, so a R a.

• If the triangle a is similar to the triangle b, then triangle b is similar to the triangle a then a R b => b R a

• If a is similar to b and b is similar to c, then a is similar to c (i.e) a R b and b R c => a R c.

Hence R is an equivalence relation.

Problem 2: Let X = {1, 2, 3, …, 7} and R = {(x, y) / x − y is divisible by 3}. Show that R is an equivalence relation.

Solution: For any a Є X, a − a is divisible by 3; hence a R a, and R is reflexive. For any a, b Є X, if a − b is divisible by 3, then b − a is also divisible by 3, so R is symmetric. For any a, b, c Є X, if a R b and b R c, then a − b is divisible by 3 and b − c is divisible by 3, so that (a − b) + (b − c) is also divisible by 3; hence a − c is also divisible by 3. Thus R is transitive. Hence R is an equivalence relation.

Problem 3: Let Z be the set of all integers. Let m be a fixed integer. Two integers a and b are said to be congruent modulo m if and only if m divides a − b, in which case we write a ≡ b (mod m). This relation is called the relation of congruence modulo m and we can show that it is an equivalence relation.

Solution :

• a − a = 0 and m divides a − a, (i.e.) a R a, (a, a) Є R; R is reflexive.

• a R b ⇒ m divides a − b ⇒ m divides b − a ⇒ b ≡ a (mod m) ⇒ b R a; that is, R is symmetric.


• a R b and b R c => a ≡ b (mod m) and b ≡ c (mod m)

o m divides a – b and m divides b-c

o a – b = km and b – c = lm for some k ,l Є z

o (a – b) + (b – c) = km + lm

o a – c = (k +l) m

o a ≡ c (mod m)

o a R c

o R is transitive

Hence the congruence relation is an equivalence relation.

Equivalence Classes:

Let R be an equivalence relation on a set A. For any a Є A, the equivalence class generated by a is the set of all elements b Є A such that a R b, and is denoted [a]. It is also called the R-equivalence class of a. i.e., [a] = {b Є A / b R a}

Let Z be the set of integers and R be the relation called “congruence modulo 3” defined by R = {(x, y) / x ∈ Z ∧ y ∈ Z ∧ (x − y) is divisible by 3}. Then the equivalence classes are [0] = {…, −6, −3, 0, 3, 6, …} [1] = {…, −5, −2, 1, 4, 7, …} [2] = {…, −4, −1, 2, 5, 8, …}
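On a finite window of Z the classes can be computed by grouping related elements. A Python sketch (equivalence_classes is a helper introduced here for illustration):

def equivalence_classes(X, related):
    classes = []
    for x in X:
        for c in classes:
            if related(x, c[0]):
                c.append(x)
                break
        else:
            classes.append([x])
    return classes

for c in equivalence_classes(range(-6, 9), lambda x, y: (x - y) % 3 == 0):
    print(c)
# [-6, -3, 0, 3, 6], [-5, -2, 1, 4, 7], [-4, -1, 2, 5, 8]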

Composition of binary relations:

Definition: Let R be a relation from X to Y and S be a relation from Y to Z. Then the relation R o S given by R o S = {(x, z) / x ∈ X ∧ z ∈ Z ∧ ∃y ∈ Y such that (x, y) ∈ R ∧ (y, z) ∈ S} is called the composite relation of R and S. The operation of obtaining R o S is called the composition of relations.

Example: Let R = {(1, 2), (3, 4), (2, 2)} and S = {(4, 2), (2, 5), (3, 1),(1,3)} Then R o S = {(1, 5), (3, 2), (2, 5)} and S o R = {(4, 2), (3, 2), (1, 4)}


It is to be noted that R o S ≠ S o R. Also R o (S o T) = (R o S) o T = R o S o T.
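With relations represented as sets of pairs, composition is a one-line set comprehension. A Python sketch reproducing the example above:

def compose(R, S):
    # R o S = {(x, z) : there is a y with (x, y) in R and (y, z) in S}
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}

R = {(1, 2), (3, 4), (2, 2)}
S = {(4, 2), (2, 5), (3, 1), (1, 3)}
print(compose(R, S))  # {(1, 5), (3, 2), (2, 5)}
print(compose(S, R))  # {(4, 2), (3, 2), (1, 4)}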

Note: We write R o R as R2; R o R o R as R3 and so on.

Definition: Let R be a relation from X to Y. A relation Ř from Y to X is called the converse of R, where the ordered pairs of Ř are obtained by interchanging the members in each of the ordered pairs of R. This means, for x ∈ X and y ∈ Y, that x R y ⇔ y Ř x. Thus the relation Ř given by Ř = {(x, y) / (y, x) ∈ R} is called the converse of R. Example: Let R = {(1, 2), (3, 4), (2, 2)}. Then Ř = {(2, 1), (4, 3), (2, 2)}.

Note: If R is an equivalence relation, then Ř is also an equivalence relation.

Definition: Let X be any finite set and R be a relation in X. The relation R+ = R U R2 U R3 U … in X is called the transitive closure of R in X.

Example: Let R = {(a, b), (b, c), (c, a)}. Now R2 = R o R = {(a, c), (b, a), (c, b)} R3 = R2 o R = {(a, a), (b, b), (c, c)} R4 = R3 o R = {(a, b), (b, c), (c, a)} = R R5= R3o R2 = R2 and so on.

Thus, R+ = R U R2 U R3 U R4 U … = R U R2 U R3 = {(a, b), (b, c), (c, a), (a, c), (b, a), (c, b), (a, a), (b, b), (c, c)}.

We see that R+ is a transitive relation containing R. In fact, it is the smallest transitive relation containing R.
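The closure can be computed by iterating composition until nothing new appears. A Python sketch using the example above:

def compose(R, S):
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}

def transitive_closure(R):
    closure, power = set(R), set(R)
    while True:
        power = compose(power, R)   # next power R^k
        if power <= closure:        # no new pairs: done
            return closure
        closure |= power

R = {('a', 'b'), ('b', 'c'), ('c', 'a')}
print(sorted(transitive_closure(R)))  # all nine pairs over {a, b, c}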

Partial Ordering Relations:


Definition: A binary relation R in a set P is called a partial order relation or a partial ordering in P iff R is reflexive, antisymmetric, and transitive. A partial order relation is denoted by the symbol ≤. If ≤ is a partial ordering on P, then the ordered pair (P, ≤) is called a partially ordered set or a poset.

• Let R be the set of real numbers. The relation “less than or equal to”, ≤, is a partial ordering on R.

• Let X be a set and ρ(X) be its power set. The relation “subset”, ⊆, on ρ(X) is a partial ordering.

• Let Sn be the set of divisors of n. The relation D, where x D y means “x divides y”, is a partial ordering on Sn.

In a partially ordered set (P, ≤), an element y ∈ P is said to cover an element x ∈ P if x < y and there is no element z ∈ P with x < z and z < y.

If x < y but y does not cover x, then x and y are not connected directly by a single line.However, they are connected through one or more elements of P.

Hasse Diagram:

A Hasse diagram is a digraph for a poset which omits the loops and the arcs implied by transitivity. Example 10: For the relation {< a, a >, < a, b >, < a, c >, < b, b >, < b, c >, < c, c >} on the set {a, b, c}, the Hasse diagram has the arcs {< a, b >, < b, c >}, as shown below.


Ex: Let A be a given finite set and ρ(A) its power set. Let ⊆ be the subset relation on the elements of ρ(A). Draw the Hasse diagram of (ρ(A), ⊆) for A = {a, b, c}.
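For this exercise, the arcs of the Hasse diagram are exactly the covering pairs of (ρ(A), ⊆). A Python sketch that computes them (on frozensets, < is proper subset):

from itertools import combinations

A = {'a', 'b', 'c'}
power = [frozenset(c) for r in range(len(A) + 1)
         for c in combinations(sorted(A), r)]

covers = [(x, y) for x in power for y in power
          if x < y and not any(x < z < y for z in power)]

for x, y in covers:
    print(sorted(x), '->', sorted(y))  # 12 arcs in all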

Functions:

Introduction: A function is a special type of relation. It may be considered as a relation in which each element of the domain belongs to only one ordered pair in the relation. Thus a function f from A to B is a subset of A × B having the property that for each a Є A, there is one and only one b Є B such that (a, b) Є f.


Definition Let A and B be any two sets. A relation f from A to B is called a function if for every a Є A there is a unique b Є B such that (a, b) Є f .

Note that the definition of function requires that a relation must satisfy two additional conditions in order to qualify as a function.

The first condition is that every a Є A must be related to some b Є B, (i.e.) the domain of f must be A and not merely a subset of A. The second requirement of uniqueness can be expressed as ((a, b) Є f ∧ (a, c) Є f) ⇒ b = c. Intuitively, a function from a set A to a set B is a rule which assigns to every element of A a unique element of B. If a Є A, then the unique element of B assigned to a under f is denoted by f(a). The usual notation for a function f from A to B is f: A → B defined by a → f(a), where a Є A; f(a) is called the image of a under f and a is called a pre-image of f(a).

• Let X = Y = R and f(x) = x² + 2. Df = R and Rf ⊆ R.

• Let X be the set of all statements in logic and let Y = {True, False}.

A mapping f: X → Y is a function.

• A program written in high level language is mapped into a machine language by a compiler. Similarly, the output from a compiler is a function of its input.

• Let X = Y = R. Then f(x) = x² is a function from X → Y, and g(x²) = x is not a function from X → Y.

A mapping f: A → B is called one-to-one (injective or 1–1) if distinct elements of A are mapped into distinct elements of B, (i.e.) f is one-to-one if a1 ≠ a2 => f(a1) ≠ f(a2), or equivalently f(a1) = f(a2) => a1 = a2. For example, f: N → N given by f(x) = x is 1–1, where N is the set of natural numbers. A mapping f: A → B is called onto (surjective) if for every b Є B there is an a Є A such that f(a) = b, i.e. if every element of B has a pre-image in A. Otherwise it is called into.

For example, f: Z → Z given by f(x) = x + 1 is an onto mapping. A mapping that is both 1–1 and onto is called bijective. For example, f: R → R given by f(x) = x + 1 is bijective.


Definition: A mapping f: A → B is called a constant mapping if, for all a ∈ A, f(a) = b, a fixed element. For example, f: Z → Z given by f(x) = 0 for all x ∈ Z is a constant mapping.

Definition: A mapping f: A → A is called the identity mapping of A if f(a) = a for all a ∈ A. Usually it is denoted by IA or simply I.

Composition of functions:

If f: A → B and g: B → C are two functions, then the composition of f and g, denoted by g o f, is the function g o f: A → C given by g o f = {(a, c) / a Є A ∧ c Є C ∧ ∃b Є B such that f(a) = b ∧ g(b) = c}, and (g o f)(a) = g(f(a)).

Example 1: Consider the sets A = {1, 2, 3}, B = {a, b} and C = {x, y}. Let f: A → B be defined by f(1) = a, f(2) = b and f(3) = b, and let g: B → C be defined by g(a) = x and g(b) = y, (i.e.) f = {(1, a), (2, b), (3, b)} and g = {(a, x), (b, y)}. Then g o f: A → C is defined by (g o f)(1) = g(f(1)) = g(a) = x, (g o f)(2) = g(f(2)) = g(b) = y, (g o f)(3) = g(f(3)) = g(b) = y, i.e., g o f = {(1, x), (2, y), (3, y)}.

If f: A → A and g: A → A, where A = {1, 2, 3}, are given by f = {(1, 2), (2, 3), (3, 1)} and g = {(1, 3), (2, 2), (3, 1)}, then g o f = {(1, 2), (2, 1), (3, 3)}, f o g = {(1, 1), (2, 3), (3, 2)}, f o f = {(1, 3), (2, 1), (3, 2)} and g o g = {(1, 1), (2, 2), (3, 3)}.


Example 2: Let f(x) = x + 2, g(x) = x − 2 and h(x) = 3x for x ∈ R, where R is the set of real numbers. Then
f o f = {(x, x + 4) / x ∈ R}
f o g = {(x, x) / x ∈ R}
g o f = {(x, x) / x ∈ R}
g o g = {(x, x − 4) / x ∈ R}
h o g = {(x, 3x − 6) / x ∈ R}
h o f = {(x, 3x + 6) / x ∈ R}
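Composition of functions mirrors the definition directly. A Python sketch re-deriving Example 2's results at a sample point:

def compose(g, f):
    # returns the map x -> g(f(x)), i.e. g o f
    return lambda x: g(f(x))

f = lambda x: x + 2
g = lambda x: x - 2
h = lambda x: 3 * x

print(compose(f, f)(1))  # 5, since (f o f)(x) = x + 4
print(compose(g, f)(1))  # 1, since (g o f)(x) = x
print(compose(h, f)(1))  # 9, since (h o f)(x) = 3x + 6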

Inverse functions: Let f: A → B be a one-to-one and onto mapping. Then its inverse, denoted by f⁻¹, is given by f⁻¹ = {(b, a) / (a, b) ∈ f}. Clearly f⁻¹: B → A is one-to-one and onto.

Also we observe that f o f⁻¹ = IB and f⁻¹ o f = IA. If f⁻¹ exists then f is called invertible.

For example: Let f: R → R be defined by f(x) = x + 2. Then f⁻¹: R → R is defined by f⁻¹(x) = x − 2.

Theorem: Let f: X → Y and g: Y → Z be two one-to-one and onto functions. Then g o f is also a one-to-one and onto function.

Proof: Let f: X → Y and g: Y → Z be two one-to-one and onto functions. Let x1, x2 ∈ X. Then

g o f(x1) = g o f(x2)
⇒ g(f(x1)) = g(f(x2))
⇒ f(x1) = f(x2) [since g is 1–1]
⇒ x1 = x2 [since f is 1–1],

so that g o f is 1–1.


By the definition of composition, g o f: X → Z is a function. We have to prove that every element z ∈ Z is the image of some x ∈ X under g o f. Since g is onto, ∃y ∈ Y such that g(y) = z, and since f is onto from X to Y, ∃x ∈ X such that f(x) = y. Now, g o f(x) = g(f(x)) = g(y) [since f(x) = y] = z [since g(y) = z], which shows that g o f is onto.

Theorem: (g o f)⁻¹ = f⁻¹ o g⁻¹, (i.e.) the inverse of a composite function can be expressed in terms of the composition of the inverses in the reverse order.

Proof. f: A → B is one-to-one and onto; g: B → C is one-to-one and onto; so g o f: A → C is also one-to-one and onto, and hence (g o f)⁻¹: C → A is one-to-one and onto. Let a ∈ A; then there exists an element b ∈ B such that f(a) = b ⇒ a = f⁻¹(b). Now b ∈ B ⇒ there exists an element c ∈ C such that g(b) = c ⇒ b = g⁻¹(c). Then (g o f)(a) = g[f(a)] = g(b) = c ⇒ a = (g o f)⁻¹(c). …(1)
(f⁻¹ o g⁻¹)(c) = f⁻¹(g⁻¹(c)) = f⁻¹(b) = a ⇒ a = (f⁻¹ o g⁻¹)(c). …(2)
Combining (1) and (2), we have (g o f)⁻¹ = f⁻¹ o g⁻¹.

Theorem: If f: A → B is an invertible mapping, then f o f⁻¹ = IB and f⁻¹ o f = IA. Proof: f is invertible, so f⁻¹ is defined by f(a) = b ⇔ f⁻¹(b) = a, where a ∈ A and b ∈ B. We first prove that f o f⁻¹ = IB. Let b ∈ B and f⁻¹(b) = a, a ∈ A;


then f o f⁻¹(b) = f(f⁻¹(b)) = f(a) = b. Therefore f o f⁻¹(b) = b ∀ b ∈ B, so f o f⁻¹ = IB. Now f⁻¹ o f(a) = f⁻¹(f(a)) = f⁻¹(b) = a. Therefore f⁻¹ o f(a) = a ∀ a ∈ A, so f⁻¹ o f = IA. Hence the theorem.

UNIT-III

ALGORITHMS, MATHEMATICAL INDUCTION AND RECURSION

ALGORITHMS

An algorithm is a finite sequence of precise instructions for performing a computation or for solving a problem.

The term algorithm is a corruption of the name al-Khowarizmi, a mathematician of the ninth century, whose book on Hindu numerals is the basis of modern decimal notation. Originally, the word algorism was used for the rules for performing arithmetic using decimal notation. Algorism evolved into the word algorithm by the eighteenth century. With the growing interest in computing machines, the concept of an algorithm was given a more general meaning, to include all definite procedures for solving problems, not just the procedures for performing arithmetic. (We will discuss algorithms for performing arithmetic with integers in Chapter 4.) In this book, we will discuss algorithms that solve a wide variety of problems. In this section we will use the problem of finding the largest integer in a finite sequence of integers to illustrate the concept of an algorithm and the properties algorithms have. Also, we will describe algorithms for locating a particular element in a finite set. In subsequent sections, procedures for finding the greatest common divisor of two integers, for finding the shortest path between two points in a network, for multiplying matrices, and so on, will be discussed. EXAMPLE 1 Describe an algorithm for finding the maximum (largest) value in a finite sequence of integers.

Solution of Example 1: We perform the following steps.

1. Set the temporary maximum equal to the first integer in the sequence. (The temporary maximum will be the largest integer examined at any stage of the procedure.)
2. Compare the next integer in the sequence to the temporary maximum, and if it is larger than the temporary maximum, set the temporary maximum equal to this integer.
3. Repeat the previous step if there are more integers in the sequence.
4. Stop when there are no integers left in the sequence. The temporary maximum at this point is the largest integer in the sequence. ◂


ALGORITHM 1 Finding the Maximum Element in a Finite Sequence.

procedure max(a1, a2, …, an: integers)
max := a1
for i := 2 to n
    if max < ai then max := ai
return max {max is the largest element}

This algorithm first assigns the initial term of the sequence, a1, to the variable max. The “for” loop is used to successively examine terms of the sequence. If a term is greater than the current value of max, it is assigned to be the new value of max. The algorithm terminates after all terms have been examined. The value of max on termination is the maximum element in the sequence. To gain insight into how an algorithm works it is useful to construct a trace that shows its steps when given specific input. For instance, a trace of Algorithm 1 with input 8, 4, 11, 3, 10 begins with the algorithm setting max to 8, the value of the initial term. It then compares 4, the second term, with 8, the current value of max. Because 4 ≤ 8, max is unchanged. Next, the algorithm compares the third term, 11, with 8, the current value of max. Because 8 < 11, max is set equal to 11. The algorithm then compares 3, the fourth term, and 11, the current value of max. Because 3 ≤ 11, max is unchanged. Finally, the algorithm compares 10, the fifth term, and 11, the current value of max. As 10 ≤ 11, max remains unchanged. Because there are five terms, we have n = 5. So after examining 10, the last term, the algorithm terminates, with max = 11. When it terminates, the algorithm reports that 11 is the largest term in the sequence.
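A direct transcription of Algorithm 1 into Python (a sketch; indices are 0-based here, and Python's built-in max would do the same job):

def find_max(seq):
    maximum = seq[0]        # temporary maximum starts at the first term
    for term in seq[1:]:    # examine each remaining term
        if maximum < term:
            maximum = term  # a larger term becomes the new maximum
    return maximum

print(find_max([8, 4, 11, 3, 10]))  # 11, as in the trace above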

PROPERTIES OF ALGORITHMS There are several properties that algorithms generally share. They are useful to keep in mind when algorithms are described. These properties are: ▶ Input. An algorithm has input values from a specified set. ▶ Output. From each set of input values an algorithm produces output values from a specified set. The output values are the solution to the problem. ▶ Definiteness. The steps of an algorithm must be defined precisely.

▶ Correctness. An algorithm should produce the correct output values for each set of input values. ▶ Finiteness. An algorithm should produce the desired output after a finite (but perhaps large) number of steps for any input in the set. ▶ Effectiveness. It must be possible to perform each step of an algorithm exactly and in a finite amount of time.


▶ Generality. The procedure should be applicable for all problems of the desired form, not just for a particular set of input values.

EXAMPLE 2 Show that Algorithm 1 for finding the maximum element in a finite sequence of integers has all the properties listed.

Solution: The input to Algorithm 1 is a sequence of integers. The output is the largest integer in the sequence. Each step of the algorithm is precisely defined, because only assignments, a finite loop, and conditional statements occur. To show that the algorithm is correct, we must show that when the algorithm terminates, the value of the variable max equals the maximum of the terms of the sequence. To see this, note that the initial value of max is the first term of the sequence; as successive terms of the sequence are examined, max is updated to the value of a term if the term exceeds the maximum of the terms previously examined. This (informal) argument shows that when all the terms have been examined, max equals the value of the largest term. (A rigorous proof of this requires the use of mathematical induction, a proof technique developed in Section 5.1.) The algorithm uses a finite number of steps, because it terminates after all the integers in the sequence have been examined. The algorithm can be carried out in a finite amount of time because each step is either a comparison or an assignment, there are a finite number of these steps, and each of these two operations takes a finite amount of time. Finally, Algorithm 1 is general, because it can be used to find the maximum of any finite sequence of integers. ◂

Searching Algorithms

The problem of locating an element in an ordered list occurs in many contexts. For instance, a program that checks the spelling of words searches for them in a dictionary, which is just an ordered list of words. Problems of this kind are called searching problems. We will discuss several algorithms for searching in this section. We will study the number of steps used by each of these algorithms in Section 3.3. The general searching problem can be described as follows: Locate an element x in a list of distinct elements a1, a2, … , an, or determine that it is not in the list. The solution to this search problem is the location of the term in the list that equals x (that is, i is the solution if x = ai) and is 0 if x is not in the list.

THE LINEAR SEARCH The first algorithm that we will present is called the linear search, or sequential search, algorithm. The linear search algorithm begins by comparing x and a1. When x = a1, the solution is the location of a1, namely, 1. When x ≠ a1, compare x with a2. If x = a2, the solution is the location of a2, namely, 2. When x ≠ a2, compare x with a3. Continue this process, comparing x successively with each term of the list until a match is found, where the solution is the location of that term, unless no match occurs. If the entire list has been searched without locating x, the solution is 0. The pseudocode for the linear search algorithm is displayed as Algorithm 2.

ALGORITHM 2 The Linear Search Algorithm.

procedure linear search(x: integer, a1, a2, …, an: distinct integers)
i := 1
while (i ≤ n and x ≠ ai)
    i := i + 1
if i ≤ n then location := i
else location := 0
return location {location is the subscript of the term that equals x, or is 0 if x is not found}
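A transcription of Algorithm 2 into Python (a sketch; it keeps the pseudocode's 1-based positions and its convention of returning 0 when x is absent):

def linear_search(x, a):
    i = 1
    while i <= len(a) and x != a[i - 1]:
        i += 1
    return i if i <= len(a) else 0

lst = [1, 2, 3, 5, 6, 7, 8, 10, 12, 13, 15, 16, 18, 19, 20, 22]
print(linear_search(19, lst))  # 14
print(linear_search(4, lst))   # 0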


THE BINARY SEARCH We will now consider another searching algorithm. This algorithm can be used when the list has terms occurring in order of increasing size (for instance: if the terms are numbers, they are listed from smallest to largest; if they are words, they are listed in lexicographic, or alphabetic, order). This second searching algorithm is called the binary search algorithm. It proceeds by comparing the element to be located to the middle term of the list. The list is then split into two smaller sublists of the same size, or where one of these smaller lists has one fewer term than the other. The search continues by restricting the search to the appropriate sublist based on the comparison of the element to be located and the middle term. In Section 3.3, it will be shown that the binary search algorithm is much more efficient than the linear search algorithm. Example 3 demonstrates how a binary search works. EXAMPLE 3 To search for 19 in the list

1 2 3 5 6 7 8 10 12 13 15 16 18 19 20 22,

first split this list, which has 16 terms, into two smaller lists with eight terms each, namely,

1 2 3 5 6 7 8 10     12 13 15 16 18 19 20 22.

Then, compare 19 and the largest term in the first list. Because 10 < 19, the search for 19 can be restricted to the list containing the 9th through the 16th terms of the original list. Next, split this list, which has eight terms, into the two smaller lists of four terms each, namely,

12 13 15 16    and    18 19 20 22.

Because 16 < 19 (comparing 19 with the largest term of the first list) the search is restricted to the second of these lists, which contains the 13th through the 16th terms of the original list. The list 18 19 20 22 is split into two lists, namely,

18 19    and    20 22.

Because 19 is not greater than the largest term of the first of these two lists, which is also 19, the search is restricted to the first list: 18 19, which contains the 13th and 14th terms of the original list. Next, this list of two terms is split into two lists of one term each: 18 and 19. Because 18 < 19, the search is restricted to the second list: the list containing the 14th term of the list, which is 19. Now that the search has been narrowed down to one term, a comparison is made, and 19 is located as the 14th term in the original list. ◂

We now specify the steps of the binary search algorithm. To search for the integer x in the list a1, a2, … , an, where a1 < a2 < ⋯ < an, begin by comparing x with the middle term am of the list, where m = ⌊(n + 1)∕2⌋. (Recall that ⌊x⌋ is the greatest integer not exceeding x.) If x > am, the search for x is restricted to the second half of the list, which is am+1, am+2, … , an. If x is not greater than am, the search for x is restricted to the first half of the list, which is a1, a2, … , am. The search has now been restricted to a list with no more than ⌈n∕2⌉ elements. (Recall that ⌈x⌉ is the smallest integer greater than or equal to x.) Using the same procedure, compare x to the middle term of the restricted list. Then restrict the search to the first or second half of the list. Repeat this process until a list with one term is obtained. Then determine whether this term is x. Pseudocode for the binary search algorithm is displayed as Algorithm 3.


ALGORITHM 3 The Binary Search Algorithm.

procedure binary search(x: integer, a1, a2, … , an: increasing integers)
i := 1 {i is left endpoint of search interval}
j := n {j is right endpoint of search interval}
while i < j
    m := ⌊(i + j)∕2⌋
    if x > am then i := m + 1
    else j := m
if x = ai then location := i
else location := 0
return location
{location is the subscript i of the term ai equal to x, or 0 if x is not found}
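Algorithm 3 can be rendered in Python as follows. This sketch of ours keeps the 1-based locations used above (the name binary_search is an assumption):

def binary_search(x, a):
    """Return the 1-based location of x in the increasing list a, or 0 (Algorithm 3)."""
    i, j = 1, len(a)               # left and right endpoints of the search interval
    while i < j:
        m = (i + j) // 2           # the middle term, m = floor((i + j)/2)
        if x > a[m - 1]:
            i = m + 1              # restrict the search to the second half
        else:
            j = m                  # restrict the search to the first half
    return i if a and x == a[i - 1] else 0

# binary_search(19, [1, 2, 3, 5, 6, 7, 8, 10, 12, 13, 15, 16, 18, 19, 20, 22])
# returns 14, matching Example 3.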

Algorithm 3 proceeds by successively narrowing down the part of the sequence being searched. At any given stage only the terms from ai to aj are under consideration. In other words, i and j are the smallest and largest subscripts of the remaining terms, respectively. Algorithm 3 continues narrowing the part of the sequence being searched until only one term of the sequence remains. When this is done, a comparison is made to see whether this term equals x.

Sorting

Suppose that we have a list of elements of a set. Furthermore, suppose that we have a way to order elements of the set. (The notion of ordering elements of sets will be discussed in detail in Section 9.6.) Sorting is putting these elements into a list in which the elements are in increasing order. For instance, sorting the list 7, 2, 1, 4, 5, 9 produces the list 1, 2, 4, 5, 7, 9. Sorting the list d, h, c, a, f (using alphabetical order) produces the list a, c, d, f, h.

There are many reasons why sorting algorithms interest computer scientists and mathematicians. Among these reasons are that some algorithms are easier to implement, some algorithms are more efficient (either in general, or when given input with certain characteristics, such as lists slightly out of order), some algorithms take advantage of particular computer architectures, and some algorithms are particularly clever. In this section we will introduce two sorting algorithms, the bubble sort and the insertion sort. Two other sorting algorithms, the selection sort and the binary insertion sort, are introduced in the exercises, and the shaker sort is introduced in the Supplementary Exercises. In Section 5.4 we will discuss the merge sort and introduce the quick sort in the exercises in that section; the tournament sort is introduced in the exercise set in Section 11.2. We cover sorting algorithms both because sorting is an important problem and because these algorithms can serve as examples for many important concepts.

THE BUBBLE SORT The bubble sort is one of the simplest sorting algorithms, but not one of the most efficient. It puts a list into increasing order by successively comparing adjacent elements, interchanging them if they are in the wrong order. To carry out the bubble sort, we perform the basic operation, that is, interchanging a larger element with a smaller one following it, starting at the beginning of the list, for a full pass. We iterate this procedure until the sort is complete. Pseudocode for the bubble sort is given as Algorithm 4. We can imagine the elements in the list placed in a column. In the bubble sort, the smaller elements “bubble” to the top as they are interchanged with larger elements. The larger elements “sink” to the bottom. This is illustrated in Example 4.

EXAMPLE 4 Use the bubble sort to put 3, 2, 4, 1, 5 into increasing order.

Solution: The steps of this algorithm are illustrated in Figure 1. Begin by comparing the first two elements, 3 and 2. Because 3 > 2, interchange 3 and 2, producing the list 2, 3, 4, 1, 5. Because 3 < 4, continue by comparing 4 and 1. Because 4 > 1, interchange 1 and 4, producing the list 2, 3, 1, 4, 5. Because 4 < 5, the first pass is complete. The first pass guarantees that the largest element, 5, is in the correct position. The second pass begins by comparing 2 and 3. Because these are in the correct order, 3 and 1 are compared. Because 3 > 1, these numbers are interchanged, producing 2, 1, 3, 4, 5. Because 3 < 4, these numbers are in the correct order. It is not necessary to do any more comparisons for this pass because 5 is already in the correct position. The second pass guarantees that the two largest elements, 4 and 5, are in their correct positions. The third pass begins by comparing 2 and 1. These are interchanged because 2 > 1, producing 1, 2, 3, 4, 5. Because 2 < 3, these two elements are in the correct order. It is not necessary to do any more comparisons for this pass because 4 and 5 are already in the correct positions. The third pass guarantees that the three largest elements, 3, 4, and 5, are in their correct positions. The fourth pass consists of one comparison, namely, the comparison of 1 and 2. Because 1 < 2, these elements are in the correct order. This completes the bubble sort. ◂

FIGURE 1 The steps of a bubble sort. (Figure omitted: it shows the four passes of the bubble sort on 3, 2, 4, 1, 5 as columns, marking each interchange, each pair already in correct order, and, in color, the numbers guaranteed to be in their correct positions.)


ALGORITHM 4 The Bubble Sort.

procedure bubblesort(a1, … , an: real numbers with n ≥ 2)
for i := 1 to n − 1
    for j := 1 to n − i
        if aj > aj+1 then interchange aj and aj+1
{a1, … , an is in increasing order}
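A direct Python rendering of Algorithm 4 (a sketch of ours; bubble_sort is an assumed name, and the list is sorted in place):

def bubble_sort(a):
    """Sort the list a into increasing order in place (Algorithm 4)."""
    n = len(a)
    for i in range(1, n):          # after pass i, the i largest elements are in place
        for j in range(0, n - i):  # compare adjacent elements
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]   # interchange them

# a = [3, 2, 4, 1, 5]; bubble_sort(a) leaves a == [1, 2, 3, 4, 5], as in Example 4.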

THE INSERTION SORT The insertion sort is a simple sorting algorithm, but it is usually not the most efficient. To sort a list with n elements, the insertion sort begins with the second element. The insertion sort compares this second element with the first element and inserts it before the first element if it does not exceed the first element and after the first element if it exceeds the first element. At this point, the first two elements are in the correct order. The third element is then compared with the first element, and if it is larger than the first element, it is compared with the second element; it is inserted into the correct position among the first three elements. In general, in the jth step of the insertion sort, the jth element of the list is inserted into the correct position in the list of the previously sorted j − 1 elements. To insert the jth element in the list, a linear search technique is used (see Exercise 45); the jth element is successively compared with the already sorted j − 1 elements at the start of the list until the first element that is not less than this element is found or until it has been compared with all j − 1 elements; the jth element is inserted in the correct position so that the first j elements are sorted. The algorithm continues until the last element is placed in the correct position relative to the already sorted list of the first n − 1 elements. The insertion sort is described in pseudocode in Algorithm 5. EXAMPLE 5 Use the insertion sort to put the elements of the list 3, 2, 4, 1, 5 in increasing order.

Solution: The insertion sort first compares 2 and 3. Because 3 > 2, it places 2 in the first position, producing the list 2, 3, 4, 1, 5 (the sorted part of the list is shown in color). At this point, 2 and 3 are in the correct order. Next, it inserts the third element, 4, into the already sorted part of the list by making the comparisons 4 > 2 and 4 > 3. Because 4 > 3, 4 remains in the third position. At this point, the list is 2, 3, 4, 1, 5 and we know that the ordering of the first three elements is correct. Next, we find the correct place for the fourth element, 1, among the already sorted elements, 2, 3, 4. Because 1 < 2, we obtain the list 1, 2, 3, 4, 5. Finally, we insert 5 into the correct position by successively comparing it to 1, 2, 3, and 4. Because 5 > 4, it stays at the end of the list, producing the correct order for the entire list. ◂

ALGORITHM 5 The Insertion Sort.

procedure insertion sort(a1, a2, … , an: real numbers with n ≥ 2)
for j := 2 to n
    i := 1
    while aj > ai
        i := i + 1
    m := aj
    for k := 0 to j − i − 1
        aj−k := aj−k−1
    ai := m
{a1, … , an is in increasing order}
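Algorithm 5 in Python, as a sketch of ours (insertion_sort is an assumed name; indices are shifted to Python's 0-based convention):

def insertion_sort(a):
    """Sort the list a into increasing order in place (Algorithm 5)."""
    for j in range(1, len(a)):     # a[0..j-1] is already sorted
        i = 0
        while a[j] > a[i]:         # linear search for the insertion point
            i = i + 1
        m = a[j]
        for k in range(j, i, -1):  # shift the larger elements one place right
            a[k] = a[k - 1]
        a[i] = m                   # insert the jth element in position i

# a = [3, 2, 4, 1, 5]; insertion_sort(a) leaves a == [1, 2, 3, 4, 5], as in Example 5.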


String Matching

Although searching and sorting are the most commonly encountered problems in computer science, many other problems arise frequently. One of these problems asks where a particular string of characters P, called the pattern, occurs, if it does, within another string T, called the text. For instance, we can ask whether the pattern 101 can be found within the string 11001011. By inspection we can see that the pattern 101 occurs within the text 11001011 at a shift of four characters, because 101 is the string formed by the fifth, sixth, and seventh characters of the text. On the other hand, the pattern 111 does not occur within the text 110110001101. Finding where a pattern occurs in a text string is called string matching.

String matching plays an essential role in a wide variety of applications, including text editing, spam filters, systems that look for attacks in a computer network, search engines, plagiarism detection, bioinformatics, and many other important applications. For example, in text editing, the string matching problem arises whenever we need to find all occurrences of a string so that we can replace this string with a different string. Search engines look for matches of search keywords with words on web pages. Many problems in bioinformatics arise in the study of DNA molecules, which are made up of four bases: thymine (T), adenine (A), cytosine (C), and guanine (G). The process of DNA sequencing is the determination of the order of the four bases in DNA. This leads to string matching problems involving strings made up from the four letters T, A, C, and G. For instance, we can ask whether the pattern CAG occurs in the text CATCACAGAGA. The answer is yes, because it occurs with a shift of five characters. Solving questions about the genome requires the use of efficient algorithms for string matching, especially because a string representing a human genome is about 3 × 10^9 characters long.

We will now describe a brute force algorithm, Algorithm 6, for string matching, called the naive string matcher. The input to this algorithm is the pattern we wish to match, P = p1p2 … pm, and the text, T = t1t2 … tn. When this pattern begins at position s + 1 in the text T, we say that P occurs with shift s in T, that is, when ts+1 = p1, ts+2 = p2, … , ts+m = pm. To find all valid shifts, the naive string matcher runs through all possible shifts s from s = 0 to s = n − m, checking whether s is a valid shift. In Figure 2, we display the operation of Algorithm 6 when it is used to search for the pattern P = eye in the text T = eceyeye.

ALGORITHM 6 Naive String Matcher.

procedure string match(n, m: positive integers, m ≤ n, t1, t2, … , tn, p1, p2, … , pm: characters)
for s := 0 to n − m
    j := 1
    while (j ≤ m and ts+j = pj)
        j := j + 1
    if j > m then print “s is a valid shift”
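A Python rendering of Algorithm 6 (a sketch of ours; string_match is an assumed name, and strings are indexed from 0 internally):

def string_match(t, p):
    """Print every valid shift of the pattern p in the text t (Algorithm 6)."""
    n, m = len(t), len(p)
    for s in range(0, n - m + 1):  # try every shift s from 0 to n - m
        j = 1
        while j <= m and t[s + j - 1] == p[j - 1]:
            j = j + 1
        if j > m:                  # all m characters matched
            print(s, "is a valid shift")

# string_match("eceyeye", "eye") prints the valid shifts 2 and 4.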

FIGURE 2 The steps of the naive string matcher with P = eye in T = eceyeye. (Figure omitted: it shows the pattern aligned below the text at shifts s = 0 through s = 4.) Matches are identified with a solid line and mismatches with a jagged line. The algorithm finds two valid shifts, s = 2 and s = 4.


Many other string matching algorithms have been developed besides the naive string matcher. These algorithms use a surprisingly wide variety of approaches to make them more efficient than the naive string matcher. To learn more about these algorithms, consult [CoLeRiSt09], as well as books on algorithms in bioinformatics.

Greedy Algorithms

Many algorithms we will study in this book are designed to solve optimization problems. The goal of such problems is to find a solution to the given problem that either minimizes or maximizes the value of some parameter. Optimization problems studied later in this text include finding a route between two cities with least total mileage, determining a way to encode messages using the fewest bits possible, and finding a set of fiber links between network nodes using the least amount of fiber.

Surprisingly, one of the simplest approaches often leads to a solution of an optimization problem. This approach selects the best choice at each step, instead of considering all sequences of steps that may lead to an optimal solution. Algorithms that make what seems to be the “best” choice at each step are called greedy algorithms. Once we know that a greedy algorithm finds a feasible solution, we need to determine whether it has found an optimal solution. (Note that we call the algorithm “greedy” whether or not it finds an optimal solution.) To do this, we either prove that the solution is optimal or we show that there is a counterexample where the algorithm yields a nonoptimal solution. To make these concepts more concrete, we will consider the cashier’s algorithm that makes change using coins. (This algorithm is called the cashier’s algorithm because cashiers often used this algorithm for making change in the days before cash registers became electronic.)

EXAMPLE 6 Consider the problem of making n cents change with quarters, dimes, nickels, and pennies, and using the least total number of coins. We can devise a greedy algorithm for making change for n cents by making a locally optimal choice at each step; that is, at each step we choose the coin of the largest denomination possible to add to the pile of change without exceeding n cents. For example, to make change for 67 cents, we first select a quarter (leaving 42 cents). We next select a second quarter (leaving 17 cents), followed by a dime (leaving 7 cents), followed by a nickel (leaving 2 cents), followed by a penny (leaving 1 cent), followed by a penny. ◂

We display the cashier’s algorithm for n cents, using any set of denominations of coins, as Algorithm 7.

ALGORITHM 7 Cashier’s Algorithm.

procedure change(c1, c2, … , cr: values of denominations of coins, where c1 > c2 > ⋯ > cr; n: a positive integer)
for i := 1 to r
    di := 0 {di counts the coins of denomination ci used}
    while n ≥ ci
        di := di + 1 {add a coin of denomination ci}
        n := n − ci
{di is the number of coins of denomination ci in the change for i = 1, 2, … , r}

We have described the cashier’s algorithm, a greedy algorithm for making change, using any finite set of coins with denominations c1, c2, … , cr. In the particular case where the four denominations are quarters, dimes, nickels, and pennies, we have c1 = 25, c2 = 10, c3 = 5, and c4 = 1. For this case, we will show that this algorithm leads to an optimal solution in the sense that it uses the fewest coins possible.
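A Python sketch of the cashier’s algorithm (our own rendering; the name change and the list-of-counts return value are assumptions):

def change(denominations, n):
    """Greedy change for n cents; denominations must be listed largest first."""
    counts = []
    for c in denominations:
        d = 0
        while n >= c:              # add coins of denomination c while possible
            d = d + 1
            n = n - c
        counts.append(d)           # d coins of denomination c are used
    return counts

# change([25, 10, 5, 1], 67) returns [2, 1, 1, 2], matching Example 6.
# With no nickels, change([25, 10, 1], 30) returns [1, 0, 5] (six coins),
# although three dimes would suffice, illustrating the caveat below.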


Before we embark on our proof, we show that there are sets of coins for which the cashier’s algorithm (Algorithm 7) does not necessarily produce change using the fewest coins possible. For example, if we have only quarters, dimes, and pennies (and no nickels) to use, the cashier’s algorithm would make change for 30 cents using six coins (a quarter and five pennies), whereas we could have used three coins, namely, three dimes.

LEMMA 1 If n is a positive integer, then n cents in change using quarters, dimes, nickels, and pennies using the fewest coins possible has at most two dimes, at most one nickel, at most four pennies, and cannot have two dimes and a nickel. The amount of change in dimes, nickels, and pennies cannot exceed 24 cents.

Proof: We use a proof by contradiction. We will show that if we had more than the specified number of coins of each type, we could replace them using fewer coins that have the same value. We note that if we had three dimes we could replace them with a quarter and a nickel, if we had two nickels we could replace them with a dime, if we had five pennies we could replace them with a nickel, and if we had two dimes and a nickel we could replace them with a quarter. Because we can have at most two dimes, one nickel, and four pennies, but we cannot have two dimes and a nickel, it follows that 24 cents is the most money we can have in dimes, nickels, and pennies when we make change using the fewest number of coins for n cents.

THEOREM 1 The cashier’s algorithm (Algorithm 7) always makes change using the fewest coins possible when change is made from quarters, dimes, nickels, and pennies.

Proof: We will use a proof by contradiction. Suppose that there is a positive integer n such that there is a way to make change for n cents using quarters, dimes, nickels, and pennies that uses fewer coins than the greedy algorithm finds. We first note that q′, the number of quarters used in this optimal way to make change for n cents, must be the same as q, the number of quarters used by the greedy algorithm. To show this, first note that the greedy algorithm uses the most quarters possible, so q′ ≤ q. However, it is also the case that q′ cannot be less than q. If it were, we would need to make up at least 25 cents from dimes, nickels, and pennies in this optimal way to make change. But this is impossible by Lemma 1. Because there must be the same number of quarters in the two ways to make change, the value of the dimes, nickels, and pennies in these two ways must be the same, and these coins are worth no more than 24 cents. There must be the same number of dimes, because the greedy algorithm used the most dimes possible, and by Lemma 1, when change is made using the fewest coins possible, at most one nickel and at most four pennies are used, so that the most dimes possible are also used in the optimal way to make change. Similarly, we have the same number of nickels and, finally, the same number of pennies.

A greedy algorithm makes the best choice at each step according to a specified criterion. The next example shows that it can be difficult to determine which of many possible criteria to choose. EXAMPLE 7 Suppose we have a group of proposed talks with preset start and end times. Devise a greedy algorithm to schedule as many of these talks as possible in a lecture hall, under the assumptions that once a talk starts, it continues until it ends, no two talks can proceed at the same time, and a talk can begin at the same time another one ends. Assume that talk j begins at time sj (where s stands for start) and ends at time ej (where e stands for end).

Solution: To use a greedy algorithm to schedule the most talks, that is, an optimal schedule, we need to decide how to choose which talk to add at each step. There are many criteria we could use to select a talk at each step, where we choose from the talks that do not overlap talks already selected. For example, we could add talks in order of earliest start time, we could add talks in order of shortest time, we could add talks in order of earliest finish time, or we could use some other criterion. We now consider these possible criteria.

Suppose we add the talk that starts earliest among the talks compatible with those already selected. We can construct a counterexample to see that the resulting algorithm does not always produce an optimal schedule. For instance, suppose that we have three talks: Talk 1 starts at 8 A.M. and ends at 12 noon, Talk 2 starts at 9 A.M. and ends at 10 A.M., and Talk 3 starts at 11 A.M. and ends at 12 noon. We first select Talk 1 because it starts earliest. But once we have selected Talk 1, we cannot select either Talk 2 or Talk 3 because both overlap Talk 1. Hence, this greedy algorithm selects only one talk. This is not optimal because we could schedule Talk 2 and Talk 3, which do not overlap.

Now suppose we add the talk that is shortest among the talks that do not overlap any of those already selected. Again we can construct a counterexample to show that this greedy algorithm does not always produce an optimal schedule. So, suppose that we have three talks: Talk 1 starts at 8 A.M. and ends at 9:15 A.M., Talk 2 starts at 9 A.M. and ends at 10 A.M., and Talk 3 starts at 9:45 A.M. and ends at 11 A.M. We select Talk 2 because it is shortest, requiring one hour. Once we select Talk 2, we cannot select either Talk 1 or Talk 3 because neither is compatible with Talk 2. Hence, this greedy algorithm selects only one talk. However, it is possible to select two talks, Talk 1 and Talk 3, which are compatible.

It can be shown that we schedule the most talks possible if in each step we select the talk with the earliest ending time among the talks compatible with those already selected. We will prove this in Chapter 5 using the method of mathematical induction. The first step will be to sort the talks according to increasing finish time. After this sorting, we relabel the talks so that e1 ≤ e2 ≤ ⋯ ≤ en. The resulting greedy algorithm is given as Algorithm 8. ◂

ALGORITHM 8 Greedy Algorithm for Scheduling Talks.

procedure schedule(s1, s2, … , sn: start times of talks, e1, e2, … , en: ending times of talks)
sort talks by finish time and reorder so that e1 ≤ e2 ≤ ⋯ ≤ en
S := ∅
for j := 1 to n
    if talk j is compatible with S then S := S ∪ {talk j}
return S
{S is the set of talks scheduled}
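A Python sketch of Algorithm 8 (our own rendering; schedule is an assumed name, and talks are given as (start, end) pairs):

def schedule(talks):
    """Greedily select a largest set of nonoverlapping talks (Algorithm 8)."""
    selected = []
    last_end = None
    for start, end in sorted(talks, key=lambda talk: talk[1]):  # sort by finish time
        if last_end is None or start >= last_end:  # compatible with those selected
            selected.append((start, end))
            last_end = end
    return selected

# schedule([(8, 12), (9, 10), (11, 12)]) returns [(9, 10), (11, 12)],
# the optimal schedule from the first counterexample in Example 7.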

The Halting Problem

We will now describe a proof of one of the most famous theorems in computer science. We will show that there is a problem that cannot be solved using any procedure. That is, we will show there are unsolvable problems. The problem we will study is the halting problem. It asks whether there is a procedure that does this: It takes as input a computer program and input to the program and determines whether the program will eventually stop when run with this input. It would be convenient to have such a procedure, if it existed. Certainly being able to test whether a program entered into an infinite loop would be helpful when writing and debugging programs. However, in 1936 Alan Turing showed that no such procedure exists (see his biography in Section 13.4).

Before we present a proof that the halting problem is unsolvable, first note that we cannot simply run a program and observe what it does to determine whether it terminates when run with the given input. If the program halts, we have our answer, but if it is still running after any fixed length of time has elapsed, we do not know whether it will never halt or we just did not wait long enough for it to terminate. After all, it is not hard to design a program that will stop only after more than a billion years has elapsed.

We will describe Turing’s proof that the halting problem is unsolvable; it is a proof by contradiction. (The reader should note that our proof is not completely rigorous, because we have not explicitly defined what a procedure is. To remedy this, the concept of a Turing machine is needed. This concept is introduced in Section 13.5.)

Proof: Assume there is a solution to the halting problem, a procedure called H(P, I). The procedure H(P, I) takes two inputs, one a program P and the other I, an input to the program P. H(P, I) generates the string “halt” as output if H determines that P stops when given I as input. Otherwise, H(P, I) generates the string “loops forever” as output. We will now derive a contradiction.

When a procedure is coded, it is expressed as a string of characters; this string can be interpreted as a sequence of bits. This means that a program itself can be used as data. Therefore, a program can be thought of as input to another program, or even itself. Hence, H can take a program P as both of its inputs, which are a program and input to this program. H should be able to determine whether P will halt when it is given a copy of itself as input.

To show that no procedure H exists that solves the halting problem, we construct a simple procedure K(P), which works as follows, making use of the output H(P, P). If the output of H(P, P) is “loops forever,” which means that P loops forever when given a copy of itself as input, then K(P) halts. If the output of H(P, P) is “halt,” which means that P halts when given a copy of itself as input, then K(P) loops forever. That is, K(P) does the opposite of what the output of H(P, P) specifies. (See Figure 3.)

FIGURE 3 Showing that the halting problem is unsolvable. (Figure omitted: it shows K(P) built from H: the program P is fed to H as both arguments; if H(P, P) outputs “halts,” then K(P) loops forever, and if H(P, P) outputs “loops forever,” then K(P) halts.)
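The construction of K can be made concrete in a short Python sketch of ours. The function halts below stands for the assumed procedure H; no such procedure can actually be written, so the stub is purely illustrative:

def halts(program, program_input):
    """Hypothetical procedure H: would return True if program halts on program_input.
    No such procedure exists; this stub only illustrates the construction."""
    ...

def k(program):
    """The procedure K built from H: do the opposite of what H predicts."""
    if halts(program, program):    # H says the program halts on itself ...
        while True:                # ... so K loops forever
            pass
    # otherwise H says it loops forever, so K simply halts

# Feeding k to itself, k(k), produces the contradiction derived below.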


Now suppose we provide K as input to K. We note that if the output of H(K, K) is “loops forever,” then by the definition of K, we see that K(K) halts. This means that by the definition of H, the output of H(K, K) is “halt,” which is a contradiction. Otherwise, if the output of H(K, K) is “halt,” then by the definition of K, we see that K(K) loops forever, which means that by the definition of H, the output of H(K, K) is “loops forever.” This is also a contradiction. Thus, H cannot always give the correct answers. Consequently, there is no procedure that solves the halting problem.

3.2 The Growth of Functions

Introduction

In Section 3.1 we discussed the concept of an algorithm. We introduced algorithms that solve a variety of problems, including searching for an element in a list and sorting a list. In Section 3.3 we will study the number of operations used by these algorithms. In particular, we will estimate the number of comparisons used by the linear and binary search algorithms to find an element in a sequence of n elements. We will also estimate the number of comparisons used by the bubble sort and by the insertion sort to sort a list of n elements. The time required to solve a problem depends on more than only the number of operations it uses. The time also depends on the hardware and software used to run the program that implements the algorithm. However, when we change the hardware and software used to implement an algorithm, we can closely approximate the time required to solve a problem of size n by multiplying the previous time required by a constant. For example, on a supercomputer we might be able to solve a problem of size n a million times faster than we can on a PC. However, this factor of one million will not depend on n (except perhaps in some minor ways). One of the advantages of using big-O notation, which we introduce in this section, is that we can estimate the growth of a function without worrying about constant multipliers or smaller order terms. This means that, using big-O notation, we do not have to worry about the hardware and software used to implement an algorithm. Furthermore, using big-O notation, we can assume that the different operations used in an algorithm take the same time, which simplifies the analysis considerably.

Big-O notation is used extensively to estimate the number of operations an algorithm uses as its input grows. With the help of this notation, we can determine whether it is practical to use a particular algorithm to solve a problem as the size of the input increases. Furthermore, using big-O notation, we can compare two algorithms to determine which is more efficient as the size of the input grows. For instance, if we have two algorithms for solving a problem, one using 100n^2 + 17n + 4 operations and the other using n^3 operations, big-O notation can help us see that the first algorithm uses far fewer operations when n is large, even though it uses more operations for small values of n, such as n = 10.

This section introduces big-O notation and the related big-Omega and big-Theta notations. We will explain how big-O, big-Omega, and big-Theta estimates are constructed and establish estimates for some important functions that are used in the analysis of algorithms.

Big-O Notation

The growth of functions is often described using a special notation. Definition 1 describes this notation.

Definition 1 Let f and g be functions from the set of integers or the set of real numbers to the set of real numbers. We say that f(x) is O(g(x)) if there are constants C and k such that |f(x)| ≤ C|g(x)| whenever x > k. [This is read as “f(x) is big-oh of g(x).”]

Remark: Intuitively, the definition that f(x) is O(g(x)) says that f(x) grows slower than some fixed multiple of g(x) as x grows without bound.

The constants C and k in the definition of big-O notation are called witnesses to the relationship f(x) is O(g(x)). To establish that f(x) is O(g(x)) we need only one pair of witnesses to this relationship. That is, to show that f(x) is O(g(x)), we need find only one pair of constants C and k, the witnesses, such that |f(x)| ≤ C|g(x)| whenever x > k.

Note that when there is one pair of witnesses to the relationship f(x) is O(g(x)), there are infinitely many pairs of witnesses. To see this, note that if C and k are one pair of witnesses, then any pair C′ and k′, where C < C′ and k < k′, is also a pair of witnesses, because |f(x)| ≤ C|g(x)| ≤ C′|g(x)| whenever x > k′ > k.

THE HISTORY OF BIG-O NOTATION Big-O notation has been used in mathematics for more than a century. In computer science it is widely used in the analysis of algorithms, as will be seen in Section 3.3. The German mathematician Paul Bachmann first introduced big-O notation in 1892 in an important book on number theory. The big-O symbol is sometimes called a Landau symbol after the German mathematician Edmund Landau, who used this notation throughout his work. The use of big-O notation in computer science was popularized by Donald Knuth, who also introduced the big-Ω and big-Θ notations defined later in this section.

WORKING WITH THE DEFINITION OF BIG-O NOTATION A useful approach for finding a pair of witnesses is to first select a value of k for which the size of |f(x)| can be readily estimated when x > k, and to see whether we can use this estimate to find a value of C for which |f(x)| ≤ C|g(x)| for x > k. This approach is illustrated in Example 1.

EXAMPLE 1 Show that f(x) = x^2 + 2x + 1 is O(x^2).

Solution: We observe that we can readily estimate the size of f(x) when x > 1 because x < x^2 and 1 < x^2 when x > 1. It follows that

0 ≤ x^2 + 2x + 1 ≤ x^2 + 2x^2 + x^2 = 4x^2

whenever x > 1, as shown in Figure 1. Consequently, we can take C = 4 and k = 1 as witnesses to show that f(x) is O(x^2). That is, f(x) = x^2 + 2x + 1 < 4x^2 whenever x > 1. (Note that it is not necessary to use absolute values here because all functions in these equalities are positive when x is positive.) Alternatively, we can estimate the size of f(x) when x > 2. When x > 2, we have 2x ≤ x^2 and 1 ≤ x^2. Consequently, if x > 2, we have

0 ≤ x^2 + 2x + 1 ≤ x^2 + x^2 + x^2 = 3x^2.

It follows that C = 3 and k = 2 are also witnesses to the relation f(x) is O(x^2).

Observe that in the relationship “f(x) is O(x^2),” x^2 can be replaced by any function that has larger values than x^2 for all x ≥ k for some positive real number k. For example, f(x) is O(x^3), f(x) is O(x^2 + x + 7), and so on.

It is also true that x^2 is O(x^2 + 2x + 1), because x^2 < x^2 + 2x + 1 whenever x > 1. This means that C = 1 and k = 1 are witnesses to the relationship x^2 is O(x^2 + 2x + 1). ◂
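The witnesses can also be checked numerically. This small Python spot check (our own illustration, not part of the text) confirms the inequality at several sample points with x > 1:

for x in [1.5, 2, 10, 1000]:
    assert x**2 + 2*x + 1 <= 4 * x**2    # C = 4 and k = 1 are witnesses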

Note that in Example 1 we have two functions, f(x) = x^2 + 2x + 1 and g(x) = x^2, such that f(x) is O(g(x)) and g(x) is O(f(x)); the latter fact follows from the inequality x^2 ≤ x^2 + 2x + 1, which holds for all nonnegative real numbers x. We say that two functions f(x) and g(x) that satisfy both of these big-O relationships are of the same order. We will return to this notion later in this section.

FIGURE 1 The function x^2 + 2x + 1 is O(x^2). (Figure omitted: it shows the graphs of x^2, x^2 + 2x + 1, and 4x^2; the part of the graph of f(x) = x^2 + 2x + 1 satisfying f(x) < 4x^2 for x > 1 is shown in color.)


Remark: The fact that f (x) is O(g(x)) is sometimes written f (x) = O(g(x)). However, the equals sign in this notation does not represent a genuine equality. Rather, this notation tells us that an inequality holds relating the values of the functions f and g for sufficiently large numbers in the domains of these functions. However, it is acceptable to write f (x) ∈ O(g(x)) because O(g(x)) represents the set of functions that are O(g(x)).

When f (x) is O(g(x)), and h(x) is a function that has larger absolute values than g(x) does for sufficiently large values of x, it follows that f (x) is O(h(x)). In other words, the function g(x) in the relationship f (x) is O(g(x)) can be replaced by a function with larger absolute values. To see this, note that if

| f (x)| ≤ C|g(x)| if x > k,

and if |h(x)| > |g(x)| for all x > k, then

| f (x)| ≤ C|h(x)| if x > k. Hence, f (x) is O(h(x)).

When big-O notation is used, the function g in the relationship f(x) is O(g(x)) is often chosen to have the smallest growth rate of the functions belonging to a set of reference functions, such as functions of the form x^n, where n is a positive real number. (Important reference functions are discussed later in this section.) In subsequent discussions, we will almost always deal with functions that take on only positive values. All references to absolute values can be dropped when working with big-O estimates for such functions. Figure 2 illustrates the relationship f(x) is O(g(x)).


FIGURE 2 The function f(x) is O(g(x)). (Figure omitted: it shows the graphs of f(x), g(x), and Cg(x); the part of the graph of f(x) satisfying f(x) < Cg(x) for x > k is shown in color.)

Example 2 illustrates how big-O notation is used to estimate the growth of functions.

EXAMPLE 2 Show that 7x^2 is O(x^3).

Solution: Note that when x > 7, we have 7x^2 < x^3. (We can obtain this inequality by multiplying both sides of x > 7 by x^2.) Consequently, we can take C = 1 and k = 7 as witnesses to establish the relationship 7x^2 is O(x^3). Alternatively, when x > 1, we have 7x^2 < 7x^3, so that C = 7 and k = 1 are also witnesses to the relationship 7x^2 is O(x^3). ◂


EXAMPLE 3 Show that n^2 is not O(n).

Solution: To show that n^2 is not O(n), we must show that no pair of witnesses C and k exist such that n^2 ≤ Cn whenever n > k. We will use a proof by contradiction to show this. Suppose that there are constants C and k for which n^2 ≤ Cn whenever n > k. Observe that when n > 0 we can divide both sides of the inequality n^2 ≤ Cn by n to obtain the equivalent inequality n ≤ C. However, no matter what C and k are, the inequality n ≤ C cannot hold for all n with n > k. In particular, once we set a value of k, we see that when n is larger than the maximum of k and C, it is not true that n ≤ C even though n > k. This contradiction shows that n^2 is not O(n). ◂

EXAMPLE 4 Example 2 shows that 7x^2 is O(x^3). Is it also true that x^3 is O(7x^2)?

Solution: To determine whether x^3 is O(7x^2), we need to determine whether witnesses C and k exist, so that x^3 ≤ C(7x^2) whenever x > k. We will show that no such witnesses exist using a proof by contradiction. If C and k are witnesses, the inequality x^3 ≤ C(7x^2) holds for all x > k. Observe that the inequality x^3 ≤ C(7x^2) is equivalent to the inequality x ≤ 7C, which follows by dividing both sides by the positive quantity x^2. However, no matter what C is, it is not the case that x ≤ 7C for all x > k no matter what k is, because x can be made arbitrarily large. It follows that no witnesses C and k exist for this proposed big-O relationship. Hence, x^3 is not O(7x^2). ◂

Big-O Estimates for Some Important Functions

Polynomials can often be used to estimate the growth of functions. Instead of analyzing the growth of polynomials each time they occur, we would like a result that can always be used to estimate the growth of a polynomial. Theorem 1 does this. It shows that the leading term of a polynomial dominates its growth by asserting that a polynomial of degree n or less is O(x^n).

THEOREM 1 Let f(x) = a_n x^n + a_{n−1} x^{n−1} + ⋯ + a_1 x + a_0, where a_0, a_1, … , a_{n−1}, a_n are real numbers. Then f(x) is O(x^n).

Proof: Using the triangle inequality (see Exercise 9 in Section 1.8), if x > 1 we have

|f(x)| = |a_n x^n + a_{n−1} x^{n−1} + ⋯ + a_1 x + a_0|
       ≤ |a_n| x^n + |a_{n−1}| x^{n−1} + ⋯ + |a_1| x + |a_0|
       = x^n (|a_n| + |a_{n−1}|∕x + ⋯ + |a_1|∕x^{n−1} + |a_0|∕x^n)
       ≤ x^n (|a_n| + |a_{n−1}| + ⋯ + |a_1| + |a_0|).

This shows that

|f(x)| ≤ Cx^n,

where C = |a_n| + |a_{n−1}| + ⋯ + |a_0|, whenever x > 1. Hence, the witnesses C = |a_n| + |a_{n−1}| + ⋯ + |a_0| and k = 1 show that f(x) is O(x^n).

We now give some examples involving functions that have the set of positive integers as their domains.

EXAMPLE 5 How can big-O notation be used to estimate the sum of the first n positive integers?

Solution: Because each of the integers in the sum of the first n positive integers does not exceed n, it follows that

1 + 2 + ⋯ + n ≤ n + n + ⋯ + n = n^2.

From this inequality it follows that 1 + 2 + 3 + ⋯ + n is O(n^2), taking C = 1 and k = 1 as witnesses. (In this example the domains of the functions in the big-O relationship are the set of positive integers.) ◂

In Example 6 big-O estimates will be developed for the factorial function and its logarithm. These estimates will be important in the analysis of the number of steps used in sorting procedures.

EXAMPLE 6 Give big-O estimates for the factorial function and the logarithm of the factorial function, where the factorial function f (n) = n! is defined by

n! = 1 ⋅ 2 ⋅ 3 ⋅ ⋯ ⋅ n

whenever n is a positive integer, and 0! = 1. For example,

1! = 1, 2! = 1 ⋅ 2 = 2, 3! = 1 ⋅ 2 ⋅ 3 = 6, 4! = 1 ⋅ 2 ⋅ 3 ⋅ 4 = 24.

Note that the function n! grows rapidly. For instance,

20! = 2,432,902,008,176,640,000.

Solution: A big-O estimate for n! can be obtained by noting that each term in the product does not exceed n. Hence,

n! = 1 ⋅ 2 ⋅ 3 ⋅ ⋯ ⋅ n ≤ n ⋅ n ⋅ n ⋅ ⋯ ⋅ n = n^n.

This inequality shows that n! is O(n^n), taking C = 1 and k = 1 as witnesses. Taking logarithms of both sides of the inequality established for n!, we obtain

log n! ≤ log n^n = n log n.

This implies that log n! is O(n log n), again taking C = 1 and k = 1 as witnesses. ◂

EXAMPLE 7 In Section 5.1, we will show that n < 2^n whenever n is a positive integer. Show that this inequality implies that n is O(2^n), and use this inequality to show that log n is O(n).

Solution: Using the inequality n < 2^n, we can quickly conclude that n is O(2^n) by taking k = C = 1 as witnesses. Note that because the logarithm function is increasing, taking logarithms (base 2) of both sides of this inequality shows that

log n < n.

It follows that

log n is O(n).

(Again we take C = k = 1 as witnesses.) If we have logarithms to a base b, where b is different from 2, we still have log_b n is O(n) because

log_b n = log n ∕ log b < n ∕ log b

whenever n is a positive integer. We take C = 1∕log b and k = 1 as witnesses. (We have used Theorem 3 in Appendix 2 to see that log_b n = log n ∕ log b.) ◂

As mentioned before, big-O notation is used to estimate the number of operations needed to solve a problem using a specified procedure or algorithm. The functions used in these estimates often include the following:

1, log n, n, n log n, n^2, 2^n, n!

Using calculus it can be shown that each function in the list is smaller than the succeeding function, in the sense that the ratio of a function and the succeeding function tends to zero as n grows without bound. Figure 3 displays the graphs of these functions, using a scale for the values of the functions that doubles for each successive marking on the graph. That is, the vertical scale in this graph is logarithmic.

FIGURE 3 A display of the growth of functions commonly used in big-O estimates. (Figure omitted: it plots 1, log n, n, n log n, n^2, 2^n, and n! on a vertical scale that doubles at each successive marking, that is, a logarithmic scale.)
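The hierarchy in the figure can also be seen numerically by tabulating the reference functions at a few values of n (a small Python sketch of ours):

import math

for n in [2, 8, 32]:
    print(n, math.log2(n), n * math.log2(n), n**2, 2**n, math.factorial(n))
# Reading across each row, every column eventually dwarfs the one before it
# as n grows.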

USEFUL BIG-O ESTIMATES INVOLVING LOGARITHMS, POWERS, AND EXPONENTIAL FUNCTIONS We now give some useful facts that help us determine whether big-O relationships hold between pairs of functions when each of the functions is a power of a logarithm, a power, or an exponential function of the form b^n where b > 1. Their proofs are left as Exercises 57–62 for readers skilled with calculus. Theorem 1 shows that if f(n) is a polynomial of degree d or less, then f(n) is O(n^d). Applying this theorem, we see that if d > c > 1, then n^c is O(n^d). We leave it to the reader to show that the reverse of this relationship does not hold. Putting these facts together, we see that if d > c > 1, then

n^c is O(n^d), but n^d is not O(n^c).

In Example 7 we showed that log_b n is O(n) whenever b > 1. More generally, whenever b > 1 and c and d are positive, we have

(log_b n)^c is O(n^d), but n^d is not O((log_b n)^c).

This tells us that every positive power of the logarithm of n to the base b, where b > 1, is big-O of every positive power of n, but the reverse relationship never holds. In Example 7, we also showed that n is O(2^n). More generally, whenever d is positive and b > 1, we have

n^d is O(b^n), but b^n is not O(n^d).

This tells us that every power of n is big-O of every exponential function of n with a base that is greater than one, but the reverse relationship never holds. Furthermore, when c > b > 1 we have

b^n is O(c^n), but c^n is not O(b^n).

This tells us that if we have two exponential functions with different bases greater than one, one of these functions is big-O of the other if and only if its base is smaller or equal. Finally, we note that if c > 1, we have

c^n is O(n!), but n! is not O(c^n).

We can use the big-O estimates discussed here to help us order the growth of different functions, as Example 8 illustrates.

EXAMPLE 8 Arrange the functions f_1(n) = 8√n, f_2(n) = (log n)^2, f_3(n) = 2n log n, f_4(n) = n!, f_5(n) = (1.1)^n, and f_6(n) = n^2 in a list so that each function is big-O of the next function.

Solution: From the big-O estimates described in this subsection, we see that f_2(n) = (log n)^2 is the slowest growing of these functions. (This follows because log n grows slower than any positive power of n.) The next three functions, in order, are f_1(n) = 8√n, f_3(n) = 2n log n, and f_6(n) = n^2. (We know this because f_1(n) = 8n^{1∕2}, f_3(n) = 2n log n is a function that grows faster than n but slower than n^c for every c > 1, and f_6(n) = n^2 is of the form n^c where c = 2.) The next function in the list is f_5(n) = (1.1)^n, because it is an exponential function with base 1.1. Finally, f_4(n) = n! is the fastest growing function on the list, because f_4(n) = n! grows faster than any exponential function of n. ◂
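The ordering in Example 8 can be spot-checked numerically at a fixed large n (our own illustration; n = 5000 is chosen so that (1.1)^n still fits in a float):

import math

n = 5000
values = [(math.log(n))**2, 8 * math.sqrt(n), 2 * n * math.log(n),
          n**2, 1.1**n, math.factorial(n)]
print(values == sorted(values))    # prints True: each value is below the next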

The Growth of Combinations of Functions

Many algorithms are made up of two or more separate subprocedures. The number of steps used by a computer to solve a problem with input of a specified size using such an algorithm is the sum of the number of steps used by these subprocedures. To give a big-O estimate for the number of steps needed, it is necessary to find big-O estimates for the number of steps used by each subprocedure and then combine these estimates. Big-O estimates of combinations of functions can be provided if care is taken when different big-O estimates are combined. In particular, it is often necessary to estimate the growth of the sum and the product of two functions. What can be said if big-O estimates for each of two functions are known? To see what sort of estimates hold for the sum and the product of two functions, suppose that f1(x) is O(g1(x)) and f2(x) is O(g2(x)). From the definition of big-O notation, there are constants C1, C2, k1, and k2 such that

|f1(x)| ≤ C1|g1(x)|

when x > k1, and

|f2(x)| ≤ C2|g2(x)|

when x > k2. To estimate the sum of f1(x) and f2(x), note that

|(f1 + f2)(x)| = |f1(x) + f2(x)| ≤ |f1(x)| + |f2(x)|,

using the triangle inequality |a + b| ≤ |a| + |b|. When x is greater than both k1 and k2, it follows from the inequalities for |f1(x)| and |f2(x)| that

|f1(x)| + |f2(x)| ≤ C1|g1(x)| + C2|g2(x)|
                 ≤ C1|g(x)| + C2|g(x)|
                 = (C1 + C2)|g(x)|
                 = C|g(x)|,

where C = C1 + C2 and g(x) = max(|g1(x)|, |g2(x)|). [Here max(a, b) denotes the maximum, or larger, of a and b.]

This inequality shows that |(f1 + f2)(x)| ≤ C|g(x)| whenever x > k, where k = max(k1, k2). We state this useful result as Theorem 2.

THEOREM 2 Suppose that f1(x) is O(g1(x)) and that f2(x) is O(g2(x)). Then (f1 + f2)(x) is O(g(x)), where g(x) = max(|g1(x)|, |g2(x)|) for all x.

We often have big-O estimates for f1 and f2 in terms of the same function g. In this situation, Theorem 2 can be used to show that ( f1 + f2)(x) is also O(g(x)), because max(g(x), g(x)) = g(x). This result is stated in Corollary 1.

COROLLARY 1 Suppose that f1(x) and f2(x) are both O(g(x)). Then ( f1 + f2)(x) is O(g(x)).


In a similar way big-O estimates can be derived for the product of the functions f1 and f2. When x is greater than max(k1, k2) it follows that

|(f1f2)(x)| = |f1(x)||f2(x)|
            ≤ C1|g1(x)| C2|g2(x)|
            ≤ C1C2|(g1g2)(x)|
            = C|(g1g2)(x)|,

where C = C1C2. From this inequality, it follows that f1(x)f2(x) is O(g1g2(x)), because there are constants C and k, namely, C = C1C2 and k = max(k1, k2), such that |(f1f2)(x)| ≤ C|g1(x)g2(x)| whenever x > k. This result is stated in Theorem 3.

THEOREM 3 Suppose that f1(x) is O(g1(x)) and f2(x) is O(g2(x)). Then ( f1f2)(x) is O(g1(x)g2(x)).

The goal in using big-O notation to estimate functions is to choose a function g(x) that is as simple as possible and that grows relatively slowly, so that f(x) is O(g(x)). Examples 9 and 10 illustrate how to use Theorems 2 and 3 to do this. The type of analysis given in these examples is often used in the analysis of the time used to solve problems using computer programs.

EXAMPLE 9 Give a big-O estimate for f(n) = 3n log(n!) + (n^2 + 3) log n, where n is a positive integer.

Solution: First, the product 3n log(n!) will be estimated. From Example 6 we know that log(n!) is O(n log n). Using this estimate and the fact that 3n is O(n), Theorem 3 gives the estimate that 3n log(n!) is O(n^2 log n). Next, the product (n^2 + 3) log n will be estimated. Because n^2 + 3 < 2n^2 when n > 2, it follows that n^2 + 3 is O(n^2). Thus, from Theorem 3 it follows that (n^2 + 3) log n is O(n^2 log n). Using Theorem 2 to combine the two big-O estimates for the products shows that f(n) = 3n log(n!) + (n^2 + 3) log n is O(n^2 log n). ◂

EXAMPLE 10 Give a big-O estimate for f(x) = (x + 1) log(x^2 + 1) + 3x^2.

Solution: First, a big-O estimate for (x + 1) log(x^2 + 1) will be found. Note that (x + 1) is O(x). Furthermore, x^2 + 1 ≤ 2x^2 when x > 1. Hence,

log(x^2 + 1) ≤ log(2x^2) = log 2 + log x^2 = log 2 + 2 log x ≤ 3 log x,

if x > 2. This shows that log(x^2 + 1) is O(log x). From Theorem 3 it follows that (x + 1) log(x^2 + 1) is O(x log x). Because 3x^2 is O(x^2), Theorem 2 tells us that f(x) is O(max(x log x, x^2)). Because x log x ≤ x^2 for x > 1, it follows that f(x) is O(x^2). ◂

Big-Omega and Big-Theta Notation

Big-O notation is used extensively to describe the growth of functions, but it has limitations. In particular, when f(x) is O(g(x)), we have an upper bound, in terms of g(x), for the size of f(x) for large values of x. However, big-O notation does not provide a lower bound for the size of f(x) for large x. For this, we use big-Omega (big-Ω) notation. When we want to give both an upper and a lower bound on the size of a function f(x), relative to a reference function g(x), we use big-Theta (big-Θ) notation. (Ω and Θ are the Greek uppercase letters omega and theta, respectively.) Both big-Omega and big-Theta notation were introduced

by Donald Knuth in the 1970s. His motivation for introducing these notations was the common misuse of big-O notation when both an upper and a lower bound on the size of a function are needed. We now define big-Omega notation and illustrate its use. After doing so, we will do the same for big-Theta notation.

Definition 2 Let f and g be functions from the set of integers or the set of real numbers to the set of real numbers. We say that f(x) is Ω(g(x)) if there are constants C and k with C positive such that |f(x)| ≥ C|g(x)| whenever x > k. [This is read as “f(x) is big-Omega of g(x).”]

There is a strong connection between big-O and big-Omega notation. In particular, f(x) is Ω(g(x)) if and only if g(x) is O(f(x)). We leave the verification of this fact as a straightforward exercise for the reader.

EXAMPLE 11 The function f(x) = 8x^3 + 5x^2 + 7 is Ω(g(x)), where g(x) is the function g(x) = x^3. This is easy to see because f(x) = 8x^3 + 5x^2 + 7 ≥ 8x^3 for all positive real numbers x. This is equivalent to saying that g(x) = x^3 is O(8x^3 + 5x^2 + 7), which can be established directly by turning the inequality around. ◂

Often, it is important to know the order of growth of a function in terms of some relatively simple reference function such as x^n when n is a positive integer or c^x, where c > 1. Knowing the order of growth requires that we have both an upper bound and a lower bound for the size of the function. That is, given a function f(x), we want a reference function g(x) such that f(x) is O(g(x)) and f(x) is Ω(g(x)). Big-Theta notation, defined as follows, is used to express both of these relationships, providing both an upper and a lower bound on the size of a function.

Definition 3 Let f and g be functions from the set of integers or the set of real numbers to the set of real numbers. We say that f(x) is Θ(g(x)) if f(x) is O(g(x)) and f(x) is Ω(g(x)). When f(x) is Θ(g(x)), we say that f is big-Theta of g(x), that f(x) is of order g(x), and that f(x) and g(x) are of the same order.

When f (x) is Θ(g(x)), it is also the case that g(x) is Θ( f (x)). Also note that f (x) is Θ(g(x)) if and only if f (x) is O(g(x)) and g(x) is O( f (x)) (see Exercise 31). Furthermore, note that f (x) is Θ(g(x)) if and only if there are positive real numbers C1 and C2 and a positive real number k such that

C1|g(x)| ≤ |f(x)| ≤ C2|g(x)|

whenever x > k. The existence of the constants C1, C2, and k tells us that f(x) is Ω(g(x)) and that f(x) is O(g(x)), respectively. Usually, when big-Theta notation is used, the function g(x) in Θ(g(x)) is a relatively simple reference function, such as x^n, c^x, log x, and so on, while f(x) can be relatively complicated.

EXAMPLE 12 We showed (in Example 5) that the sum of the first n positive integers is O(n^2). Determine whether this sum is of order n^2 without using the summation formula for this sum.


Solution: Let f(n) = 1 + 2 + 3 + ⋯ + n. Because we already know that f(n) is O(n^2), to show that f(n) is of order n^2 we need to find a positive constant C such that f(n) > Cn^2 for sufficiently

large integers n. To obtain a lower bound for this sum, we can ignore the first half of the terms. Summing only the terms greater than ⌈n∕2⌉, we find that

1 + 2 + ⋯ + n ≥ ⌈n∕2⌉ + (⌈n∕2⌉ + 1) + ⋯ + n
              ≥ ⌈n∕2⌉ + ⌈n∕2⌉ + ⋯ + ⌈n∕2⌉
              = (n − ⌈n∕2⌉ + 1)⌈n∕2⌉
              ≥ (n∕2)(n∕2) = n^2∕4.

This shows that f(n) is Ω(n^2). We conclude that f(n) is of order n^2, or in symbols, f(n) is Θ(n^2). ◂

Remark: Note that we can also show that f(n) = ∑_{i=1}^{n} i is Θ(n^2) using the closed formula ∑_{i=1}^{n} i = n(n + 1)∕2 from Table 2 in Section 2.4, derived in Exercise 37(b) of that section.

EXAMPLE 13 Show that 3x^2 + 8x log x is Θ(x^2).

Solution: Because 0 ≤ 8x log x ≤ 8x^2, it follows that 3x^2 + 8x log x ≤ 11x^2 for x > 1. Consequently, 3x^2 + 8x log x is O(x^2). Clearly, x^2 is O(3x^2 + 8x log x). Consequently, 3x^2 + 8x log x is Θ(x^2). ◂

One useful fact is that the leading term of a polynomial determines its order. For example, if f(x) = 3x^5 + x^4 + 17x^3 + 2, then f(x) is of order x^5. This is stated in Theorem 4, whose proof is left as Exercise 50.

THEOREM 4 Let f(x) = a_n x^n + a_{n−1} x^{n−1} + ⋯ + a_1 x + a_0, where a_0, a_1, … , a_n are real numbers with a_n ≠ 0. Then f(x) is of order x^n.

EXAMPLE 14 The polynomials 3x8 + 10x7 + 221x2 + 1444, x19 − 18x4 − 10,112, and −x99 + 40,001x98 +

100,003x are of orders x8, x19, and x99, respectively. ◂

Unfortunately, as Knuth observed, big-O notation is often used by careless writers and speakers as if it had the same meaning as big-Theta notation. Keep this in mind when you see big-O notation used. The recent trend has been to use big-Theta notation whenever both upper and lower bounds on the size of a function are needed.

3.3 Complexity of Algorithms

Introduction

When does an algorithm provide a satisfactory solution to a problem? First, it must always produce the correct answer. How this can be demonstrated will be discussed in Chapter 5. Second, it should be efficient. The efficiency of algorithms will be discussed in this section. How can the efficiency of an algorithm be analyzed? One measure of efficiency is the time used by a computer to solve a problem using the algorithm, when input values are of a specified size. A second measure is the amount of computer memory required to implement the algorithm when input values are of a specified size. Questions such as these involve the computational complexity of the algorithm. An analysis of the time required to solve a problem of a particular size involves the time complexity of the algorithm. An analysis of the computer memory required involves the space complexity of the algorithm. Considerations of the time and space complexity of an algorithm are essential when algorithms are implemented. It is important to know whether an algorithm will produce an answer in a microsecond, a minute, or a billion years. Likewise, the required memory must be available to solve a problem, so that space complexity must be taken into account. Considerations of space complexity are tied in with the particular data structures used to implement the algorithm. Because data structures are not dealt with in detail in this book, space complexity will not be considered. We will restrict our attention to time complexity.

Time Complexity

The time complexity of an algorithm can be expressed in terms of the number of operations used by the algorithm when the input has a particular size. The operations used to measure time complexity can be the comparison of integers, the addition of integers, the multiplication of integers, the division of integers, or any other basic operation. Time complexity is described in terms of the number of operations required instead of actual computer time because of the difference in time needed for different computers to perform basic operations. Moreover, it is quite complicated to break all operations down to the basic bit operations that a computer uses. Furthermore, the fastest computers in existence can perform basic bit operations (for instance, adding, multiplying, comparing, or exchanging two bits) in 10^−11 second (10 picoseconds), but personal computers may require 10^−8 second (10 nanoseconds), which is 1000 times as long, to do the same operations.

We illustrate how to analyze the time complexity of an algorithm by considering Algorithm 1 of Section 3.1, which finds the maximum of a finite set of integers.

EXAMPLE 1 Describe the time complexity of Algorithm 1 of Section 3.1 for finding the maximum element in a finite set of integers.

Solution: The number of comparisons will be used as the measure of the time complexity of the algorithm, because comparisons are the basic operations used. To find the maximum element of a set with n elements, listed in an arbitrary order, the temporary maximum is first set equal to the initial term in the list. Then, after a comparison i ≤ n has been done to determine that the end of the list has not yet been reached, the temporary maximum and second term are compared, updating the temporary maximum to the value of the second term if it is larger. This procedure is continued, using two additional comparisons for each term of the list: one, i ≤ n, to determine that the end of the list has not been reached, and another, max < a_i, to determine whether to update the temporary maximum. Because two comparisons are used for each of the second through the nth elements and one more comparison is used to exit the loop when i = n + 1, exactly 2(n − 1) + 1 = 2n − 1 comparisons are used whenever this algorithm is applied. Hence, the algorithm for finding the maximum of a set of n elements has time complexity Θ(n), measured in terms of the number of comparisons used. Note that for this algorithm the number of comparisons is independent of the particular input of n numbers. ◂
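As an illustration (the function and counter names are ours, not from the text), the following Python sketch mirrors the comparisons counted in Example 1 and reports exactly 2n − 1 of them:

def find_max_with_count(a):
    # Mirrors Algorithm 1: counts both the loop test i <= n and the
    # test max < a_i, for a total of 2n - 1 comparisons.
    n = len(a)
    comparisons = 0
    maximum = a[0]          # temporary maximum := first term
    i = 2                   # 1-based index of the next term
    while True:
        comparisons += 1    # the comparison i <= n
        if not (i <= n):
            break
        comparisons += 1    # the comparison max < a_i
        if maximum < a[i - 1]:
            maximum = a[i - 1]
        i += 1
    return maximum, comparisons

print(find_max_with_count([3, 9, 2, 14, 7, 5]))   # (14, 11), and 2*6 - 1 = 11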

Next, we will analyze the time complexity of searching algorithms.

EXAMPLE 2 Describe the time complexity of the linear search algorithm (specified as Algorithm 2 in Section 3.1).

Solution: The number of comparisons used by Algorithm 2 in Section 3.1 will be taken as the measure of the time complexity. At each step of the loop in the algorithm, two comparisons are performed: one, i ≤ n, to see whether the end of the list has been reached, and one, x ≠ a_i, to compare the element x with a term of the list. Finally, one more comparison, i ≤ n, is made outside the loop. Consequently, if x = a_i, 2i + 1 comparisons are used. The most comparisons, 2n + 2, are required when the element is not in the list. In this case, 2n comparisons are used to determine that x is not a_i, for i = 1, 2, … , n, an additional comparison is used to exit the loop, and one comparison is made outside the loop. So when x is not in the list, a total of 2n + 2 comparisons are used. Hence, a linear search requires Θ(n) comparisons in the worst case, because 2n + 2 is Θ(n). ◂
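A Python sketch of the same count (illustrative; a location of 0 means x is absent, as in Algorithm 2):

def linear_search_with_count(x, a):
    # Counts the comparisons i <= n and x != a_i inside the loop,
    # plus the single comparison i <= n after the loop.
    n = len(a)
    comparisons = 0
    i = 1
    while True:
        comparisons += 1              # i <= n
        if not (i <= n):
            break
        comparisons += 1              # x != a_i
        if x == a[i - 1]:
            break
        i += 1
    comparisons += 1                  # the comparison outside the loop
    location = i if i <= n else 0
    return location, comparisons

print(linear_search_with_count(7, [4, 7, 1, 9]))   # (2, 5):  2i + 1 with i = 2
print(linear_search_with_count(8, [4, 7, 1, 9]))   # (0, 10): 2n + 2 with n = 4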

WORST-CASE COMPLEXITY The type of complexity analysis done in Example 2 is a worst-case analysis. By the worst-case performance of an algorithm, we mean the largest number of operations needed to solve the given problem using this algorithm on input of specified size. Worst-case analysis tells us how many operations an algorithm requires to guarantee that it will produce a solution.

EXAMPLE 3 Describe the time complexity of the binary search algorithm (specified as Algorithm 3 in Section 3.1) in terms of the number of comparisons used (and ignoring the time required to compute m = ⌊(i + j)/2⌋ in each iteration of the loop in the algorithm).

Solution: For simplicity, assume there are n = 2^k elements in the list a_1, a_2, … , a_n, where k is a nonnegative integer. Note that k = log n. (If n, the number of elements in the list, is not a power of 2, the list can be considered part of a larger list with 2^(k+1) elements, where 2^k < n < 2^(k+1). Here 2^(k+1) is the smallest power of 2 larger than n.)

At each stage of the algorithm, i and j, the locations of the first term and the last term of the restricted list at that stage, are compared to see whether the restricted list has more than one term. If i < j, a comparison is done to determine whether x is greater than the middle term of the restricted list. At the first stage the search is restricted to a list with 2^(k−1) terms. So far, two comparisons have been used. This procedure is continued, using two comparisons at each stage to restrict the search to a list with half as many terms. In other words, two comparisons are used at the first stage of the algorithm when the list has 2^k elements, two more when the search has been reduced to a list with 2^(k−1) elements, two more when the search has been reduced to a list with 2^(k−2) elements, and so on, until two comparisons are used when the search has been reduced to a list with 2^1 = 2 elements. Finally, when one term is left in the list, one comparison tells us that there are no additional terms left, and one more comparison is used to determine if this term is x. Hence, at most 2k + 2 = 2 log n + 2 comparisons are required to perform a binary search when the list being searched has 2^k elements. (If n is not a power of 2, the original list is expanded to a list with 2^(k+1) terms, where k = ⌊log n⌋, and the search requires at most 2⌈log n⌉ + 2 comparisons.) It follows that in the worst case, binary search requires O(log n) comparisons. Note that in the worst case, 2 log n + 2 comparisons are used by the binary search. Hence, the binary search uses Θ(log n) comparisons in the worst case, because 2 log n + 2 = Θ(log n). From this analysis it follows that in the worst case, the binary search algorithm is more efficient than the linear search algorithm, because we know by Example 2 that the linear search algorithm has Θ(n) worst-case time complexity. ◂
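The 2 log n + 2 bound can be observed directly in Python (an illustrative transcription of Algorithm 3; the list must be sorted):

def binary_search_with_count(x, a):
    # Counts the comparisons i < j and x > a_m in the loop,
    # plus the final comparison x = a_i.
    comparisons = 0
    i, j = 1, len(a)                 # 1-based search interval
    while True:
        comparisons += 1             # the comparison i < j
        if not (i < j):
            break
        m = (i + j) // 2             # middle position (cost ignored)
        comparisons += 1             # the comparison x > a_m
        if x > a[m - 1]:
            i = m + 1
        else:
            j = m
    comparisons += 1                 # the final comparison x = a_i
    location = i if a[i - 1] == x else 0
    return location, comparisons

data = list(range(1, 32, 2))         # 16 sorted elements, so 2 log 16 + 2 = 10
print(binary_search_with_count(19, data))   # (10, 10)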

AVERAGE-CASE COMPLEXITY Another important type of complexity analysis, besides worst-case analysis, is called average-case analysis. The average number of operations used to solve the problem over all possible inputs of a given size is found in this type of analysis. Average-case time complexity analysis is usually much more complicated than worst-case analysis. However, the average-case analysis for the linear search algorithm can be done without difficulty, as shown in Example 4.

EXAMPLE 4 Describe the average-case performance of the linear search algorithm in terms of the average number of comparisons used, assuming that the integer x is in the list and it is equally likely that x is in any position.

Solution: By hypothesis, the integer x is one of the integers a_1, a_2, … , a_n in the list. If x is the first term a_1 of the list, three comparisons are needed: one, i ≤ n, to determine whether the end of the list has been reached, one, x ≠ a_i, to compare x and the first term, and one, i ≤ n, outside the loop. If x is the second term a_2 of the list, two more comparisons are needed, so that a total of five comparisons are used. In general, if x is the ith term of the list, two comparisons will be used at each of the i steps of the loop, and one outside the loop, so that a total of 2i + 1 comparisons are needed. Hence, the average number of comparisons used equals

(3 + 5 + 7 + ⋯ + (2n + 1))/n = (2(1 + 2 + 3 + ⋯ + n) + n)/n.

Using the formula from line 2 of Table 2 in Section 2.4 (and see Exercise 37(b) of Section 2.4),

1 + 2 + 3 + ⋯ + n = n(n + 1)/2.

Hence, the average number of comparisons used by the linear search algorithm (when x is known to be in the list) is

(2[n(n + 1)/2] + n)/n = n + 2,

which is Θ(n). ◂

Remark: In the analysis in Example 4 we assumed that x is in the list being searched. It is also possible to do an average-case analysis of this algorithm when x may not be in the list (see Exercise 23).

Remark: Although we have counted the comparisons needed to determine whether we have reached the end of a loop, these comparisons are often not counted. From this point on we will ignore such comparisons.

WORST-CASE COMPLEXITY OF TWO SORTING ALGORITHMS We analyze the worst-case complexity of the bubble sort and the insertion sort in Examples 5 and 6.

EXAMPLE 5 What is the worst-case complexity of the bubble sort in terms of the number of comparisons made?

Solution: The bubble sort described before Example 4 in Section 3.1 sorts a list by performing a sequence of passes through the list. During each pass the bubble sort successively compares adjacent elements, interchanging them if necessary. When the ith pass begins, the i − 1 largest elements are guaranteed to be in the correct positions. During this pass, n − i comparisons are used. Consequently, the total number of comparisons used by the bubble sort to order a list of n elements is

(n − 1) + (n − 2) + ⋯ + 2 + 1 = (n − 1)n/2,

using a summation formula from line 2 in Table 2 in Section 2.4 (and Exercise 37(b) in Section 2.4). Note that the bubble sort always uses this many comparisons, because it continues even if the list becomes completely sorted at some intermediate step. Consequently, the bubble sort uses (n − 1)n/2 comparisons, so it has Θ(n^2) worst-case complexity in terms of the number of comparisons used. ◂

EXAMPLE 6 What is the worst-case complexity of the insertion sort in terms of the number of comparisons made?

Solution: The insertion sort (described in Section 3.1) inserts the jth element into the correct position among the first j − 1 elements that have already been put into the correct order. It does this by using a linear search technique, successively comparing the jth element with successive terms until a term that is greater than or equal to it is found or it compares aj with itself and stops because aj is not less than itself. Consequently, in the worst case, j comparisons are required to insert the jth element into the correct position. Therefore, the total number of comparisons used by the insertion sort to sort a list of n elements is

2 + 3 + ⋯ + n = n(n + 1)/2 − 1,

using the summation formula for the sum of consecutive integers in line 2 of Table 2 of Section 2.4 (and see Exercise 37(b) of Section 2.4), and noting that the first term, 1, is missing in this sum. Note that the insertion sort may use considerably fewer comparisons if the smaller elements start out at the end of the list. We conclude that the insertion sort has worst-case complexity Θ(n^2). ◂

In Examples 5 and 6 we showed that both the bubble sort and the insertion sort have worst-case time complexity Θ(n^2). However, the most efficient sorting algorithms can sort n items in O(n log n) time, as we will show in Sections 8.3 and 11.1 using techniques we develop in those sections. From this point on, we will assume that sorting n items can be done in O(n log n) time.

You can run animations found on many different websites that simultaneously run different sorting algorithms on the same lists. Doing so will help you gain insights into the efficiency of different sorting algorithms. Among the sorting algorithms that you can find are the bubble sort, the insertion sort, the shell sort, the merge sort, and the quick sort. Some of these animations allow you to test the relative performance of these sorting algorithms on lists of randomly selected items, lists that are nearly sorted, and lists that are in reversed order.

Complexity of Matrix Multiplication

The definition of the product of two matrices can be expressed as an algorithm for computing the product of two matrices. Suppose that C = [cij] is the m × n matrix that is the product of the m × k matrix A = [aij] and the k × n matrix B = [bij]. The algorithm based on the definition of the matrix product is expressed in pseudocode in Algorithm 1.

ALGORITHM 1 Matrix Multiplication.

procedure matrix multiplication(A, B: matrices)
for i := 1 to m
    for j := 1 to n
        cij := 0
        for q := 1 to k
            cij := cij + aiq·bqj
return C {C = [cij] is the product of A and B}
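The pseudocode translates line for line into Python. The following sketch (the function name is ours) multiplies an m × k matrix by a k × n matrix exactly as Algorithm 1 does:

def matrix_multiply(A, B):
    # C = AB for an m x k matrix A and a k x n matrix B,
    # computed entry by entry as in Algorithm 1.
    m, k, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for q in range(k):
                C[i][j] += A[i][q] * B[q][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matrix_multiply(A, B))   # [[19, 22], [43, 50]]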

We can determine the complexity of this algorithm in terms of the number of additions and multiplications used.

EXAMPLE 7 How many additions of integers and multiplications of integers are used by Algorithm 1 to multiply two n × n matrices with integer entries?

Solution: There are n^2 entries in the product of A and B. To find each entry requires a total of n multiplications and n − 1 additions. Hence, a total of n^3 multiplications and n^2(n − 1) additions are used. ◂

Surprisingly, there are more efficient algorithms for matrix multiplication than that given in Algorithm 1. As Example 7 shows, multiplying two n × n matrices directly from the definition requires O(n^3) multiplications and additions. Using other algorithms, two n × n matrices can be multiplied using O(n^√7) multiplications and additions. (Details of such algorithms can be found in [CoLeRiSt09].)

We can also analyze the complexity of the algorithm we described in Chapter 2 for computing the Boolean product of two matrices, which we display as Algorithm 2.

ALGORITHM 2 The Boolean Product of Zero–One Matrices.

procedure Boolean product(A, B: zero–one matrices)
for i := 1 to m
    for j := 1 to n
        cij := 0
        for q := 1 to k
            cij := cij ∨ (aiq ∧ bqj)
return C {C = [cij] is the Boolean product of A and B}
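Again a direct Python transcription (illustrative sketch), using bitwise OR and AND on 0/1 entries:

def boolean_product(A, B):
    # Boolean product A ⊙ B of zero-one matrices, as in Algorithm 2:
    # c_ij = OR over q of (a_iq AND b_qj).
    m, k, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for q in range(k):
                C[i][j] = C[i][j] | (A[i][q] & B[q][j])
    return C

A = [[1, 0], [0, 1]]
B = [[0, 1], [1, 1]]
print(boolean_product(A, B))   # [[0, 1], [1, 1]]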

The number of bit operations used to find the Boolean product of two n × n matrices can be easily determined.

EXAMPLE 8 How many bit operations are used to find A ⊙ B, where A and B are n × n zero–one matrices?

Solution: There are n^2 entries in A ⊙ B. Using Algorithm 2, a total of n ORs and n ANDs are used to find an entry of A ⊙ B. Hence, 2n bit operations are used to find each entry. Therefore, 2n^3 bit operations are required to compute A ⊙ B using Algorithm 2. ◂

Understanding the Complexity of Algorithms

Table 1 displays some common terminology used to describe the time complexity of algorithms. For example, an algorithm that finds the largest of the first 100 terms of a list of n elements by applying Algorithm 1 to the sequence of the first 100 terms, where n is an integer with n ≥ 100, has constant complexity because it uses 99 comparisons no matter what n is (as the reader can verify). The linear search algorithm has linear (worst-case or average-case) complexity and the binary search algorithm has logarithmic (worst-case) complexity. Many important algorithms have n log n, or linearithmic (worst-case) complexity, such as the merge sort, which we will introduce in Chapter 4. (The word linearithmic is a combination of the words linear and logarithmic.)

TABLE 1 Commonly Used Terminology for the Complexity of Algorithms.

Complexity              Terminology
Θ(1)                    Constant complexity
Θ(log n)                Logarithmic complexity
Θ(n)                    Linear complexity
Θ(n log n)              Linearithmic complexity
Θ(n^b)                  Polynomial complexity
Θ(b^n), where b > 1     Exponential complexity
Θ(n!)                   Factorial complexity

UNIT-IV

DISCRETE PROBABILITY THEORY AND ADVANCED COUNTING TECHNIQUES

Basis of counting:

If X is a set, let us use |X| to denote the number of elements in X.

Two Basic Counting Principles

Two elementary principles act as “building blocks” for all counting problems. The first principle says that the whole is the sum of its parts; it is at once immediate and elementary.

Sum Rule: The principle of disjunctive counting :

If a set X is the union of disjoint nonempty subsets S1, ….., Sn, then | X | = | S1 | + | S2 | + ….. + | Sn |.

We emphasize that the subsets S1, S2, …., Sn must have no elements in common. Moreover, since X = S1 U S2 U ……U Sn, each element of X is in exactly one of the subsets Si. In other words, S1, S2, …., Sn is a partition of X.

If the subsets S1, S2, …, Sn were allowed to overlap, then a more profound principle would be needed: the principle of inclusion and exclusion. Frequently, instead of asking for the number of elements in a set per se, some problems ask how many ways a certain event can happen. The difference is largely in semantics, for if A is an event, we can let X be the set of ways that A can happen and count the number of elements in X. Nevertheless, let us state the sum rule for counting events. If E1, …, En are mutually exclusive events, and E1 can happen e1 ways, E2 can happen e2 ways, …, En can happen en ways, then E1 or E2 or … or En can happen e1 + e2 + … + en ways. Again we emphasize that mutually exclusive events E1 and E2 mean that E1 or E2 can happen, but both cannot happen simultaneously. The sum rule can also be formulated in terms of choices: If an object can be selected from one reservoir in e1 ways and an object can be selected from a separate reservoir in e2 ways, then the selection of one object from either one reservoir or the other can be made in e1 + e2 ways.


Product Rule: The principle of sequential counting

If S1, …, Sn are nonempty sets, then the number of elements in the Cartesian product S1 × S2 × ⋯ × Sn is the product ∏_{i=1}^{n} |S_i|. That is, |S1 × S2 × ⋯ × Sn| = ∏_{i=1}^{n} |S_i|.

Observe that there are 5 branches in the first stage, corresponding to the 5 elements of S1, and to each of these branches there are 3 branches in the second stage, corresponding to the 3 elements of S2, giving a total of 15 branches altogether. Moreover, the Cartesian product S1 × S2 can be partitioned as (a1 × S2) ∪ (a2 × S2) ∪ (a3 × S2) ∪ (a4 × S2) ∪ (a5 × S2), where (ai × S2) = {(ai, b1), (ai, b2), (ai, b3)}. Thus, for example, (a3 × S2) corresponds to the third branch in the first stage followed by each of the 3 branches in the second stage.

More generally, if a1, …, an are the n distinct elements of S1 and b1, …, bm are the m distinct elements of S2, then S1 × S2 = ∪_{i=1}^{n} (ai × S2). For if x is an arbitrary element of S1 × S2, then x = (a, b) where a ∈ S1 and b ∈ S2. Thus, a = ai for some i and b = bj for some j. Thus, x = (ai, bj) ∈ (ai × S2) and therefore x ∈ ∪_{i=1}^{n} (ai × S2). Conversely, if x ∈ ∪_{i=1}^{n} (ai × S2), then x ∈ (ai × S2) for some i and thus x = (ai, bj) where bj is some element of S2. Therefore, x ∈ S1 × S2.

Next observe that (ai × S2) and (aj × S2) are disjoint if i ≠ j, since if x ∈ (ai × S2) ∩ (aj × S2) then x = (ai, bk) for some k and x = (aj, bl) for some l. But then (ai, bk) = (aj, bl) implies that ai = aj and bk = bl. But since i ≠ j, ai ≠ aj. Thus, we conclude that S1 × S2 is the disjoint union of the sets (ai × S2). Furthermore, |ai × S2| = |S2| since there is obviously a one-to-one correspondence between the sets ai × S2 and S2, namely, (ai, bj) → bj.

Then by the sum rule, |S1 × S2| = Σ_{i=1}^{n} |ai × S2| = |S2| + |S2| + ⋯ + |S2| (n summands) = n|S2| = nm. Therefore, we have proven the product rule for two sets. The general rule follows by mathematical


induction. We can reformulate the product rule in terms of events. If events E1, E2, …, En can happen e1, e2, …, and en ways, respectively, then the sequence of events E1 first, followed by E2, …, followed by En can happen e1·e2·⋯·en ways. In terms of choices, the product rule is stated thus: If a first object can be chosen in e1 ways, a second in e2 ways, …, and an nth object in en ways, then a selection of one object of each kind, in that order, can be made in e1·e2·⋯·en ways.
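The two rules are easy to check mechanically. A small Python illustration (the set contents are hypothetical placeholders):

import itertools

S1 = ['a1', 'a2', 'a3', 'a4', 'a5']   # |S1| = 5
S2 = ['b1', 'b2', 'b3']               # |S2| = 3

# Product rule: |S1 x S2| = |S1| * |S2| = 15.
pairs = list(itertools.product(S1, S2))
print(len(pairs))                      # 15

# Sum rule: S1 x S2 is the disjoint union of the slices {a_i} x S2.
slices = [[(a, b) for b in S2] for a in S1]
print(sum(len(s) for s in slices))     # 5 slices of 3 pairs each = 15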

Combinations & Permutations:

Definition.

A combination of n objects taken r at a time (called an r-combination of n objects) is an unordered selection of r of the objects. A permutation of n objects taken r at a time (also called an r-permutation of n objects) is an ordered selection or arrangement of r of the objects. Note that we are simply defining the terms r-combinations and r-permutations here and have not mentioned anything about the properties of the n objects. For example, these definitions say nothing about whether or not a given element may appear more than once in the list of n objects. In other words, it may be that the n objects do not constitute a set in the normal usage of the word.

SOLVED PROBLEMS

Example 1. Suppose that the 5 objects from which selections are to be made are: a, a, a, b, c. Then the 3-combinations of these 5 objects are: aaa, aab, aac, abc. The permutations are:

aaa, aab, aba, baa, aac, aca, caa, abc, acb, bac, bca, cab, cba.

Neither do these definitions say anything about any rules governing the selection of the r objects: at one extreme, objects could be chosen where all repetition is forbidden; at the other extreme, each object may be chosen up to t times; or again there may be some rule of selection between these extremes, for instance, the rule that would allow a given object to be repeated up to a certain specified number of times. We will use expressions like {3·a, 2·b, 5·c} to indicate either (1) that we have 3 + 2 + 5 = 10 objects including 3 a's, 2 b's and 5 c's, or (2) that we have 3 objects a, b, c,


where selections are constrained by the conditions that a can be selected at most three times, b can be selected at most twice, and c can be chosen up to five times. The numbers 3, 2, and 5 in this example will be called repetition numbers.

Example 2 The 3-combinations of {3. a, 2. b, 5. c} are:

aaa, aab, aac, abb, abc, ccc, ccb, cca, cbb.

Example 3. The 3-combinations of {3 . a, 2. b, 2. c , 1. d} are:

aaa, aab, aac, aad, bba, bbc, bbd, cca, ccb, ccd, abc, abd, acd, bcd.

In order to include the case where there is no limit on the number of times an object can be repeated in a selection (except that imposed by the size of the selection) we use the symbol ∞ as a repetition number to mean that an object can occur an infinite number of times.

Example 4. The 3-combinations of {∞. a, 2.b, ∞.c} are the same as in Example 2 even though a and c can be repeated an infinite number of times. This is because, in 3-combinations, 3 is the limit on the number of objects to be chosen.

If we are considering selections where each object has ∞ as its repetition number then we designate such selections as selections with unlimited repetitions. In particular, a selection of r objects in this case will be called r-combinations with unlimited repetitions and any ordered arrangement of these r objects will be an r-permutation with unlimited repetitions.

Example 5. The 3-combinations of a, b, c, d with unlimited repetitions are the 3-combinations of {∞·a, ∞·b, ∞·c, ∞·d}. There are 20 such 3-combinations, namely: aaa, aab, aac, aad, bbb, bba, bbc, bbd, ccc, cca, ccb, ccd, ddd, dda, ddb, ddc, abc, abd, acd, bcd. Moreover, there are 4^3 = 64 3-permutations with unlimited repetitions, since the first position can be filled 4 ways (with a, b, c, or d), the second position can be filled 4 ways, and likewise for the third position.

The 2-permutations of {∞. a, ∞. b, ∞. c, ∞. d} do not present such a formidable list and so we tabulate them in the following table.

2-combinations with Unlimited Repetitions    2-permutations with Unlimited Repetitions
aa                                           aa
ab                                           ab, ba
ac                                           ac, ca
ad                                           ad, da
bb                                           bb
bc                                           bc, cb
bd                                           bd, db
cc                                           cc
cd                                           cd, dc
dd                                           dd
10                                           16

Of course, these are not the only constraints that can be placed on selections; the possibilities are endless. We list some more examples just for concreteness. We might, for example, consider selections of {∞·a, ∞·b, ∞·c} where b can be chosen only an even number of times. Thus, 5-combinations with these repetition numbers and this constraint would be those 5-combinations with unlimited repetitions where b is chosen 0, 2, or 4 times.

Example 6. The 3-combinations of {∞·a, ∞·b, 1·c, 1·d} where b can be chosen only an even number of times are the 3-combinations of a, b, c, d where a can be chosen up to 3 times, b can be chosen 0 or 2 times, and c and d can be chosen at most once. The 3-combinations subject to these constraints are: aaa, aac, aad, abb, bbc, bbd, acd.

As another example, we might be interested in selections of {∞·a, 3·b, 1·c} where a can be chosen a prime number of times. Thus, the 8-combinations subject to these constraints would be all those 8-combinations where a can be chosen 2, 3, 5, or 7 times, b can be chosen up to 3 times, and c can be chosen at most once. There are, as we have said, an infinite variety of constraints one could place on selections; you can let your imagination run free in conjuring up different constraints. Any selection of r objects subject to such constraints would constitute an r-combination according to our definition, and any arrangement of these r objects would constitute an r-permutation.


While there may be an infinite variety of constraints, we are primarily interested in two major types: one we have already described, combinations and permutations with unlimited repetitions; the other we now describe. If the repetition numbers are all 1, then selections of r objects are called r-combinations without repetitions and arrangements of the r objects are r-permutations without repetitions. We remind you that r-combinations without repetitions are just subsets of the n elements containing exactly r elements. Moreover, we shall often drop the repetition number 1 when considering r-combinations without repetitions. For example, when considering r-combinations of {a, b, c, d} we will mean that each repetition number is 1 unless otherwise designated, and, of course, we mean that in a given selection an element need not be chosen at all, but, if it is chosen, then in this selection this element cannot be chosen again.

Example 7. Suppose selections are to be made from the four objects a, b, c, d.

2-combinations without Repetitions    2-Permutations without Repetitions
ab                                    ab, ba
ac                                    ac, ca
ad                                    ad, da
bc                                    bc, cb
bd                                    bd, db
cd                                    cd, dc
6                                     12

There are six 2-combinations without repetitions, and to each there are two 2-permutations, giving a total of twelve 2-permutations without repetitions. Note that the total number of 2-combinations with unlimited repetitions in Example 5 included the six 2-combinations without repetitions of Example 7 as well as 4 other 2-combinations where repetitions actually occur. Likewise, the sixteen 2-permutations with unlimited repetitions included the twelve 2-permutations without repetitions.


3-combinations without Repetitions    3-Permutations without Repetitions
abc                                   abc, acb, bac, bca, cab, cba
abd                                   abd, adb, bad, bda, dab, dba
acd                                   acd, adc, cad, cda, dac, dca
bcd                                   bcd, bdc, cbd, cdb, dbc, dcb
4                                     24

Note that to each of the 3-combinations without repetitions there are 6 possible 3-permutations without repetitions. Momentarily, we will show that this observation can be generalized.

Combinations And Permutations With Repetitions:

General formulas for enumerating combinations and permutations will now be presented. At this time, we will only list formulas for combinations and permutations without repetitions or with unlimited repetitions. We will wait until later to use generating functions to give general techniques for enumerating combinations where other rules govern the selections. Let P (n, r) denote the number of r-permutations of n elements without repetitions.

Theorem 5.3.1 (Enumerating r-permutations without repetitions).

P(n, r) = n(n − 1) ⋯ (n − r + 1) = n!/(n − r)!

Proof. Since there are n distinct objects, the first position of an r-permutation may be filled in n ways. This done, the second position can be filled in n − 1 ways, since no repetitions are allowed and there are n − 1 objects left to choose from. The third can be filled in n − 2 ways. Continuing in this manner and applying the product rule, we conclude that

P (n, r) = n(n-1)(n-2)……. (n – r + 1).

From the definition of factorials, it follows that

P (n, r) = n! / (n-r)!
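A quick check of the theorem in Python (an illustrative helper; Python 3.8+ also provides math.perm):

import math

def P(n, r):
    # P(n, r) = n (n-1) ... (n-r+1) = n! / (n-r)!
    return math.factorial(n) // math.factorial(n - r)

print(P(5, 2))    # 20
print(P(5, 5))    # 120
print(P(10, 4))   # 5040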


When r = n, this formula becomes P(n, n) = n!/0! = n!. When explicit reference to r is not made, we assume that all the objects are to be arranged; thus when we talk about the permutations of n objects we mean the case r = n.

Corollary 1. There are n! permutations of n distinct objects.

Example 1. There are 3! = 6 permutations of {a, b, c}. There are 4! = 24 permutations of {a, b, c, d}. The number of 2-permutations of {a, b, c, d, e} is P(5, 2) = 5!/(5 − 2)! = 5 × 4 = 20. The number of 5-letter words using the letters a, b, c, d, and e at most once is P(5, 5) = 120.

Example 2. There are P(10, 4) = 5,040 4-digit numbers that contain no repeated digits, since each such number is just an arrangement of four of the digits 0, 1, 2, 3, …, 9 (leading zeroes are allowed). There are P(26, 3) · P(10, 4) license plates formed by 3 distinct letters followed by 4 distinct digits.

Example 3. In how many ways can 7 women and 3 men be arranged in a row if the 3 men must always stand next to each other?

There are 3! ways of arranging the 3 men. Since the 3 men always stand next to each other, we treat them as a single entity, which we denote by X. Then if W1, W2, …, W7 represent the women, we next are interested in the number of ways of arranging {X, W1, W2, W3, …, W7}. There are 8! permutations of these 8 objects. Hence there are (3!)(8!) permutations altogether. (Of course, if there has to be a prescribed order of arrangement on the 3 men, then there are only 8! total permutations.)

Example 4. In how many ways can the letters of the English alphabet be arranged so that there are exactly 5 letters between the letters a and b?

There are P(24, 5) ways to arrange the 5 letters between a and b, 2 ways to place a and b, and then 20! ways to arrange the 7-letter block, treated as one unit, along with the remaining 19 letters. The total is P(24, 5)·(20!)·2.

The above permutations had the objects arranged in a line. If instead of arranging objects in a line, we arrange them in a circle, then the number of permutations decreases.

Example 5. In how many ways can 5 children arrange themselves in a ring?


Solution. Here, the 5 children are not assigned to particular places but are only arranged relative to one another. Thus, the arrangements (see Figure 2-3) are considered the same if the children are in the same order clockwise. Hence, the position of child C1 is immaterial and it is only the position of the 4 other children relative to C1 that counts. Therefore, keeping C1 fixed in position, there are 4! arrangements of the remaining children.

Binomial Coefficients:

In mathematics, the binomial coefficient C(n, k) is the coefficient of the x^k term in the polynomial expansion of the binomial power (1 + x)^n.

In combinatorics, C(n, k) is interpreted as the number of k-element subsets (the k-combinations) of an n-element set, that is, the number of ways that k things can be "chosen" from a set of n things. Hence, C(n, k) is often read as "n choose k" and is called the choose function of n and k. The notation was introduced by Andreas von Ettingshausen in 1826, although the numbers were already known centuries before that (see Pascal's triangle). Alternative notations include C(n, k) and nCk, in all of which the C stands for combinations or choices.

For natural numbers (taken to include 0) n and k, the binomial coefficient C(n, k) can be defined as the coefficient of the monomial X^k in the expansion of (1 + X)^n. The same coefficient also occurs (if k ≤ n) in the binomial formula

(x + y)^n = Σ_{k=0}^{n} C(n, k) x^{n−k} y^k

(valid for any elements x, y of a commutative ring), which explains the name "binomial coefficient".

Another occurrence of this number is in combinatorics, where it gives the number of ways, disregarding order, that k objects can be chosen from among n objects; more formally, the number of k-element subsets (or k-combinations) of an n-element set. This number can be seen to be equal to the one of the first definition, independently of any of the formulas below to compute it: if in each of the n factors of the power (1 + X)^n one temporarily labels the term X with an index i (running from 1 to n), then each subset of k indices gives after expansion a contribution X^k, and the coefficient of that monomial in the result will be the number of such subsets. This shows in particular that C(n, k) is a natural number for any natural numbers n and k. There are many other combinatorial interpretations of binomial coefficients (counting problems for which the answer is given by a binomial coefficient expression), for instance the number of words formed of n bits (digits 0 or 1) whose sum is k, but most of these are easily seen to be equivalent to counting k-combinations.

Several methods exist to compute the value of C(n, k) without actually expanding a binomial power or counting k-combinations.
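One such method multiplies and divides factor by factor so that every intermediate value stays an integer; a Python sketch (illustrative; Python 3.8+ also offers math.comb):

def binomial(n, k):
    # C(n, k) computed multiplicatively: after i steps the running value
    # is C(n - k + i, i), which is always an integer.
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)          # use symmetry C(n, k) = C(n, n - k)
    result = 1
    for i in range(1, k + 1):
        result = result * (n - k + i) // i
    return result

print(binomial(6, 2))   # 15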


Binomial and Multinomial theorems:

Binomial theorem: In elementary algebra, the binomial theorem describes the algebraic expansion of powers of a binomial. According to the theorem, it is possible to expand the power (x + y)^n into a sum involving terms of the form a·x^b·y^c, where the coefficient of each term is a positive integer, and the sum of the exponents of x and y in each term is n. For example,

(x + y)^4 = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4.

The coefficients appearing in the binomial expansion are known as binomial coefficients. They are the same as the entries of Pascal's triangle, and can be determined by a simple formula involving factorials. These numbers also arise in combinatorics, where the coefficient of x^{n−k} y^k is equal to the number of different combinations of k elements that can be chosen from an n-element set.

According to the theorem, it is possible to expand any power of x + y into a sum of the form

(x + y)^n = C(n, 0) x^n + C(n, 1) x^{n−1} y + ⋯ + C(n, n−1) x y^{n−1} + C(n, n) y^n,

where C(n, k) denotes the corresponding binomial coefficient. Using summation notation, the formula above can be written

(x + y)^n = Σ_{k=0}^{n} C(n, k) x^{n−k} y^k.

This formula is sometimes referred to as the Binomial Formula or the Binomial Identity.

A variant of the binomial formula is obtained by substituting 1 for x and x for y, so that it involves only a single variable. In this form, the formula reads

(1 + x)^n = Σ_{k=0}^{n} C(n, k) x^k.

EXAMPLE

Simplify (x + √(x^2 − 1))^6 + (x − √(x^2 − 1))^6.


Solution: Let √(x^2 − 1) = a, so we have:

(x + a)^6 + (x − a)^6

= [x^6 + 6C1·x^5·a + 6C2·x^4·a^2 + 6C3·x^3·a^3 + 6C4·x^2·a^4 + 6C5·x·a^5 + 6C6·a^6]
+ [x^6 − 6C1·x^5·a + 6C2·x^4·a^2 − 6C3·x^3·a^3 + 6C4·x^2·a^4 − 6C5·x·a^5 + 6C6·a^6]

= 2[x^6 + 6C2·x^4·a^2 + 6C4·x^2·a^4 + 6C6·a^6]

= 2[x^6 + 15x^4(x^2 − 1) + 15x^2(x^2 − 1)^2 + (x^2 − 1)^3]

= 2[x^6 + 15x^6 − 15x^4 + 15x^6 − 30x^4 + 15x^2 + x^6 − 3x^4 + 3x^2 − 1]

= 2[32x^6 − 48x^4 + 18x^2 − 1]

Multinomial theorem:

In mathematics, the multinomial theorem says how to write a power of a sum in terms of powers of the terms in that sum. It is the generalization of the binomial theorem to polynomials.

For any positive integer m and any nonnegative integer n, the multinomial formula tells us how a polynomial expands when raised to an arbitrary power:

(x_1 + x_2 + ⋯ + x_m)^n = Σ_{k_1 + k_2 + ⋯ + k_m = n} [n! / (k_1! k_2! ⋯ k_m!)] x_1^{k_1} x_2^{k_2} ⋯ x_m^{k_m}.

The summation is taken over all sequences of nonnegative integer indices k_1 through k_m such that the sum of all k_i is n. That is, for each term in the expansion, the exponents must add up to n. Also, as with the binomial theorem, quantities of the form x^0 that appear are taken to equal 1 (even when x equals zero). Alternatively, this can be written concisely using multi-indices as

(x_1 + ⋯ + x_m)^n = Σ_{|α| = n} C(n; α) x^α,

where α = (α_1, α_2, …, α_m), |α| = α_1 + ⋯ + α_m, and x^α = x_1^{α_1} x_2^{α_2} ⋯ x_m^{α_m}.

Example

(a + b + c)^3 = a^3 + b^3 + c^3 + 3a^2b + 3a^2c + 3b^2a + 3b^2c + 3c^2a + 3c^2b + 6abc.

We could have calculated each coefficient by first expanding (a + b + c)^2 = a^2 + b^2 + c^2 + 2ab + 2bc + 2ac, then self-multiplying it again to get (a + b + c)^3 (and if we were raising it to higher powers, we would multiply it by itself some more). However, this process is slow, and can be avoided by using the multinomial theorem. The multinomial theorem "solves" this process by giving us the closed form for any


coefficient we might want. It is possible to "read off" the multinomial coefficients from the terms by using the multinomial coefficient formula. For example:

a^2 b^0 c^1 has the coefficient 3!/(2! 0! 1!) = 3;

a^1 b^1 c^1 has the coefficient 3!/(1! 1! 1!) = 6.

We could have also had a 'd' variable, or even more variables; hence the multinomial theorem.
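The multinomial coefficient is straightforward to compute; a minimal Python sketch (the helper name is ours):

import math

def multinomial(*ks):
    # n! / (k1! k2! ... km!) with n = k1 + ... + km.
    n = sum(ks)
    result = math.factorial(n)
    for k in ks:
        result //= math.factorial(k)
    return result

print(multinomial(2, 0, 1))   # 3, the coefficient of a^2 b^0 c^1 above
print(multinomial(1, 1, 1))   # 6, the coefficient of a b c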

The principles of Inclusion–Exclusion:

Let |A| denote the cardinality of a set A. For two sets it follows immediately that

(1) |A ∪ B| = |A| + |B| − |A ∩ B|,

where ∪ denotes union and ∩ denotes intersection. The more general statement

(2) |A_1 ∪ A_2 ∪ ⋯ ∪ A_p| ≤ |A_1| + |A_2| + ⋯ + |A_p|

also holds, and is known as Boole's inequality.

Formula (1) can be generalized in the following beautiful manner. Let A_1, …, A_p be sets; then

(3) |A_1 ∪ ⋯ ∪ A_p| = Σ_{k=1}^{p} (−1)^{k−1} Σ |A_{i_1} ∩ ⋯ ∩ A_{i_k}|,

where the inner sums are taken over all k-subsets {i_1, …, i_k} of {1, …, p}. This formula holds for infinite sets as well as finite sets.

The principle of inclusion-exclusion was used by Nicholas Bernoulli to solve the rencontres problem of finding the number of derangements.

For example, for the three subsets A_1 = {2, 3, 7, 9, 10}, A_2 = {1, 2, 3, 9}, and A_3 = {2, 4, 9, 10} of S = {1, 2, …, 10}, the following table summarizes the terms appearing in the sum.

k    term                set                length
1    |A_1|               {2, 3, 7, 9, 10}   5
     |A_2|               {1, 2, 3, 9}       4
     |A_3|               {2, 4, 9, 10}      4
2    |A_1 ∩ A_2|         {2, 3, 9}          3
     |A_1 ∩ A_3|         {2, 9, 10}         3
     |A_2 ∩ A_3|         {2, 9}             2
3    |A_1 ∩ A_2 ∩ A_3|   {2, 9}             2

|A_1 ∪ A_2 ∪ A_3| is therefore equal to (5 + 4 + 4) − (3 + 3 + 2) + 2 = 7, corresponding to the seven elements {1, 2, 3, 4, 7, 9, 10}.
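The computation can be verified by brute force in Python (the set names follow the table above):

A1 = {2, 3, 7, 9, 10}
A2 = {1, 2, 3, 9}
A3 = {2, 4, 9, 10}

# Inclusion-exclusion for three sets:
by_formula = (len(A1) + len(A2) + len(A3)
              - len(A1 & A2) - len(A1 & A3) - len(A2 & A3)
              + len(A1 & A2 & A3))
print(by_formula)            # 7
print(sorted(A1 | A2 | A3))  # [1, 2, 3, 4, 7, 9, 10], the same seven elements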

Pigeonhole principle and its applications:

The statement of the Pigeonhole Principle:

If m pigeons are put into m pigeonholes, there is an empty hole iff there's a hole with more than one pigeon.

If n > m pigeons are put into m pigeonholes, there's a hole with more than one pigeon.

Example:

Consider a chess board with two of the diagonally opposite corners removed. Is it possible to cover the board with pieces of domino whose size is exactly two board squares?

Solution

No, it's not possible. Two diagonally opposite squares on a chess board are of the same color. Therefore, when these are removed, the number of squares of one color exceeds by 2 the number of squares of another color. However, every piece of domino covers exactly two squares and these are of different colors. Every placement of domino pieces establishes a 1-1 correspondence between the set of white squares and the set of black squares. If the two sets have different number of elements, then, by the Pigeonhole Principle, no 1-1 correspondence between the two sets is possible.

Generalizations of the pigeonhole principle

A generalized version of this principle states that, if n discrete objects are to be allocated to m containers, then at least one container must hold no fewer than ⌈n/m⌉ objects, where ⌈x⌉ is the ceiling function, denoting the smallest integer larger than or equal to x. Similarly, at least one container must hold no more than ⌊n/m⌋ objects, where ⌊x⌋ is the floor function, denoting the largest integer smaller than or equal to x.


A probabilistic generalization of the pigeonhole principle states that if n pigeons are randomly put into m pigeonholes with uniform probability 1/m, then at least one pigeonhole will hold more than one pigeon with probability

1 − (m)_n / m^n,

where (m)_n is the falling factorial m(m − 1)(m − 2)⋯(m − n + 1). For n = 0 and for n = 1 (and m > 0), that probability is zero; in other words, if there is just one pigeon, there cannot be a conflict. For n > m (more pigeons than pigeonholes) it is one, in which case it coincides with the ordinary pigeonhole principle. But even if the number of pigeons does not exceed the number of pigeonholes (n ≤ m), due to the random nature of the assignment of pigeons to pigeonholes there is often a substantial chance that clashes will occur. For example, if 2 pigeons are randomly assigned to 4 pigeonholes, there is a 25% chance that at least one pigeonhole will hold more than one pigeon; for 5 pigeons and 10 holes, that probability is 69.76%; and for 10 pigeons and 20 holes it is about 93.45%. If the number of holes stays fixed, there is always a greater probability of a pair when you add more pigeons. This problem is treated at much greater length in the birthday paradox.
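The quoted percentages can be reproduced directly (a small Python check):

def collision_probability(n, m):
    # 1 - (m)_n / m^n, where (m)_n is the falling factorial.
    falling = 1
    for i in range(n):
        falling *= (m - i)
    return 1 - falling / m ** n

print(round(collision_probability(2, 4), 4))    # 0.25
print(round(collision_probability(5, 10), 4))   # 0.6976
print(round(collision_probability(10, 20), 4))  # 0.9345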

A further probabilistic generalisation is that when a real-valued random variable X has a finite mean E(X), then the probability is nonzero that X is greater than or equal to E(X), and similarly the probability is nonzero that X is less than or equal to E(X). To see that this implies the standard pigeonhole principle, take any fixed arrangement of n pigeons into m holes and let X be the number of pigeons in a hole chosen uniformly at random. The mean of X is n/m, so if there are more pigeons than holes the mean is greater than one. Therefore, X is sometimes at least 2.

Applications:

The pigeonhole principle arises in computer science. For example, collisions are inevitable in a hash table because the number of possible keys exceeds the number of indices in the array. No hashing algorithm, no matter how clever, can avoid these collisions. This principle also proves that any general-purpose lossless compression algorithm that makes at least one input file smaller will make some other input file larger. (Otherwise, two files would be compressed to the same smaller file and restoring them would be ambiguous.)

A notable problem in mathematical analysis is, for a fixed irrational number a, to show that the set {[na]: n is an integer} of fractional parts is dense in [0, 1]. After a moment's thought, one finds that it is not easy to explicitly find integers n, m such that |na − m| < e, where e > 0 is a small positive number and a is some arbitrary irrational number. But if one takes M such that 1/M < e, by the pigeonhole principle there must be n1, n2 ∈ {1, 2, ..., M + 1} such that n1a and n2a are in the same integer subdivision of size 1/M (there are only M such subdivisions between consecutive integers). In particular, we can find n1, n2 such that n1a is in (p + k/M, p + (k + 1)/M), and n2a is in (q + k/M, q + (k + 1)/M), for some p, q integers and k in {0, 1, ..., M − 1}. We can then easily verify that (n2 − n1)a is in (q − p − 1/M, q − p + 1/M). This implies that [na] < 1/M < e, where n = n2 − n1 or n = n1 − n2. This shows that 0 is a limit point of {[na]}. We can then use this fact to prove the case for p in (0, 1]: find n such that [na] < 1/M < e; then if p ∈ (0, 1/M], we are done. Otherwise p in (j/M, (j + 1)/M], and by setting k = sup{r ∈ N : r[na] < j/M}, one obtains |[(k + 1)na] − p| < 1/M < e.



Recurrence Relation

Generating Functions:

In mathematics, a generating function is a formal power series in one indeterminate, whose coefficients encode information about a sequence of numbers an that is indexed by the natural numbers. Generating functions were first introduced by Abraham de Moivre in 1730, in order to solve the general linear recurrence problem. One can generalize to formal power series in more than one indeterminate, to encode information about arrays of numbers indexed by several natural numbers.

Generating functions are not functions in the formal sense of a mapping from a domain to a codomain; the name is merely traditional, and they are sometimes more correctly called generating series.

Ordinary generating function

The ordinary generating function of a sequence a_n is

G(a_n; x) = Σ_{n=0}^{∞} a_n x^n.

When the term generating function is used without qualification, it is usually taken to mean an ordinary generating function.

If an is the probability mass function of a discrete random variable, then its ordinary generating function is called a probability-generating function.

The ordinary generating function can be generalized to arrays with multiple indices. For example, the ordinary generating function of a two-dimensional array a_{m,n} (where n and m are natural numbers) is

G(a_{m,n}; x, y) = Σ_{m,n=0}^{∞} a_{m,n} x^m y^n.

Example: The constant sequence a_n = 1 for all n has ordinary generating function Σ_{n=0}^{∞} x^n = 1/(1 − x).

Exponential generating function


The exponential generating function of a sequence a_n is

EG(a_n; x) = Σ_{n=0}^{∞} a_n x^n / n!.

Example: The constant sequence a_n = 1 for all n has exponential generating function Σ_{n=0}^{∞} x^n/n! = e^x.

Function of Sequences: Generating functions giving the first few powers of the nonnegative integers are given in the following table.

series              generating function
1, 1, 1, 1, …       1/(1 − x)
0, 1, 2, 3, …       x/(1 − x)^2
0, 1, 4, 9, …       x(1 + x)/(1 − x)^3

There are many beautiful generating functions for special functions in number theory. A few particularly nice examples are

Σ_{n=0}^{∞} P(n) x^n = ∏_{k=1}^{∞} 1/(1 − x^k) = 1/(x; x)_∞

for the partition function P, where (x; x)_∞ is a q-Pochhammer symbol, and

Σ_{n=0}^{∞} F_n x^n = x/(1 − x − x^2)

for the Fibonacci numbers F_n.


Generating functions are very useful in combinatorial enumeration problems. For example, the subset sum problem, which asks for the number of ways to select some of n given integers such that their sum equals a given value s, can be solved using generating functions.

Calculating Coefficients of a Generating Function:

By using the following polynomial expansions, we can calculate the coefficient of a generating function.

Polynomial Expansions:

1) (1 − x^(m+1))/(1 − x) = 1 + x + x^2 + ⋯ + x^m

2) 1/(1 − x) = 1 + x + x^2 + ⋯

3) (1 + x)^n = 1 + C(n, 1)x + C(n, 2)x^2 + ⋯ + C(n, r)x^r + ⋯ + C(n, n)x^n

4) (1 − x^m)^n = 1 − C(n, 1)x^m + C(n, 2)x^(2m) − ⋯ + (−1)^k C(n, k)x^(km) + ⋯ + (−1)^n C(n, n)x^(nm)

5) 1/(1 − x)^n = 1 + C(1 + n − 1, 1)x + C(2 + n − 1, 2)x^2 + ⋯ + C(r + n − 1, r)x^r + ⋯

6) If h(x) = f(x)g(x), where f(x) = a_0 + a_1 x + a_2 x^2 + ⋯ and g(x) = b_0 + b_1 x + b_2 x^2 + ⋯, then

h(x) = a_0 b_0 + (a_1 b_0 + a_0 b_1)x + (a_2 b_0 + a_1 b_1 + a_0 b_2)x^2 + ⋯ + (a_r b_0 + a_{r−1} b_1 + ⋯ + a_0 b_r)x^r + ⋯
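Expansion 6), the product rule for coefficients, is exactly polynomial convolution; a small Python sketch:

def poly_mul(f, g, degree):
    # Coefficients of h = f * g up to x^degree; f[r] is the
    # coefficient of x^r, as in expansion 6).
    h = [0] * (degree + 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            if i + j <= degree:
                h[i + j] += a * b
    return h

# 1/(1 - x) has coefficients 1, 1, 1, ...; squaring it gives 1/(1 - x)^2,
# whose x^r coefficient is C(r + 1, r) = r + 1, agreeing with expansion 5).
ones = [1] * 6
print(poly_mul(ones, ones, 5))   # [1, 2, 3, 4, 5, 6]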



Recurrence relations:

Introduction: A recurrence relation is a formula that relates, for any integer n ≥ 1, the nth term of a sequence A = {a_r}_{r=0}^{∞} to one or more of the terms a_0, a_1, …, a_{n−1}.

Example. If S_n denotes the sum of the first n positive integers, then

(1) S_n = n + S_{n−1}.

Similarly, if d is a real number, then the nth term of an arithmetic progression with common difference d satisfies the relation

(2) a_n = a_{n−1} + d.

Likewise, if p_n denotes the nth term of a geometric progression with common ratio r, then

(3) p_n = r·p_{n−1}.

We list other examples as:

(4) a_n − 3a_{n−1} + 2a_{n−2} = 0.
(5) a_n − 3a_{n−1} + 2a_{n−2} = n^2 + 1.
(6) a_n − (n − 1)a_{n−1} − (n − 1)a_{n−2} = 0.
(7) a_n − 9a_{n−1} + 26a_{n−2} − 24a_{n−3} = 5n.
(8) a_n − 3(a_{n−1})^2 + 2a_{n−2} = n.
(9) a_n = a_0 a_{n−1} + a_1 a_{n−2} + ⋯ + a_{n−1} a_0.
(10) (a_n)^2 + (a_{n−1})^2 = −1.

Definition. Suppose n and k are nonnegative integers. A recurrence relation of the form c_0(n)a_n + c_1(n)a_{n−1} + ⋯ + c_k(n)a_{n−k} = f(n) for n ≥ k, where c_0(n), c_1(n), …, c_k(n), and f(n) are functions of n, is said to be a linear recurrence relation. If c_0(n) and c_k(n) are not identically zero, then it is said to be a linear recurrence relation of degree k. If c_0(n), c_1(n), …, c_k(n) are constants, then the recurrence relation is known as a linear relation with constant coefficients. If f(n) is identically zero, then the recurrence relation is said to be homogeneous; otherwise, it is inhomogeneous.

Thus, all the examples above are linear recurrence relations except (8), (9), and (10); the relation (8), for instance, is not linear because of the squared term. The relations in (3), (4) , (5), and (7) are linear with constant coefficients. Relations (1), (2), and (3) have degree 1; (4), (5), and (6) have degree 2; (7) has degree 3. Relations (3) , (4), and (6) are homogeneous. There are no general techniques that will enable one to solve all recurrence relations. There are, nevertheless, techniques that will enable us to solve linear recurrence relations with constant coefficients.

SOLVING RECURRENCE RELATIONS BY SUBSTITUTION AND GENERATING FUNCTIONS

We shall consider four methods of solving recurrence relations in this and the next two sections:

1. Substitution (also called iteration),
2. Generating functions,
3. Characteristic roots, and
4. Undetermined coefficients.

In the substitution method the recurrence relation for a_n is used repeatedly to solve for a general expression for a_n in terms of n. We desire that this expression involve no other terms of the sequence except those given by boundary conditions. The mechanics of this method are best described in terms of examples. We used this method in Example 5.3.4. Let us also illustrate the method in the following examples.

Example

Solve the recurrence relation a_n = a_{n−1} + f(n) for n ≥ 1 by substitution.


a_1 = a_0 + f(1)
a_2 = a_1 + f(2) = a_0 + f(1) + f(2)
a_3 = a_2 + f(3) = a_0 + f(1) + f(2) + f(3)
⋮
a_n = a_0 + f(1) + f(2) + ⋯ + f(n) = a_0 + Σ_{k=1}^{n} f(k)

Thus, a_n is just the sum of the f(k)'s plus a_0.

More generally, if c is a constant then we can solve a_n = c·a_{n−1} + f(n) for n ≥ 1 in the same way:

a_1 = c·a_0 + f(1)
a_2 = c·a_1 + f(2) = c(c·a_0 + f(1)) + f(2) = c^2 a_0 + c·f(1) + f(2)
a_3 = c·a_2 + f(3) = c(c^2 a_0 + c·f(1) + f(2)) + f(3) = c^3 a_0 + c^2 f(1) + c·f(2) + f(3)
⋮
a_n = c·a_{n−1} + f(n) = c^n a_0 + c^{n−1} f(1) + c^{n−2} f(2) + ⋯ + c·f(n−1) + f(n),

or

a_n = c^n a_0 + Σ_{k=1}^{n} c^{n−k} f(k).
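The substitution (iteration) method is also easy to check numerically; a short Python sketch comparing iteration with the closed form just derived (the parameter values are illustrative):

def solve_by_iteration(a0, c, f, n):
    # Iterates a_k = c * a_{k-1} + f(k) up to k = n.
    a = a0
    for k in range(1, n + 1):
        a = c * a + f(k)
    return a

c, a0, n = 2, 1, 5
closed = c ** n * a0 + sum(c ** (n - k) * k for k in range(1, n + 1))
print(solve_by_iteration(a0, c, lambda k: k, n), closed)   # 89 89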

Solution of Linear Inhomogeneous Recurrence Relations:

The equation a_n + c_1 a_{n−1} + c_2 a_{n−2} = f(n), where c_1 and c_2 are constants and f(n) is not identically 0, is called a second-order linear inhomogeneous recurrence relation (or difference equation) with constant coefficients. The homogeneous case, which we've looked at already, occurs when f(n) ≡ 0. The inhomogeneous case occurs more frequently. The homogeneous case is so important largely because it gives us the key to solving the

inhomogeneous equation. If you've studied linear differential equations with constant coefficients, you'll see the parallel. We will call the equation obtained by setting the right-hand side equal to 0 the "associated homogeneous equation." We know how to solve this. Say that V is a solution. Now suppose that g(n) is any particular solution of the inhomogeneous equation. (That is, it solves the equation, but does not necessarily match the initial data.) Then U = V + g(n) is a solution to the inhomogeneous equation, which you can see simply by substituting U into the equation. On the other hand, every solution U of the inhomogeneous equation is of the form U = V + g(n), where V is a solution of the homogeneous equation and g(n) is a particular solution of the inhomogeneous equation. The proof of this is straightforward. If we have two solutions to the inhomogeneous equation, say U_1 and U_2, then their difference U_1 − U_2 = V is a solution to the homogeneous equation, which you can check by substitution. But then U_1 = V + U_2, and we can set U_2 = g(n), since by assumption U_2 is a particular solution. This leads to the following theorem: the general solution to the inhomogeneous equation is the general solution to the associated homogeneous equation, plus any particular solution to the inhomogeneous equation. This gives the following procedure for solving the inhomogeneous equation:

1) Solve the associated homogeneous equation by the method we've learned. This will involve variable (or undetermined) coefficients.
2) Guess a particular solution to the inhomogeneous equation. It is because of the guess that I've called this a procedure, not an algorithm. For simple right-hand sides f(n), we can say how to compute a particular solution, and in these cases the procedure merits the name "algorithm."
3) The general solution to the inhomogeneous equation is the sum of the answers from the two steps above.
4) Use the initial data to solve for the undetermined coefficients from step 1.

St.peters Engineering college 21

To solve the equation a_n − 6a_{n−1} + 8a_{n−2} = 3, let's suppose that we are also given the initial data a_0 = 3, a_1 = 3. The associated homogeneous equation is a_n − 6a_{n−1} + 8a_{n−2} = 0, so the characteristic equation is r^2 − 6r + 8 = 0, which has roots r_1 = 2 and r_2 = 4. Thus, the general solution to the associated homogeneous equation is c_1 2^n + c_2 4^n. When the right-hand side is a polynomial, as in this case, there will always be a particular solution that is a polynomial. Usually, a polynomial of the same degree will work, so we'll guess in this case that there is a constant C that solves the equation. If that is so, then a_n = a_{n−1} = a_{n−2} = C, and substituting into the equation gives C − 6C + 8C = 3, and we find that C = 1. Now, the general solution to the inhomogeneous equation is c_1 2^n + c_2 4^n + 1. Reassuringly, this is the answer given in the back of the book. Our initial data lead to the equations c_1 + c_2 + 1 = 3 and 2c_1 + 4c_2 + 1 = 3, whose solution is c_1 = 3, c_2 = −1. Finally, the solution to the inhomogeneous equation, with the initial condition given, is a_n = 3·2^n − 4^n + 1.

Sometimes, a polynomial of the same degree as the right-hand side doesn't work. This happens when the characteristic equation has 1 as a root. If our equation had been a_n − 6a_{n−1} + 5a_{n−2} = 3, when we guessed that the particular solution was a constant C, we'd have arrived at the equation C − 6C + 5C = 3, or 0 = 3. The way to deal with this is to increase the degree of the polynomial. Instead of assuming that the solution is constant, we'll assume that it's linear; in fact, we'll guess that it is of the form g(n) = nC. Then we have nC − 6(n − 1)C + 5(n − 2)C = 3, which simplifies to 6C − 10C = 3, so that C = −3/4. Thus, g(n) = −3n/4. This won't be enough if 1 is a root of multiplicity 2, that is, if (r − 1)^2 is a factor of the characteristic polynomial. Then there is a particular solution of the form g(n) = Cn^2. For second-order equations, you never have to go past this.

If the right-hand side is a polynomial of degree greater than 0, then the process works just the same, except that you start with a polynomial of the same degree, increase the degree by 1 if necessary, and then once more, if need be. For example, if the right-hand side were f(n) = 2n − 1, we would start by guessing a particular solution g(n) = C_1 n + C_2. If it turned out that 1 was a characteristic root, we would amend our guess to g(n) = C_1 n^2 + C_2 n + C_3. If 1 is a double root, this will fail also, but g(n) = C_1 n^3 + C_2 n^2 + C_3 n + C_4 will work in this case.

Another case where there is a simple way of guessing a particular solution is when the right-hand side is an exponential, say f(n) = C^n. In that case, we guess that a particular solution is just a constant multiple of f, say g(n) = kC^n. Again, we have trouble when C is a characteristic root. We then guess that g(n) = knC^n, which will fail only if C is a double root. In that case we must use g(n) = kn^2 C^n, which is as far as we ever have to go in the second-order case.

These same ideas extend to higher-order recurrence relations, but we usually solve them numerically rather than exactly. A third-order linear difference equation with constant coefficients leads to a cubic characteristic polynomial. There is a formula for the roots of a cubic, but it's very complicated. For fourth-degree polynomials, there's also a formula, but it's even worse. For fifth and higher degrees, no such formula exists.
Even for the third-order case, the exact solution of a simple-looking inhomogeneous linear recurrence relation with constant coefficients can take pages to write down. The coefficients will be complicated expressions involving square roots and cube roots. For most, if not all, purposes, a simpler answer with numerical coefficients is better, even though it must, in the nature of things, be approximate.

The procedure I've suggested may strike you as silly. After all, we've already solved the characteristic equation, so we know whether 1 is a characteristic root, and what its multiplicity is. Why not start with a polynomial of the correct degree? This is all well and good while you're taking the course and remember the procedure in detail. However, if you have to use this procedure some years from now, you probably won't remember all the details. Then the method I've suggested will be valuable. Alternatively, you can start with a general polynomial of the maximum possible degree. This leads to a lot of extra work if you're solving by hand, but it's the approach I prefer for computer solution.
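A numerical check of the worked example above (Python; verifies the closed form against the recurrence):

def closed_form(n):
    # a_n = 3 * 2^n - 4^n + 1, the solution found above.
    return 3 * 2 ** n - 4 ** n + 1

a = [3, 3]                                     # initial data a_0 = 3, a_1 = 3
for n in range(2, 10):
    a.append(6 * a[n - 1] - 8 * a[n - 2] + 3)  # a_n - 6a_{n-1} + 8a_{n-2} = 3

print(all(a[n] == closed_form(n) for n in range(10)))   # True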



UNIT-V Graph Theory

Representation of Graphs:

There are two different sequential representations of a graph. They are

• Adjacency Matrix representation • Path Matrix representation

Adjacency Matrix Representation

Suppose G is a simple directed graph with m nodes, and suppose the nodes of G have been ordered and are called v_1, v_2, …, v_m. Then the adjacency matrix A = (a_ij) of the graph G is the m × m matrix defined as follows:

a_ij = 1 if v_i is adjacent to v_j, that is, if there is an edge (v_i, v_j)
a_ij = 0 otherwise

Suppose G is an undirected graph. Then the adjacency matrix A of G will be a symmetric matrix, i.e., one in which a_ij = a_ji for every i and j.
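A small Python sketch of the definition (names are illustrative):

def adjacency_matrix(nodes, edges, directed=True):
    # a_ij = 1 iff there is an edge (v_i, v_j), for nodes ordered v_1, ..., v_m.
    index = {v: i for i, v in enumerate(nodes)}
    m = len(nodes)
    A = [[0] * m for _ in range(m)]
    for u, v in edges:
        A[index[u]][index[v]] = 1
        if not directed:
            A[index[v]][index[u]] = 1   # symmetric for undirected graphs
    return A

for row in adjacency_matrix(['v1', 'v2', 'v3'],
                            [('v1', 'v2'), ('v2', 'v3')], directed=False):
    print(row)   # [0, 1, 0] / [1, 0, 1] / [0, 1, 0]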

Drawbacks

1. It may be difficult to insert and delete nodes in G.
2. If the number of edges is O(m) or O(m log_2 m), then the matrix A will be sparse, hence a great deal of space will be wasted.

Path Matrix Representation

Let G be a simple directed graph with m nodes, v_1, v_2, …, v_m. The path matrix of G is the m-square matrix P = (p_ij) defined as follows:

p_ij = 1 if there is a path from v_i to v_j
p_ij = 0 otherwise


Graphs and Multigraphs

A graph G consists of two things:

1. A set V of elements called nodes (or points or vertices)

2. A set E of edges such that each edge e in E is identified with a unique (unordered) pair [u, v] of nodes in V, denoted by e = [u, v]

Sometimes we indicate the parts of a graph by writing G = (V, E). Suppose e = [u, v]. Then the nodes u and v are called the endpoints of e, and u and v are said to be adjacent nodes or neighbors. The degree of a node u, written deg(u), is the number of edges containing u. If deg(u) = 0, that is, if u does not belong to any edge, then u is called an isolated node.

Path and Cycle

A path P of length n from a node u to a node v is defined as a sequence of n + 1 nodes, P = (v0, v1, v2, . . . , vn), such that u = v0, vi−1 is adjacent to vi for i = 1, 2, . . . , n, and vn = v.

Types of Path

1. Simple Path
2. Cycle Path

(i) Simple Path: A simple path is a path in which the first and last vertices are different (v0 ≠ vn).

(ii) Cycle Path: A cycle path is a path in which the first and last vertices are the same (v0 = vn). It is also called a closed path.

Connected Graph

A graph G is said to be connected if there is a path between any two of its nodes.

Complete Graph A graph G is said to be complete if every node u in G is adjacent to every other node v in G.

Tree A connected graph T without any cycles is called a tree graph or free tree or, simply, a tree.

Labeled or Weighted Graph

If a weight is assigned to each edge of the graph, then it is called a weighted or labeled graph.


The definition of a graph may be generalized by permitting the following:

1. Multiple edges: Distinct edges e and e′ are called multiple edges if they connect the same endpoints, that is, if e = [u, v] and e′ = [u, v].
2. Loops: An edge e is called a loop if it has identical endpoints, that is, if e = [u, u].
3. Finite graph: A multigraph M is said to be finite if it has a finite number of nodes and a finite number of edges.

Directed Graphs

A directed graph G, also called a digraph, is the same as a multigraph except that each edge e in G is assigned a direction; in other words, each edge e is identified with an ordered pair (u, v) of nodes in G.

Outdegree and Indegree

Indegree: The indegree of a vertex v is the number of edges for which v is the head.

Example:


Indegree of 1 = 1
Indegree of 2 = 2

Outdegree: The outdegree of a vertex v is the number of edges for which v is the tail.

Example:

Outdegree of 1 = 1
Outdegree of 2 = 2

Simple Directed Graph

A directed graph G is said to be simple if G has no parallel edges. A simple graph G may have loops, but it cannot have more than one loop at a given node.

Graph Traversal

The breadth first search (BFS) and the depth first search (DFS) are the two algorithms used for traversing and searching a node in a graph. They can also be used to find out whether a node is reachable from a given node or not.

Depth First Search (DFS)

The aim of the DFS algorithm is to traverse the graph in such a way that it tries to go as far as possible from the root node. A stack is used in the implementation of depth first search. Let's see how depth first search works with respect to the following graph:


As stated before, in DFS, nodes are visited by going through the depth of the tree from the starting node. If we do the depth first traversal of the above graph and print the visited node, it will be “A B E F C D”. DFS visits the root node and then its children nodes until it reaches the end node, i.e. E and F nodes, then moves up to the parent nodes.

Algorithmic Steps

1. Push the root node onto the stack.
2. Loop until the stack is empty.
3. Peek at the node on top of the stack.
4. If the node has unvisited child nodes, get an unvisited child node, mark it as traversed, and push it onto the stack.
5. If the node does not have any unvisited child nodes, pop the node from the stack.

Based upon the above steps, the following Java code shows the implementation of the DFS algorithm:

public void dfs() {
    // DFS uses a Stack data structure
    Stack<Node> s = new Stack<Node>();
    s.push(this.rootNode);
    rootNode.visited = true;
    printNode(rootNode);
    while (!s.isEmpty()) {
        Node n = s.peek();
        Node child = getUnvisitedChildNode(n);
        if (child != null) {
            // visit the child and go one level deeper
            child.visited = true;
            printNode(child);
            s.push(child);
        } else {
            // no unvisited children remain: backtrack
            s.pop();
        }
    }
    // Clear visited property of nodes
    clearNodes();
}

Breadth First Search (BFS)

This is a very different approach to traversing the graph nodes. The aim of the BFS algorithm is to traverse the graph while staying as close as possible to the root node. A queue is used in the implementation of breadth first search. Let's see how BFS traversal works with respect to the following graph:

If we do the breadth first traversal of the above graph and print the visited nodes, the output will be "A B C D E F". BFS visits the nodes level by level: it starts with level 0, which is the root node, then moves to the next level, containing B, C and D, and finally to the last level, containing E and F.

Algorithmic Steps

1. Push the root node into the queue.
2. Loop until the queue is empty.
3. Remove a node from the queue.
4. If the removed node has unvisited child nodes, mark them as visited and insert them into the queue.

Based upon the above steps, the following Java code shows the implementation of the BFS algorithm:

public void bfs() {
    // BFS uses a Queue data structure
    Queue<Node> q = new LinkedList<Node>();
    q.add(this.rootNode);
    printNode(this.rootNode);
    rootNode.visited = true;
    while (!q.isEmpty()) {
        Node n = q.remove();
        Node child = null;
        // visit every unvisited child before moving on to the next node
        while ((child = getUnvisitedChildNode(n)) != null) {
            child.visited = true;
            printNode(child);
            q.add(child);
        }
    }
    // Clear visited property of nodes
    clearNodes();
}

Spanning Trees:

In the mathematical field of graph theory, a spanning tree T of a connected, undirected graph G is a tree composed of all the vertices and some (or perhaps all) of the edges of G. Informally, a spanning tree of G is a selection of edges of G that form a tree spanning every vertex. That is, every vertex lies in the tree, but no cycles (or loops) are formed. On the other hand, every bridge of G must belong to T. A spanning tree of a connected graph G can also be defined as a maximal set of edges of G that contains no cycle, or as a minimal set of edges that connects all vertices.

Example:

A spanning tree (blue heavy edges) of a grid graph.

Spanning forests

A spanning forest is a type of subgraph that generalises the concept of a spanning tree. However, there are two definitions in common use. One is that a spanning forest is a subgraph that consists of a spanning tree in each connected component of a graph. (Equivalently, it is a maximal cycle-free subgraph.) This definition is common in computer science and optimisation. It is also the definition used when discussing minimum spanning forests, the generalization to disconnected graphs of minimum spanning trees. Another definition, common in graph theory, is that a spanning forest is any subgraph that is both a forest (contains no cycles) and spanning (includes every vertex).

Counting spanning trees

The number t(G) of spanning trees of a connected graph is an important invariant. In some cases, it is easy to calculate t(G) directly. For example, if G is itself a tree, then t(G) = 1, while if G is the cycle graph Cn with n vertices, then t(G) = n. For any graph G, the number t(G) can be calculated using Kirchhoff's matrix-tree theorem.

Cayley's formula is a formula for the number of spanning trees in the complete graph Kn with n vertices. The formula states that t(Kn) = n^(n−2). Another way of stating Cayley's formula is that there are exactly n^(n−2) labelled trees with n vertices. Cayley's formula can be proved using Kirchhoff's matrix-tree theorem or via the Prüfer code.

If G is the complete bipartite graph Kp,q, then t(G) = p^(q−1) q^(p−1), while if G is the n-dimensional hypercube graph Qn, then t(G) = 2^(2^n − n − 1) ∏ k^C(n,k), the product taken over k = 2, ..., n. These formulae are also consequences of the matrix-tree theorem.

If G is a multigraph and e is an edge of G, then the number t(G) of spanning trees of G satisfies the deletion-contraction recurrence t(G) = t(G−e) + t(G/e), where G−e is the multigraph obtained by deleting e and G/e is the contraction of G by e, where multiple edges arising from this contraction are not deleted.
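The deletion-contraction recurrence translates directly into a slow but instructive recursive counter. The sketch below is my own illustration, not code from the text: the multigraph is kept as an edge list, loops are skipped because a loop can never lie in a spanning tree, and parallel edges created by a contraction are kept, as noted above.

import java.util.ArrayList;
import java.util.List;

public class SpanningTreeCount {
    // counts spanning trees of a multigraph with n vertices, given as an edge list
    static long t(int n, List<int[]> edges) {
        // find a non-loop edge to recurse on; loops never lie in a spanning tree
        int pick = -1;
        for (int i = 0; i < edges.size(); i++)
            if (edges.get(i)[0] != edges.get(i)[1]) { pick = i; break; }
        if (pick == -1)                    // only loops (or no edges) remain
            return n == 1 ? 1 : 0;         // a single vertex has exactly one spanning tree
        int[] e = edges.get(pick);
        // t(G - e): delete the edge
        List<int[]> del = new ArrayList<>(edges);
        del.remove(pick);
        // t(G / e): contract the edge, relabelling endpoint e[1] as e[0];
        // parallel edges produced by the contraction are kept
        List<int[]> con = new ArrayList<>();
        for (int i = 0; i < edges.size(); i++) {
            if (i == pick) continue;
            int u = edges.get(i)[0] == e[1] ? e[0] : edges.get(i)[0];
            int v = edges.get(i)[1] == e[1] ? e[0] : edges.get(i)[1];
            con.add(new int[]{u, v});
        }
        return t(n, del) + t(n - 1, con);
    }

    public static void main(String[] args) {
        // cycle graph C4: by the formula above, t(C4) = 4
        List<int[]> c4 = new ArrayList<>(List.of(
            new int[]{0, 1}, new int[]{1, 2}, new int[]{2, 3}, new int[]{3, 0}));
        System.out.println(t(4, c4));  // prints 4
    }
}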

Uniform spanning trees

A spanning tree chosen randomly from among all the spanning trees with equal probability is called a uniform spanning tree (UST). This model has been extensively researched in probability and mathematical physics.

Algorithms

The classic spanning tree algorithm, depth-first search (DFS), is due to Robert Tarjan. Another important algorithm is based on breadth-first search (BFS).
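Either traversal yields a spanning tree of a connected graph if we record, for each vertex, the edge through which it was first discovered. A minimal BFS-based sketch (the adjacency-list representation and names are mine, not from the text):

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class BfsSpanningTree {
    public static void main(String[] args) {
        // adjacency lists for a small connected graph on vertices 0..4
        int[][] adj = {{1, 2}, {0, 2, 3}, {0, 1, 4}, {1}, {2}};
        boolean[] visited = new boolean[adj.length];
        List<int[]> treeEdges = new ArrayList<>();
        Queue<Integer> q = new ArrayDeque<>();
        q.add(0);
        visited[0] = true;
        while (!q.isEmpty()) {
            int u = q.remove();
            for (int v : adj[u]) {
                if (!visited[v]) {
                    visited[v] = true;
                    treeEdges.add(new int[]{u, v});  // tree edge: u discovered v
                    q.add(v);
                }
            }
        }
        // a connected graph on n vertices yields exactly n - 1 tree edges
        for (int[] e : treeEdges) System.out.println(e[0] + " - " + e[1]);
    }
}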

Planar Graphs: In graph theory, a planar graph is a graph that can be embedded in the plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints. A planar graph already drawn in the plane without edge intersections is called a plane graph or planar embedding of the graph. A plane graph can be defined as a planar graph with a mapping from every node to a point in 2D space, and from every edge to a plane curve, such that the extreme points of each curve are the points mapped from its end nodes, and all curves are disjoint except on their extreme points. Plane graphs can be encoded by combinatorial maps.


It is easily seen that a graph that can be drawn on the plane can be drawn on the sphere as well, and vice versa. The equivalence class of topologically equivalent drawings on the sphere is called a planar map. Although a plane graph has an external or unbounded face, none of the faces of a planar map has a particular status.

Applications

• Telecommunications – e.g. spanning trees
• Vehicle routing – e.g. planning routes on roads without underpasses
• VLSI – e.g. laying out circuits on a computer chip
• The puzzle game Planarity requires the player to "untangle" a planar graph so that none of its edges intersect.

Example graphs

Planar: the butterfly graph; the complete graph K4.
Nonplanar: the complete graph K5; the complete bipartite graph K3,3.


Graph Theory and Applications

Graphs are among the most ubiquitous models of both natural and human-made structures. They can be used to model many types of relations and process dynamics in physical, biological and social systems. Many problems of practical interest can be represented by graphs.

In computer science, graphs are used to represent networks of communication, data organization, computational devices, the flow of computation, etc. One practical example: The link structure of a website could be represented by a directed graph. The vertices are the web pages available at the website and a directed edge from page A to page B exists if and only if A contains a link to B. A similar approach can be taken to problems in travel, biology, computer chip design, and many other fields. The development of algorithms to handle graphs is therefore of major interest in computer science. There, the transformation of graphs is often formalized and represented by graph rewrite systems. They are either directly used or properties of the rewrite systems (e.g. confluence) are studied. Complementary to graph transformation systems focussing on rule-based in-memory manipulation of graphs are graph databases geared towards transaction-safe, persistent storing and querying of graph-structured data.

Graph-theoretic methods, in various forms, have proven particularly useful in linguistics, since natural language often lends itself well to discrete structure. Traditionally, syntax and compositional semantics follow tree-based structures, whose expressive power lies in the Principle of Compositionality, modeled in a hierarchical graph. Within lexical semantics, especially as applied to computers, modeling word meaning is easier when a given word is understood in terms of related words; semantic networks are therefore important in computational linguistics. Still other methods in phonology (e.g. Optimality Theory, which uses lattice graphs) and morphology (e.g. finite-state morphology, using finite-state transducers) are common in the analysis of language as a graph. Indeed, the usefulness of this area of mathematics to linguistics has given rise to organizations such as TextGraphs, as well as various 'Net' projects, such as WordNet, VerbNet, and others.

Graph theory is also used to study molecules in chemistry and physics. In condensed matter physics, the three-dimensional structure of complicated simulated atomic structures can be studied quantitatively by gathering statistics on graph-theoretic properties related to the topology of the atoms, for example Franzblau's shortest-path (SP) rings. In chemistry a graph makes a natural model for a molecule, where vertices represent atoms and edges bonds. This approach is especially used in computer processing of molecular structures, ranging from chemical editors to database searching. In statistical physics, graphs can represent local connections between interacting parts of a system, as well as the dynamics of a physical process on such systems.

Graph theory is also widely used in sociology as a way, for example, to measure actors' prestige or to explore diffusion mechanisms, notably through the use of social network analysis software. Likewise, graph theory is useful in biology and conservation efforts, where a vertex can represent regions where certain species exist (or habitats) and the edges represent migration paths or movement between the regions. This information is important when looking at breeding patterns, or when tracking the spread of disease and parasites and studying how changes to movement can affect other species.

In mathematics, graphs are useful in geometry and certain parts of topology, e.g. Knot Theory. Algebraic graph theory has close links with group theory.

A graph structure can be extended by assigning a weight to each edge of the graph. Graphs with weights, or weighted graphs, are used to represent structures in which pairwise connections have some numerical values. For example, if a graph represents a road network, the weights could represent the length of each road.

Basic Concepts

Isomorphism: Let G1 and G2 be two graphs and let f be a function from the vertex set of G1 to the vertex set of G2. Suppose that f is one-to-one and onto, and that f(v) is adjacent to f(w) in G2 if and only if v is adjacent to w in G1.

Then we say that the function f is an isomorphism and that the two graphs G1 and G2 are isomorphic. So two graphs G1 and G2 are isomorphic if there is a one-to-one correspondence between the vertices of G1 and those of G2 with the property that if two vertices of G1 are adjacent then so are their images in G2. If two graphs are isomorphic then, as far as we are concerned, they are the same graph, though the location of the vertices may be different. To show how the program can be used to explore isomorphism, draw the graph in figure 4 with the program (first get the null graph on four vertices and then use the right mouse button to add edges).

Save this graph as Graph 1 (you need to click Graph then Save). Now get the circuit graph with 4 vertices. It looks like figure 5, and we shall call it C(4).

Example:

The two graphs shown below are isomorphic, despite their different looking drawings.


Graph G and Graph H, with an isomorphism between them given by:

ƒ(a) = 1

ƒ(b) = 6

ƒ(c) = 8

ƒ(d) = 3

ƒ(g) = 5

ƒ(h) = 2

ƒ(i) = 4

ƒ(j) = 7
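The definition translates directly into a brute-force test: search for a bijection f between the vertex sets that preserves adjacency both ways. The following Java sketch is my own illustration (only practical for very small graphs, since it may explore up to n! mappings):

public class IsomorphismDemo {
    static boolean isomorphic(boolean[][] g1, boolean[][] g2) {
        if (g1.length != g2.length) return false;
        int n = g1.length;
        return tryMap(g1, g2, new int[n], new boolean[n], 0);
    }

    // extend the partial map f[0..k-1] by choosing an image for vertex k
    static boolean tryMap(boolean[][] g1, boolean[][] g2, int[] f, boolean[] used, int k) {
        int n = g1.length;
        if (k == n) return true;  // all vertices mapped consistently
        for (int img = 0; img < n; img++) {
            if (used[img]) continue;
            boolean ok = true;
            for (int j = 0; j < k; j++)
                if (g1[k][j] != g2[img][f[j]]) { ok = false; break; }  // adjacency must be preserved
            if (ok) {
                f[k] = img; used[img] = true;
                if (tryMap(g1, g2, f, used, k + 1)) return true;
                used[img] = false;  // backtrack
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // a path 0-1-2 and a path 1-0-2: isomorphic via relabelling
        boolean[][] p1 = {{false, true, false}, {true, false, true}, {false, true, false}};
        boolean[][] p2 = {{false, true, true}, {true, false, false}, {true, false, false}};
        System.out.println(isomorphic(p1, p2));  // prints true
    }
}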

Subgraphs:

A subgraph of a graph G is a graph whose vertex set is a subset of that of G, and whose adjacency relation is a subset of that of G restricted to this subset. In the other direction, a supergraph of a graph G is a graph of which G is a subgraph. We say a graph G contains another graph H if some subgraph of G is H or is isomorphic to H.

A subgraph H is a spanning subgraph, or factor, of a graph G if it has the same vertex set as G. We say H spans G.

A subgraph H of a graph G is said to be induced if, for any pair of vertices x and y of H, xy is an edge of H if and only if xy is an edge of G. In other words, H is an induced subgraph of G if it has all the edges that appear in G over the same vertex set. If the vertex set of H is the subset S of V(G), then H can be written as G[S] and is said to be induced by S.
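As a quick illustration of G[S] (the names and the example graph are mine, purely hypothetical), the induced subgraph can be read directly off the adjacency matrix:

import java.util.Arrays;

public class InducedSubgraphDemo {
    // H = G[S]: keep exactly the edges of G whose endpoints both lie in S
    static boolean[][] induced(boolean[][] g, int[] s) {
        int k = s.length;
        boolean[][] h = new boolean[k][k];
        for (int i = 0; i < k; i++)
            for (int j = 0; j < k; j++)
                h[i][j] = g[s[i]][s[j]];  // xy is an edge of H iff it is an edge of G
        return h;
    }

    public static void main(String[] args) {
        // triangle 0-1-2 plus a pendant vertex 3 attached to 2
        boolean[][] g = {
            {false, true,  true,  false},
            {true,  false, true,  false},
            {true,  true,  false, true },
            {false, false, true,  false}};
        boolean[][] h = induced(g, new int[]{0, 1, 2});  // G[{0,1,2}] is the triangle
        for (boolean[] row : h) System.out.println(Arrays.toString(row));
    }
}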

A graph that does not contain H as an induced subgraph is said to be H-free.

A universal graph in a class K of graphs is a simple graph in which every element in K can be embedded as a subgraph.


K5, a complete graph. If a subgraph looks like this, the vertices in that subgraph form a clique of size 5.

Multi graphs:

In mathematics, a multigraph or pseudograph is a graph which is permitted to have multiple edges (also called "parallel edges"), that is, edges that have the same end nodes. Thus two vertices may be connected by more than one edge. Formally, a multigraph G is an ordered pair G := (V, E) with

• V, a set of vertices or nodes
• E, a multiset of unordered pairs of vertices, called edges or lines

Multigraphs might be used to model the possible flight connections offered by an airline. In this case the multigraph would be a directed graph with pairs of directed parallel edges connecting cities to show that it is possible to fly both to and from these locations.

A multigraph with multiple edges (red) and a loop (blue). Not all authors allow multigraphs to have loops.


Euler circuits:

In graph theory, an Eulerian trail is a trail in a graph which visits every edge exactly once. Similarly, an Eulerian circuit is an Eulerian trail which starts and ends on the same vertex. They were first discussed by Leonhard Euler while solving the famous Seven Bridges of Königsberg problem in 1736. Mathematically the problem can be stated like this:

Given the graph on the right, is it possible to construct a path (or a cycle, i.e. a path starting and ending on the same vertex) which visits each edge exactly once?

Euler proved that a necessary condition for the existence of Eulerian circuits is that all vertices in the graph have an even degree, and stated without proof that connected graphs with all vertices of even degree have an Eulerian circuit. The first complete proof of this latter claim was published in 1873 by Carl Hierholzer.

The term Eulerian graph has two common meanings in graph theory. One meaning is a graph with an Eulerian circuit, and the other is a graph with every vertex of even degree. These definitions coincide for connected graphs.

For the existence of Eulerian trails it is necessary that no more than two vertices have an odd degree; this means the Königsberg graph is not Eulerian. If there are no vertices of odd degree, all Eulerian trails are circuits. If there are exactly two vertices of odd degree, all Eulerian trails start at one of them and end at the other. Sometimes a graph that has an Eulerian trail but not an Eulerian circuit is called semi-Eulerian.
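These degree conditions give an immediate test. The sketch below uses my own names and assumes the vertices carrying edges form a connected graph, which must be checked separately (for example, with the BFS given earlier); under that assumption it classifies a graph as having an Eulerian circuit, an Eulerian trail only, or neither:

public class EulerCheckDemo {
    static String classify(int[][] adj) {
        int odd = 0;
        for (int[] nbrs : adj)
            if (nbrs.length % 2 != 0) odd++;  // degree = adjacency-list length
        if (odd == 0) return "Eulerian circuit exists";
        if (odd == 2) return "Eulerian trail only (semi-Eulerian)";
        return "neither";
    }

    public static void main(String[] args) {
        // triangle: every vertex has even degree 2
        int[][] triangle = {{1, 2}, {0, 2}, {0, 1}};
        System.out.println(classify(triangle));  // Eulerian circuit exists
        // path 0-1-2: the endpoints have odd degree 1
        int[][] path = {{1}, {0, 2}, {1}};
        System.out.println(classify(path));      // Eulerian trail only
    }
}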

An Eulerian trail, or Euler walk, in an undirected graph is a path that uses each edge exactly once. If such a path exists, the graph is called traversable or semi-Eulerian.

An Eulerian cycle, Eulerian circuit or Euler tour in an undirected graph is a cycle that uses each edge exactly once. If such a cycle exists, the graph is called unicursal. While such graphs are Eulerian graphs, not every Eulerian graph possesses an Eulerian cycle.

For directed graphs path has to be replaced with directed path and cycle with directed cycle.

The definition and properties of Eulerian trails, cycles and graphs are valid for multigraphs as well.

This graph is not Eulerian, therefore, a solution does not exist.


Every vertex of this graph has an even degree, therefore this is an Eulerian graph. Following the edges in alphabetical order gives an Eulerian circuit/cycle.

Hamiltonian graphs:

In the mathematical field of graph theory, a Hamiltonian path (or traceable path) is a path in an undirected graph which visits each vertex exactly once. A Hamiltonian cycle (or Hamiltonian circuit) is a cycle in an undirected graph which visits each vertex exactly once and also returns to the starting vertex. Determining whether such paths and cycles exist in graphs is the Hamiltonian path problem, which is NP-complete.

Hamiltonian paths and cycles are named after William Rowan Hamilton who invented the Icosian game, now also known as Hamilton's puzzle, which involves finding a Hamiltonian cycle in the edge graph of the dodecahedron. Hamilton solved this problem using the Icosian Calculus, an algebraic structure based on roots of unity with many similarities to the quaternions (also invented by Hamilton). This solution does not generalize to arbitrary graphs.

A Hamiltonian path or traceable path is a path that visits each vertex exactly once. A graph that contains a Hamiltonian path is called a traceable graph. A graph is Hamilton-connected if for every pair of vertices there is a Hamiltonian path between the two vertices.

A Hamiltonian cycle, Hamiltonian circuit, vertex tour or graph cycle is a cycle that visits each vertex exactly once (except the vertex which is both the start and end, and so is visited twice). A graph that contains a Hamiltonian cycle is called a Hamiltonian graph.

Similar notions may be defined for directed graphs, where each edge (arc) of a path or cycle can only be traced in a single direction (i.e., the vertices are connected with arrows and the edges traced "tail-to-head").

A Hamiltonian decomposition is an edge decomposition of a graph into Hamiltonian circuits.

Examples

• a complete graph with more than two vertices is Hamiltonian
• every cycle graph is Hamiltonian
• every tournament has an odd number of Hamiltonian paths
• every Platonic solid, considered as a graph, is Hamiltonian
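Since the problem is NP-complete, small instances are usually attacked by backtracking: fix a start vertex, extend the path one unvisited neighbor at a time, and undo choices that lead nowhere. A Java sketch (the names and the example graph are my own, purely illustrative):

public class HamiltonianDemo {
    static boolean[][] adj;
    static int[] path;
    static boolean[] visited;

    // try to place a vertex at position pos of the cycle
    static boolean extend(int pos) {
        int n = adj.length;
        if (pos == n)                          // all vertices used:
            return adj[path[n - 1]][path[0]];  // does the cycle close back to the start?
        for (int v = 1; v < n; v++) {          // vertex 0 is fixed as the start
            if (!visited[v] && adj[path[pos - 1]][v]) {
                path[pos] = v; visited[v] = true;
                if (extend(pos + 1)) return true;
                visited[v] = false;            // backtrack
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // C4 with one chord: edges 0-1, 1-2, 2-3, 3-0 and 0-2
        adj = new boolean[][]{
            {false, true,  true,  true },
            {true,  false, true,  false},
            {true,  true,  false, true },
            {true,  false, true,  false}};
        path = new int[adj.length];
        visited = new boolean[adj.length];
        path[0] = 0; visited[0] = true;
        System.out.println(extend(1) ? "Hamiltonian cycle found" : "none");
    }
}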


Chromatic Numbers:

In graph theory, graph coloring is a special case of graph labeling; it is an assignment of labels traditionally called "colors" to elements of a graph subject to certain constraints. In its simplest form, it is a way of coloring the vertices of a graph such that no two adjacent vertices share the same color; this is called a vertex coloring. Similarly, an edge coloring assigns a color to each edge so that no two adjacent edges share the same color, and a face coloring of a planar graph assigns a color to each face or region so that no two faces that share a boundary have the same color.

Vertex coloring is the starting point of the subject, and other coloring problems can be transformed into a vertex version. For example, an edge coloring of a graph is just a vertex coloring of its line graph, and a face coloring of a planar graph is just a vertex coloring of its planar dual. However, non-vertex coloring problems are often stated and studied as is. That is partly for perspective, and partly because some problems are best studied in non-vertex form, as for instance is edge coloring.

The convention of using colors originates from coloring the countries of a map, where each face is literally colored. This was generalized to coloring the faces of a graph embedded in the plane. By planar duality it became coloring the vertices, and in this form it generalizes to all graphs. In mathematical and computer representations it is typical to use the first few positive or nonnegative integers as the "colors". In general one can use any finite set as the "color set". The nature of the coloring problem depends on the number of colors but not on what they are.

Graph coloring enjoys many practical applications as well as theoretical challenges. Besides the classical types of problems, different limitations can also be set on the graph, or on the way a color is assigned, or even on the color itself. It has even reached popularity with the general public in the form of the number puzzle Sudoku. Graph coloring is still a very active field of research.

A proper vertex coloring of the Petersen graph with 3 colors, the minimum number possible.

Vertex coloring

When used without any qualification, a coloring of a graph is almost always a proper vertex coloring, namely a labelling of the graph’s vertices with colors such that no two vertices sharing the same edge have the same color. Since a vertex with a loop could never be properly colored, it is understood that graphs in this context are loopless.


The terminology of using colors for vertex labels goes back to map coloring. Labels like red and blue are only used when the number of colors is small, and normally it is understood that the labels are drawn from the integers {1, 2, 3, ...}. A coloring using at most k colors is called a (proper) k-coloring. The smallest number of colors needed to color a graph G is called its chromatic number, χ(G). A graph that can be assigned a (proper) k-coloring is k-colorable, and it is k-chromatic if its chromatic number is exactly k. A subset of vertices assigned to the same color is called a color class; every such class forms an independent set. Thus, a k-coloring is the same as a partition of the vertex set into k independent sets, and the terms k-partite and k-colorable have the same meaning.
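A proper coloring, though not necessarily one with χ(G) colors, can always be produced greedily: scan the vertices in some order and give each the smallest color not already used by a colored neighbor, which never needs more than (maximum degree + 1) colors. A sketch with my own illustrative names:

import java.util.Arrays;

public class GreedyColoringDemo {
    static int[] greedyColor(int[][] adj) {
        int n = adj.length;
        int[] color = new int[n];
        Arrays.fill(color, -1);                 // -1 means "not yet colored"
        for (int u = 0; u < n; u++) {
            boolean[] taken = new boolean[n];   // colors used by neighbors of u
            for (int v : adj[u])
                if (color[v] != -1) taken[color[v]] = true;
            int c = 0;
            while (taken[c]) c++;               // smallest free color
            color[u] = c;
        }
        return color;
    }

    public static void main(String[] args) {
        // odd cycle C5: chromatic number 3, and greedy indeed uses 3 colors here
        int[][] c5 = {{1, 4}, {0, 2}, {1, 3}, {2, 4}, {3, 0}};
        System.out.println(Arrays.toString(greedyColor(c5)));  // [0, 1, 0, 1, 2]
    }
}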

This graph can be 3-colored in 12 different ways.

The following table gives the chromatic number for familiar classes of graphs:

graph                  chromatic number
complete graph Kn      n
cycle graph Cn         2 if n is even, 3 if n is odd
star graph Sn          2
wheel graph Wn         3 if the outer cycle is even, 4 if it is odd


2. Support materials

In order for students to complete their assignments, fare well in exams and fulfill the learning objectives, BVRIT supplies them with additional material.

2.1 Computer files, programs, and documents

This section includes on-line documentation and the electronic versions of the handouts, teaching material and Course Activities support documents.


2.2 Unit Wise Question Bank

2.2.1 Subjective Question Bank

UNIT-I

1) Write the following statements in symbolic form:
a) Mark is poor but happy
b) Mark is rich or unhappy
c) Mark is neither rich nor happy
d) Mark is poor or he is both rich and unhappy
2) Construct the truth tables for the following formulas:
a) (Q (P
b) (P Q) R)
3) Determine which of the formulas are tautologies, contradictions or contingencies:
a) ((P Q) P)
b) ((P Q) (Q P))
c) ((Q P) Q)
4) Show the following implications without constructing a truth table:
a) (P Q)  (P Q)
b) (P (Q R))  (P Q) (P R)
5) Show the following equivalences without constructing a truth table:
a) (P Q)  (P Q)  (P Q)
b) (P Q) (R Q)  (P R) Q
6) Obtain the principal normal forms of P (P (Q (Q R)))
7) Obtain the principal normal forms of P (P (Q P))
8) Express P→(P→Q) in terms of only. Express the same formula in terms of only.
9) Obtain the product-of-sums canonical form for (P Q) (P Q).
10) Obtain the sum-of-products canonical form for (P Q) (P Q).


1) Indicate the variables that are free and bound. Also show the scope of the quantifiers.
a) (P(x) R(x)) P(x) Q(x)
b) (P(x) Q(x)) (P(x) Q(x))
2) Convert the following predicates into symbolic form:
a) x is the father of the mother of y
b) All the world loves a lover
c) All men are giants
d) Some cats are black
3) Show that P(x) Q(x)  (P(x) Q(x)).
4) Show that Q(x) is a valid conclusion from the premises (P(x) Q(x)), P(x).
5) Using Rule CP, show the following implication:

(P(x) Q(x) ), ( R(x) Q(x) ) ( R(x) P(x))

6) Show the following implication using the indirect method of proof:

(P(x) Q(x))  P(x) Q(x)

7) Test the validity of the following arguments

No real number has a negative square

All real numbers are complex numbers

Some complex numbers have negative squares

Z is a number whose square is not negative

Therefore Z is a real number

8) Show that S ∨ R is tautologically implied by (P ∨ Q) ∧ (P → R) ∧ (Q → S), using automatic theorem proving.


Unit-II

1) Given S = {1,2,3,4,5,6,7,8,9,10} and a relation R on S where R = {(x, y) | x + y = 10}, what are the properties of the relation R?
2) Let X = {1,2,3,4} and R = {(x, y) | x > y}. Draw the graph of R and also give its matrix.
3) Let R denote a relation on the set of ordered pairs of positive integers such that (x, y) R (u, v) iff xv = yu. Show that R is an equivalence relation.
4) Given a set S = {1,2,3,4,5}, find the equivalence relation on S which generates the partition {{1,2},{3},{4,5}}. Draw the graph of the relation.
5) Let the compatibility relation on a set {x₁, x₂, ..., x₆} be given by the matrix

        x₁  x₂  x₃  x₄  x₅
   x₂    1
   x₃    1   1
   x₄    0   0   1
   x₅    0   0   1   1
   x₆    1   0   1   0   1

Draw the graph and find the maximal compatibility blocks of the relation

6) Given the relation matrix MR of a relation R on the set {a, b, c}, find the relation matrices of R′, R², R³ and R∘R′, where

        1 0 1
   MR = 1 1 0
        1 1 1

7) Draw the Hasse diagrams of the following sets under the partial ordering relation "divides":
a) {2, 6, 24}
b) {1, 2, 3, 6, 12}
c) {3, 9, 27, 54}
8) List all possible functions from X = {a, b, c} to Y = {0, 1} and indicate in each case whether the function is one-to-one, onto, or one-to-one onto.
9) Let f: R → R and g: R → R, where R is the set of real numbers. Find f∘g and g∘f, where f(x) = x² − 2 and g(x) = x + 4. State whether these functions are injective, surjective, or bijective.
10) Let f: R → R be given by f(x) = x² − 2. Find f⁻¹.


Unit IV

1) In how many ways can we draw a heart or a spade from an ordinary deck of playing cards? A heart or an ace? An ace or a king? A card numbered 2 through 10? A numbered card or a king?
2) How many 3-letter words can be formed using the letters a, b, c, d, e, f, using a letter only once, if:
a) the letter a is to be used?
b) either a or b or both a and b are used?
c) the letter a is not used?
3) How many ways are there to roll two distinguishable dice to yield a sum that is divisible by 3?
4) In how many ways can 10 people arrange themselves
a) in a row of 10 chairs?
b) in a circle of 10 chairs?
5) a) How many binary sequences are there of length 15?
b) How many binary sequences are there of length 15 with exactly six 1's?
6) A multiple choice test has 15 questions and 4 choices for each answer. How many ways can the 15 questions be answered so that
a) exactly 3 answers are correct?
b) at least 3 answers are correct?
7) Find the number of ways in which 5 different English books, 6 French books, 3 German books and 7 Russian books can be arranged on a shelf so that all books of the same language are together.
8) How many solutions are there to the equation x1 + x2 + x3 + x4 + x5 = 50 in non-negative integers?
9) How many ways can we distribute 12 white balls and 2 black balls
a) into 9 numbered boxes?
b) into 9 numbered boxes where each box contains at least one white ball?
10) Consider the word TRIANNUAL.
a) How many arrangements are there of these 9 letters?
b) How many 9-letter words are there with the letters T, I and U separated by exactly 2 of the other letters?
c) How many 6-letter words can be formed from the letters of TRIANNUAL with no N's?


1) Find the coefficient of x¹⁶ in (1 + x⁴ + x⁸)¹⁰.

2) In (1 + x⁴ + x⁹)¹⁰, find the coefficients of x²³ and x³².

3) Write the formal power series expansion of 1/(1 − 5x)³.

4) Find the coefficient of x¹² in (1 − x⁴ − x⁷ + x¹¹)/(1 − x)⁵.

5) Solve the recurrence relation aₙ − 7aₙ₋₁ + 10aₙ₋₂ = 0 for n ≥ 2 using the generating functions method.

6) Solve the following recurrence relation by substitution: aₙ = aₙ₋₁ + n(n − 1), where a₀ = 1.

7) Solve the following recurrence relation using generating functions: aₙ − 2aₙ₋₃ + aₙ₋₆ = 0 for n ≥ 6, with a₀ = 1 and a₁ = a₂ = a₃ = a₄ = a₅ = 0.

8) Solve the following recurrence relation using the characteristic roots: aₙ − 7aₙ₋₁ + 8aₙ₋₂ = 0, with a₀ = 2 and a₁ = −7.

9) Solve the following recurrence relation using generating functions: aₙ − 5aₙ₋₁ + 6aₙ₋₂ = 4n − 2 for n ≥ 2, with a₀ = 1 and a₁ = 5.

10) Find the complete solution to aₙ + 2aₙ₋₁ = n + 3 for n ≥ 1, with a₀ = 3.


UNIT-V

1) Suppose that G is a non-directed graph with 12 edges. Suppose that G has 6 vertices of degree 3 and that the rest have degree less than 3. Determine the minimum number of vertices G can have.
2) Is there a graph with the degree sequence (1, 1, 3, 3, 3, 4, 6, 7)? If yes, determine whether the graph is simple or a multigraph.
3) Prove that if V = {v₁, v₂, ..., vₙ} is the vertex set of a non-directed graph, then Σᵢ₌₁ⁿ deg(vᵢ) = 2|E|. If G is a directed graph, then Σᵢ₌₁ⁿ deg⁺(vᵢ) = Σᵢ₌₁ⁿ deg⁻(vᵢ) = |E|.
4) Verify whether the following two graphs are isomorphic or not.

5) Determine the number of edges in each of the following graphs: a) Kn b) Km,n c) Cn d) Pn. Also determine the number of vertices in Km,n.
6) Define spanning tree and circuit rank. Draw all possible spanning trees for the following graph.

7) Write the algorithm for breadth first search. Construct a spanning tree for the following figure using BFS.


8) Construct a spanning tree for the following graph using the DFS algorithm.

9) Find a minimal spanning tree for the following graph using Kruskal's algorithm.

10) Find a minimal spanning tree for the following graph using Prim's algorithm.

1) Show that K5 is nonplanar.

2) Draw planar graphs for n = 2, 3, 4, 5.


3) Draw the dual graph for each of the following graphs.

4) a) Show that a complete graph Kn is planar iff n ≤ 4.

b) Show that a complete bipartite graph Km,n is planar iff m ≤ 2 or n ≤ 2.

5) Which of the following multigraphs have Euler paths, Euler circuits, or neither?

6) Find a Hamiltonian cycle in each of the following graphs?

7) How many different Hamiltonian cycles are there in Kn, the complete graph on n vertices?

8) Define chromatic number. Find the chromatic number of a) a cycle b) a tree c) a bipartite graph.

9) Determine the chromatic number for the following graphs?
