
Appendix 445

12.4 Calculus review

This section is a very brief summary of Calculus skills required for reading this book.

12.4.1 Inverse function

Function g is the inverse function of function f if

g(f(x)) = x and f(g(y)) = y

for all x and y where f(x) and g(y) exist.

Notation: inverse function $g = f^{-1}$

(Don't confuse the inverse $f^{-1}(x)$ with $1/f(x)$. These are different functions!) To find the inverse function, solve the equation

$$f(x) = y.$$

The solution $g(y)$ is the inverse of f. For example, to find the inverse of $f(x) = 3 + 1/x$, we solve the equation

$$3 + 1/x = y \;\Rightarrow\; 1/x = y - 3 \;\Rightarrow\; x = \frac{1}{y-3}.$$

The inverse function of f is $g(y) = 1/(y-3)$.
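As a quick sanity check of this example, here is a minimal Python sketch (Python is our choice for illustration; the book does not use it here). It takes $f(x) = 3 + 1/x$ and $g(y) = 1/(y-3)$ from the example and confirms, at a few arbitrary sample points, that g undoes f and f undoes g. The last line contrasts $g(5)$ with $1/f(5)$, the two objects the note above warns not to confuse.

```python
import math

def f(x):
    """The example function f(x) = 3 + 1/x, defined for x != 0."""
    return 3 + 1 / x

def g(y):
    """Its inverse found above, g(y) = 1/(y - 3), defined for y != 3."""
    return 1 / (y - 3)

# g(f(x)) = x and f(g(y)) = y at a few sample points where both exist.
for x in [-2.0, 0.5, 1.0, 4.0]:
    assert math.isclose(g(f(x)), x)
for y in [-1.0, 2.5, 3.5, 10.0]:
    assert math.isclose(f(g(y)), y)

# The inverse is not the reciprocal: compare g(5) with 1/f(5).
print(g(5.0), 1 / f(5.0))
```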

12.4.2 Limits and continuity

A function f(x) has a limit L at a point $x_0$ if f(x) approaches L when x approaches $x_0$. To say it more rigorously, for any $\varepsilon > 0$ there exists $\delta > 0$ such that f(x) is ε-close to L when x is δ-close to $x_0$. That is,

if $0 < |x - x_0| < \delta$ then $|f(x) - L| < \varepsilon$.

A function f(x) has a limit L at $+\infty$ if f(x) approaches L when x goes to $+\infty$. Rigorously, for any $\varepsilon > 0$ there exists N such that f(x) is ε-close to L when x gets beyond N, i.e.,

if $x > N$ then $|f(x) - L| < \varepsilon$.

Similarly, f(x) has a limit L at $-\infty$ if for any $\varepsilon > 0$ there exists N such that f(x) is ε-close to L when x gets below $(-N)$, i.e.,

if $x < -N$ then $|f(x) - L| < \varepsilon$.

Notation: $\lim_{x\to x_0} f(x) = L$, or $f(x) \to L$ as $x \to x_0$

$\lim_{x\to\infty} f(x) = L$, or $f(x) \to L$ as $x \to \infty$

$\lim_{x\to-\infty} f(x) = L$, or $f(x) \to L$ as $x \to -\infty$

446 Probability and Statistics for Computer Scientists
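The ε-N definition can be made concrete with a small example of our own choosing (it does not appear in the text): $f(x) = 1/x$ has limit $L = 0$ at $+\infty$, and for any $\varepsilon > 0$ the choice $N = 1/\varepsilon$ works, since $x > N$ implies $|1/x| < \varepsilon$. The Python sketch below spot-checks this at finitely many points beyond N; such a check illustrates the definition but does not prove the limit.

```python
def f(x):
    """Illustrative function (not from the text): f(x) = 1/x, with limit 0 at +infinity."""
    return 1 / x

L = 0.0

def choose_N(eps):
    """Illustrative helper: for f(x) = 1/x, N = 1/eps works, since x > N gives |f(x) - L| < eps."""
    return 1 / eps

def check_beyond_N(eps, num_points=1000):
    """Spot-check |f(x) - L| < eps at the sample points x = 2N, 3N, ... (all beyond N)."""
    N = choose_N(eps)
    return all(abs(f(N * k) - L) < eps for k in range(2, num_points + 2))

for eps in [0.5, 0.1, 1e-3, 1e-6]:
    print(eps, choose_N(eps), check_beyond_N(eps))
```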

Function f is continuous at a point x0 if

$$\lim_{x\to x_0} f(x) = f(x_0).$$

Function f is continuous if it is continuous at every point.

12.4.3 Sequences and series

A sequence is a function of a positive integer argument, f(n) where n = 1, 2, 3, .... Sequence f(n) converges to L if

$$\lim_{n\to\infty} f(n) = L$$

and diverges to infinity if for any M there exists N such that f(n) > M when n > N.

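A series is a sequence of partial sums,

$$f(n) = \sum_{k=1}^{n} a_k = a_1 + a_2 + \ldots + a_n.$$

A geometric series is defined by

$$a_n = Cr^n,$$

where r is called the ratio of the series. In general, for $r \neq 1$,

$$\sum_{k=m}^{n} Cr^k = f(n) - f(m-1) = C\,\frac{r^m - r^{n+1}}{1-r}.$$

For m = 0, we get

$$\sum_{k=0}^{n} Cr^k = C\,\frac{1 - r^{n+1}}{1-r}.$$

A geometric series (with $C \neq 0$) converges if and only if $|r| < 1$. In this case,

$$\lim_{n\to\infty} \sum_{k=m}^{n} Cr^k = \sum_{k=m}^{\infty} Cr^k = \frac{Cr^m}{1-r} \quad\text{and}\quad \sum_{k=0}^{\infty} Cr^k = \frac{C}{1-r}.$$

A geometric series with C > 0 diverges to $\infty$ if $r \geq 1$.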
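The closed-form partial sum can be checked numerically. This Python sketch compares a direct sum of $Cr^k$ with $C(r^m - r^{n+1})/(1-r)$ for the illustrative values C = 2 and r = 0.5 (our choice), then prints partial sums approaching $C/(1-r) = 4$.

```python
import math

def partial_sum(C, r, m, n):
    """Direct sum of C * r**k for k = m, ..., n (illustrative helper)."""
    return sum(C * r**k for k in range(m, n + 1))

def closed_form(C, r, m, n):
    """C * (r**m - r**(n + 1)) / (1 - r), valid for r != 1."""
    return C * (r**m - r**(n + 1)) / (1 - r)

C, r = 2.0, 0.5                      # illustrative values with |r| < 1
for m, n in [(0, 5), (3, 10), (0, 40)]:
    assert math.isclose(partial_sum(C, r, m, n), closed_form(C, r, m, n))

# As n grows, partial sums from k = 0 approach C / (1 - r).
for n in [5, 10, 20, 60]:
    print(n, partial_sum(C, r, 0, n), C / (1 - r))
```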

12.4.4 Derivatives, minimum, and maximum

Derivative of a function f at a point x is the limit

$$f'(x) = \lim_{y\to x} \frac{f(y) - f(x)}{y - x},$$

provided that this limit exists. Taking derivatives is called differentiation. A function that has derivatives is called differentiable.

Notation: $f'(x)$ or $\dfrac{d}{dx} f(x)$

Differentiating a function of several variables, we take partial derivatives denoted as

$$\frac{\partial}{\partial x_1} f(x_1, x_2, \ldots), \quad \frac{\partial}{\partial x_2} f(x_1, x_2, \ldots), \ \text{etc.}$$
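The definition of $f'(x)$ as a limit of difference quotients suggests a numerical experiment. The Python sketch below uses illustrative functions of our own: as y approaches x = 2, the quotient $(f(y) - f(x))/(y - x)$ for $f(x) = x^3$ approaches $3 \cdot 2^2 = 12$, and a partial derivative of $x_1^2 x_2$ is approximated by changing $x_1$ alone while $x_2$ stays fixed. Floating-point rounding limits how small the step can usefully be, so this only illustrates the definition.

```python
def difference_quotient(f, x, y):
    """(f(y) - f(x)) / (y - x), whose limit as y -> x is f'(x)."""
    return (f(y) - f(x)) / (y - x)

def cube(x):
    return x**3            # illustrative function with derivative 3x^2

x = 2.0                    # exact derivative at this point: 3 * 2**2 = 12
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(h, difference_quotient(cube, x, x + h))

def g(x1, x2):
    return x1**2 * x2      # illustrative function of two variables

def partial_x1(func, x1, x2, h=1e-6):
    """Illustrative helper: partial derivative in x1, varying x1 only and keeping x2 fixed."""
    return (func(x1 + h, x2) - func(x1, x2)) / h

print(partial_x1(g, 1.0, 3.0))   # close to 2 * x1 * x2 = 6
```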

The most important derivatives are:

$(x^m)' = mx^{m-1}$

$(e^x)' = e^x$

$(\ln x)' = 1/x$

$C' = 0$

$(f + g)'(x) = f'(x) + g'(x)$

$(Cf)'(x) = Cf'(x)$

$(f(x)g(x))' = f'(x)g(x) + f(x)g'(x)$

$\left(\dfrac{f(x)}{g(x)}\right)' = \dfrac{f'(x)g(x) - f(x)g'(x)}{g^2(x)}$

for any functions f and g and any number C

To differentiate a composite function

$$f(x) = g(h(x)),$$

we use the chain rule,

$$\frac{d}{dx}\, g(h(x)) = g'(h(x))\, h'(x).$$

For example,

$$\frac{d}{dx} \ln^3(x) = 3\ln^2(x)\,\frac{1}{x}.$$
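A numerical check of this chain-rule example: the Python sketch below compares the formula $3\ln^2(x)\cdot(1/x)$ with a central-difference approximation of the derivative of $\ln^3(x)$. The central difference is a standard numerical technique we swap in only for this check, and the step h = 1e-5 is an arbitrary choice.

```python
import math

def F(x):
    """ln^3(x) = g(h(x)) with outer g(u) = u**3 and inner h(x) = ln x."""
    return math.log(x) ** 3

def F_prime_chain_rule(x):
    """g'(h(x)) * h'(x) = 3 ln^2(x) * (1/x)."""
    return 3 * math.log(x) ** 2 * (1 / x)

def central_difference(func, x, h=1e-5):
    """Numerical derivative (func(x + h) - func(x - h)) / (2h)."""
    return (func(x + h) - func(x - h)) / (2 * h)

for x in [0.5, 2.0, 10.0]:
    print(x, F_prime_chain_rule(x), central_difference(F, x))
```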

Geometrically, the derivative $f'(x)$ equals the slope of a tangent line at point x; see Figure 12.1.

Computing maxima and minima

At the points where a differentiable function reaches its minimum or maximum, the tangent line is always flat; see points $x_2$ and $x_3$ in Figure 12.1. The slope of a horizontal line is 0, and thus $f'(x) = 0$ at these points.

[Plot of f(x) with tangent lines at points $x_1, x_2, x_3, x_4$: slope = $f'(x_1)$ at $x_1$, slope = 0 at $x_2$ and $x_3$, slope = $f'(x_4)$ at $x_4$.]

FIGURE 12.1: Derivative is the slope of a tangent line.

To find out where a function is maximized or minimized, we consider

– solutions of the equation $f'(x) = 0$,

– points x where $f'(x)$ fails to exist,

– endpoints.

The highest and the lowest values of the function can only be attained at these points.
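The recipe above can be sketched in Python for an illustrative function of our own choosing, $f(x) = x^3 - 3x$ on the interval [−2, 3]. Here $f'(x) = 3x^2 - 3$ exists everywhere and vanishes at $x = \pm 1$, so the only candidates are x = −1, x = 1, and the two endpoints. The sketch evaluates f at the candidates and, as a sanity check, confirms on a fine grid that no other point does better. The minimum value −2 is attained twice, at x = −2 and at x = 1.

```python
def f(x):
    return x**3 - 3 * x              # illustrative function; f'(x) = 3x^2 - 3

a, b = -2.0, 3.0                     # illustrative interval [a, b]
critical_points = [-1.0, 1.0]        # solutions of f'(x) = 0; f' exists everywhere here
candidates = [a, b] + [c for c in critical_points if a <= c <= b]

values = {x: f(x) for x in candidates}
x_max = max(values, key=values.get)
x_min = min(values, key=values.get)  # on ties, the first candidate listed is returned
print(values)
print("max:", x_max, values[x_max], " min:", x_min, values[x_min])

# Brute-force sanity check on a fine grid: no grid point beats the candidates.
grid = [a + (b - a) * i / 1000 for i in range(1001)]
assert max(f(x) for x in grid) <= values[x_max] + 1e-9
assert min(f(x) for x in grid) >= values[x_min] - 1e-9
```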

12.4.5 Integrals

Integration is an action opposite to differentiation. A function F(x) is an antiderivative (indefinite integral) of a function f(x) if

F ′(x) = f(x).

Indefinite integrals are defined up to a constant C because when we take derivatives, $C' = 0$.

Notation: $F(x) = \int f(x)\,dx$

An integral (definite integral) of a function f(x) from point a to point b is the difference of antiderivatives,

$$\int_a^b f(x)\,dx = F(b) - F(a).$$

Improper integrals

$$\int_a^{\infty} f(x)\,dx, \quad \int_{-\infty}^{b} f(x)\,dx, \quad \int_{-\infty}^{\infty} f(x)\,dx$$

are defined as limits. For example,

$$\int_a^{\infty} f(x)\,dx = \lim_{b\to\infty} \int_a^b f(x)\,dx.$$

The most important integrals are:

$\int x^m\,dx = \dfrac{x^{m+1}}{m+1}$ for $m \neq -1$

$\int x^{-1}\,dx = \ln(x)$

$\int e^x\,dx = e^x$

$\int (f(x) + g(x))\,dx = \int f(x)\,dx + \int g(x)\,dx$

$\int Cf(x)\,dx = C\int f(x)\,dx$

for any functions f and g and any number C

For example, to evaluate a definite integral $\int_0^2 x^3\,dx$, we find an antiderivative $F(x) = x^4/4$ and compute $F(2) - F(0) = 4 - 0 = 4$. A standard way to write this solution is

$$\int_0^2 x^3\,dx = \left.\frac{x^4}{4}\right|_{x=0}^{x=2} = \frac{2^4}{4} - \frac{0^4}{4} = 4.$$
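The answer can be cross-checked numerically. The Python sketch below approximates $\int_0^2 x^3\,dx$ with composite Simpson's rule (a numerical method we introduce only for this check; it is not part of this section) and compares it with $F(2) - F(0)$. Simpson's rule is exact for cubic polynomials, so both numbers should agree up to rounding.

```python
def simpson(f, a, b, n=100):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        weight = 4 if i % 2 else 2       # weights 1, 4, 2, 4, ..., 2, 4, 1
        total += weight * f(a + i * h)
    return total * h / 3

def integrand(x):
    return x**3

def F(x):
    return x**4 / 4                      # antiderivative of x^3

print(simpson(integrand, 0.0, 2.0), F(2.0) - F(0.0))
```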

Two important integration skills are integration by substitution and integration by parts.

Integration by substitution

An integral often simplifies when we can denote a part of the function as a new variable (y). The limits of integration a and b are then recomputed in terms of y, and dx is replaced by

$$dx = \frac{dy}{dy/dx} \quad\text{or}\quad dx = \frac{dx}{dy}\,dy,$$

whichever is easier to find. Notice that dx/dy is the derivative of the inverse function x(y).

Integration by substitution:

$$\int f(x)\,dx = \int f(x(y))\,\frac{dx}{dy}\,dy$$

For example,

$$\int_{-1}^{2} e^{3x}\,dx = \int_{-3}^{6} e^{y}\,\frac{1}{3}\,dy = \left.\frac{1}{3}\,e^{y}\right|_{y=-3}^{y=6} = \frac{e^{6} - e^{-3}}{3} = 134.5.$$


Here we substituted y = 3x, recomputed the limits of integration, found the inverse function x = y/3 and its derivative dx/dy = 1/3. In the next example, we substitute $y = x^2$. The derivative of this substitution is dy/dx = 2x:

$$\int_0^2 x\,e^{x^2}\,dx = \int_0^4 x\,e^{y}\,\frac{dy}{2x} = \frac{1}{2}\int_0^4 e^{y}\,dy = \left.\frac{1}{2}\,e^{y}\right|_{y=0}^{y=4} = \frac{e^{4} - 1}{2} = 26.8.$$
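Both substitution results can be compared with a brute-force numerical integral. This Python sketch applies a simple midpoint rule (an illustrative helper; 100,000 subintervals is an arbitrary choice of ours) to the original integrals, before any substitution, and compares the results with the closed forms $(e^6 - e^{-3})/3$ and $(e^4 - 1)/2$.

```python
import math

def midpoint_rule(f, a, b, n=100_000):
    """Illustrative helper: approximate the integral of f over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Example 1: substitution y = 3x turns the integral over [-1, 2] into (e^6 - e^-3)/3.
closed1 = (math.exp(6) - math.exp(-3)) / 3
numeric1 = midpoint_rule(lambda x: math.exp(3 * x), -1.0, 2.0)

# Example 2: substitution y = x^2 turns the integral over [0, 2] into (e^4 - 1)/2.
closed2 = (math.exp(4) - 1) / 2
numeric2 = midpoint_rule(lambda x: x * math.exp(x**2), 0.0, 2.0)

print(round(closed1, 1), numeric1)
print(round(closed2, 1), numeric2)
```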

Integration by parts

This technique often helps to integrate a product of two functions. One of the parts is integrated, the other is differentiated.

Integration by parts:

$$\int f'(x)g(x)\,dx = f(x)g(x) - \int f(x)g'(x)\,dx$$

Applying this method is reasonable only when the function $(fg')$ is simpler to integrate than the initial function $(f'g)$.

In the following example, we let $f'(x) = e^x$ be one part and $g(x) = x$ be the other. Then $f'(x)$ is integrated, and its antiderivative is $f(x) = e^x$. The other part g(x) is differentiated, and $g'(x) = x' = 1$. The integral simplifies, and we can evaluate it,

$$\int x\,e^x\,dx = x\,e^x - \int (1)(e^x)\,dx = x\,e^x - e^x.$$
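One way to check an antiderivative is to differentiate it back. The Python sketch below does this numerically for $x e^x - e^x$, using a central difference whose step h = 1e-6 is our choice. It then evaluates the definite integral over [0, 1], where $F(1) - F(0) = 0 - (-1) = 1$.

```python
import math

def antiderivative(x):
    """x e^x - e^x, obtained above by integration by parts."""
    return x * math.exp(x) - math.exp(x)

def derivative(F, x, h=1e-6):
    """Illustrative helper: central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# Differentiating the antiderivative should give back the integrand x e^x.
for x in [-1.0, 0.0, 0.5, 2.0]:
    assert math.isclose(derivative(antiderivative, x), x * math.exp(x),
                        rel_tol=1e-6, abs_tol=1e-9)

# Definite integral over [0, 1]: F(1) - F(0) = 0 - (-1) = 1.
print(antiderivative(1.0) - antiderivative(0.0))   # 1.0
```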

Computing areas

Area under the graph of a positive function f(x) and above the interval [a, b] equals the integral,

$$(\text{area from } a \text{ to } b) = \int_a^b f(x)\,dx.$$

Here a and b may be finite or infinite; see Figure 12.2.

Gamma function and factorial

Gamma function is defined as

$$\Gamma(t) = \int_0^{\infty} x^{t-1} e^{-x}\,dx \quad\text{for } t > 0.$$

Taking this integral by parts, we obtain two important properties of the Gamma function,

$$\Gamma(t+1) = t\,\Gamma(t) \quad\text{for any } t > 0,$$

$$\Gamma(t+1) = t! = 1 \cdot 2 \cdot \ldots \cdot t \quad\text{for integer } t.$$
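Both properties are easy to check with Python's standard `math.gamma` and `math.factorial`. The sketch also approximates the defining integral directly with a midpoint rule; the helper `gamma_by_integration` is hypothetical, and truncating the infinite upper limit at 60 is an assumption that suits only moderate t ≥ 1.

```python
import math

# Property 1: Gamma(t + 1) = t * Gamma(t) for t > 0.
for t in [0.5, 1.7, 3.2, 6.0]:
    assert math.isclose(math.gamma(t + 1), t * math.gamma(t))

# Property 2: Gamma(t + 1) = t! for integer t.
for t in range(10):
    assert math.isclose(math.gamma(t + 1), math.factorial(t))

def gamma_by_integration(t, upper=60.0, n=100_000):
    """Hypothetical helper: midpoint-rule approximation of the integral of
    x^(t-1) e^(-x) over [0, upper]. Truncating at `upper` is an assumption
    that only suits moderate t >= 1."""
    h = upper / n
    return h * sum(((i + 0.5) * h) ** (t - 1) * math.exp(-(i + 0.5) * h) for i in range(n))

print(gamma_by_integration(4.5), math.gamma(4.5))
```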

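[Plot of f(x) with points a, b, c on the x-axis: the shaded area over [a, b] equals $\int_a^b f(x)\,dx$, and the shaded area to the right of c equals $\int_c^{\infty} f(x)\,dx$.]

FIGURE 12.2: Integrals are areas under the graph of f(x).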

12.5 Matrices and linear systems

A matrix is a rectangular chart with numbers written in rows and columns,

$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1c} \\ A_{21} & A_{22} & \cdots & A_{2c} \\ \cdots & \cdots & \cdots & \cdots \\ A_{r1} & A_{r2} & \cdots & A_{rc} \end{pmatrix},$$

where r is the number of rows and c is the number of columns. Every element of matrix A is denoted by $A_{ij}$, where $i \in [1, r]$ is the row number and $j \in [1, c]$ is the column number. It is referred to as an "r × c matrix."

Multiplying a row by a column

A row can only be multiplied by a column of the same length. The product of a row A and a column B is a number computed as

$$(A_1, \ldots, A_n) \begin{pmatrix} B_1 \\ \vdots \\ B_n \end{pmatrix} = \sum_{i=1}^{n} A_i B_i.$$

Example 12.2 (Measurement conversion). To convert, say, 3 hours 25 minutes 45 seconds into seconds, one may use a formula

$$(3 \;\; 25 \;\; 45) \begin{pmatrix} 3600 \\ 60 \\ 1 \end{pmatrix} = 12345 \ \text{(sec)}.$$ ♦
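The row-times-column rule is short to write in code. This Python sketch (the name `row_times_column` is our own illustrative helper) reproduces the conversion of Example 12.2 and refuses a row and a column of different lengths.

```python
def row_times_column(row, column):
    """Illustrative helper: product of a row and a column of the same length, sum of A_i * B_i."""
    if len(row) != len(column):
        raise ValueError("a row can only be multiplied by a column of the same length")
    return sum(a * b for a, b in zip(row, column))

print(row_times_column([3, 25, 45], [3600, 60, 1]))   # 12345 seconds
```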

Multiplying matrices

Matrix A may be multiplied by matrix B only if the number of columns in A equals the number of rows in B. If A is a k × m matrix and B is an m × n matrix, then their product AB = C is a k × n matrix. Each element of C is computed as

$$C_{ij} = \sum_{s=1}^{m} A_{is} B_{sj} = \left(\text{$i$th row of } A\right)\left(\text{$j$th column of } B\right).$$

Each element of AB is obtained as a product of the corresponding row of A and column of B.

Example 12.3. The following product of two matrices is computed as

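$$\begin{pmatrix} 2 & 6 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} 9 & -3 \\ -3 & 1 \end{pmatrix} = \begin{pmatrix} (2)(9) + (6)(-3) & (2)(-3) + (6)(1) \\ (1)(9) + (3)(-3) & (1)(-3) + (3)(1) \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$ ♦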

In the last example, the result was a zero matrix "accidentally"; this is not typical. Still, the example shows that matrices do not always obey the usual rules of arithmetic. In particular, a product of two non-zero matrices may equal a zero matrix. Also, matrices do not commute in general, that is, $AB \neq BA$.
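The product rule $C_{ij} = \sum_s A_{is}B_{sj}$ can be sketched in plain Python with matrices stored as lists of rows (the helper `matmul` is our own illustrative function, not a library routine). It reproduces Example 12.3 and shows, by reversing the order, that BA is not zero, so $AB \neq BA$ here.

```python
def matmul(A, B):
    """Illustrative helper: product of a k x m matrix A and an m x n matrix B (lists of rows)."""
    k, m, n = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("number of columns in A must equal number of rows in B")
    # C_ij = sum over s of A_is * B_sj
    return [[sum(A[i][s] * B[s][j] for s in range(m)) for j in range(n)] for i in range(k)]

A = [[2, 6], [1, 3]]
B = [[9, -3], [-3, 1]]
print(matmul(A, B))   # [[0, 0], [0, 0]], as in Example 12.3
print(matmul(B, A))   # [[15, 45], [-5, -15]], so AB != BA
```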

Transposition

Transposition is reflecting the entire matrix about its main diagonal.

Notation: $A^T$ = transposed matrix A

Rows become columns, and columns become rows. That is,

$$A^T_{ij} = A_{ji}.$$

For example,

$$\begin{pmatrix} 1 & 2 & 3 \\ 7 & 8 & 9 \end{pmatrix}^T = \begin{pmatrix} 1 & 7 \\ 2 & 8 \\ 3 & 9 \end{pmatrix}.$$

The transposed product of matrices is

$$(AB)^T = B^T A^T$$
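A short Python sketch of transposition on lists of rows (the helpers `transpose` and `matmul` are our own illustrative functions). It reproduces the 2 × 3 example above and checks $(AB)^T = B^T A^T$ on two small matrices chosen only for illustration.

```python
def transpose(A):
    """Illustrative helper: rows become columns, (A^T)_ij = A_ji."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Illustrative helper: each entry is a row of A times a column of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

M = [[1, 2, 3], [7, 8, 9]]
print(transpose(M))   # [[1, 7], [2, 8], [3, 9]]

# Check (AB)^T = B^T A^T on small illustrative matrices.
A = [[1, 2], [3, 4], [5, 6]]     # 3 x 2
B = [[1, 0, -1], [2, 1, 0]]      # 2 x 3
assert transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))
```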

Solving systems of equations

In Chapters 6 and 7, we often solve systems of n linear equations with n unknowns to find a steady-state distribution. There are several ways of doing so. One method to solve such a system is by variable elimination. Express one variable in terms of the others from one equation, then substitute it into the unused equations. You will get a system of (n − 1) equations with (n − 1) unknowns. Proceeding in the same way, we reduce the number of unknowns until we end up with 1 equation and 1 unknown. We find this unknown, then go back and find all the other unknowns.

Example 12.4 (Linear system). Solve the system

$$\begin{cases} 2x + 2y + 5z = 12 \\ 3y - z = 0 \\ 4x - 7y - z = 2 \end{cases}$$

We don't have to start solving from the first equation. Start with the one that seems simple. From the second equation, we see that

z = 3y.

Substituting (3y) for z in the other equations, we obtain

$$\begin{cases} 2x + 17y = 12 \\ 4x - 10y = 2 \end{cases}$$

We are down by one equation and one unknown. Next, express x from the first equation,

$$x = \frac{12 - 17y}{2} = 6 - 8.5y,$$

and substitute into the last equation,

$$4(6 - 8.5y) - 10y = 2.$$

Simplifying, we get 44y = 22, hence y = 0.5. Now, go back and recover the other variables,

$$x = 6 - 8.5y = 6 - (8.5)(0.5) = 1.75; \qquad z = 3y = 1.5.$$

The answer is x = 1.75, y = 0.5, z = 1.5. We can check the answer by substituting the result into the initial system,

$$\begin{cases} 2(1.75) + 2(0.5) + 5(1.5) = 12 \\ 3(0.5) - 1.5 = 0 \\ 4(1.75) - 7(0.5) - 1.5 = 2 \end{cases}$$ ♦

We can also eliminate variables by multiplying entire equations by suitable coefficients and then adding or subtracting them. Here is an illustration of that.

Example 12.5 (Another method). Here is a shorter solution of Example 12.4. Double the first equation, 4x + 4y + 10z = 24, and subtract the third equation from it, 11y + 11z = 22, or y + z = 2. This way, we eliminated x. Then, adding (y + z = 2) and (3y − z = 0), we get 4y = 2, and again, y = 0.5. Other variables, x and z, can now be obtained from y, as in Example 12.4. ♦
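The elimination idea of Examples 12.4 and 12.5, carried out systematically one column at a time, is usually called Gaussian elimination. Below is a minimal Python sketch of it (`solve_by_elimination` is a hypothetical helper, not a library routine) that uses exact fractions to avoid rounding and swaps rows if a pivot is zero. Its order of elimination differs from the hand solution, but it should return the same x = 1.75, y = 0.5, z = 1.5.

```python
from fractions import Fraction

def solve_by_elimination(A, b):
    """Hypothetical helper: solve A x = b for square A by forward elimination
    and back substitution, with exact Fractions and row swaps for zero pivots."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("system has no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this unknown from all equations below the pivot row.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    # Go back and recover the unknowns, last one first.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

A = [[2, 2, 5], [0, 3, -1], [4, -7, -1]]
b = [12, 0, 2]
x, y, z = solve_by_elimination(A, b)
print(float(x), float(y), float(z))   # 1.75 0.5 1.5
```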

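The system of equations in this example can be written in a matrix form as

$$\begin{pmatrix} x & y & z \end{pmatrix} \begin{pmatrix} 2 & 0 & 4 \\ 2 & 3 & -7 \\ 5 & -1 & -1 \end{pmatrix} = \begin{pmatrix} 12 & 0 & 2 \end{pmatrix},$$

or, equivalently,

$$\begin{pmatrix} 2 & 2 & 5 \\ 0 & 3 & -1 \\ 4 & -7 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 12 \\ 0 \\ 2 \end{pmatrix}.$$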

Inverse matrix

Matrix B is the inverse matrix of A if

$$AB = BA = I = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix},$$

where I is the identity matrix. It has 1s on the diagonal and 0s elsewhere. Matrices A and B have to be square, with the same number of rows and columns.

Notation: $A^{-1}$ = inverse of matrix A

Inverse of a product can be computed as

$$(AB)^{-1} = B^{-1} A^{-1}$$

To find the inverse matrix $A^{-1}$ by hand, write matrices A and I next to each other. Multiplying rows by constant coefficients, adding them, and interchanging them, convert matrix A to the identity matrix I. The same operations convert matrix I to $A^{-1}$,

$$(A \mid I) \longrightarrow (I \mid A^{-1}).$$

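Example 12.6. The linear system in Example 12.4 is given by the matrix

$$A = \begin{pmatrix} 2 & 2 & 5 \\ 0 & 3 & -1 \\ 4 & -7 & -1 \end{pmatrix}.$$

Repeating the row operations from this example, we can find the inverse matrix $A^{-1}$,

$$\left(\begin{array}{rrr|rrr} 2 & 2 & 5 & 1 & 0 & 0 \\ 0 & 3 & -1 & 0 & 1 & 0 \\ 4 & -7 & -1 & 0 & 0 & 1 \end{array}\right) \longrightarrow \left(\begin{array}{rrr|rrr} 4 & 4 & 10 & 2 & 0 & 0 \\ 0 & 3 & -1 & 0 & 1 & 0 \\ 4 & -7 & -1 & 0 & 0 & 1 \end{array}\right)$$

$$\longrightarrow \left(\begin{array}{rrr|rrr} 0 & 11 & 11 & 2 & 0 & -1 \\ 0 & 3 & -1 & 0 & 1 & 0 \\ 4 & -7 & -1 & 0 & 0 & 1 \end{array}\right) \longrightarrow \left(\begin{array}{rrr|rrr} 0 & 1 & 1 & 2/11 & 0 & -1/11 \\ 0 & 3 & -1 & 0 & 1 & 0 \\ 4 & -7 & -1 & 0 & 0 & 1 \end{array}\right)$$

$$\longrightarrow \left(\begin{array}{rrr|rrr} 0 & 4 & 0 & 2/11 & 1 & -1/11 \\ 0 & 3 & -1 & 0 & 1 & 0 \\ 4 & -7 & -1 & 0 & 0 & 1 \end{array}\right) \longrightarrow \left(\begin{array}{rrr|rrr} 0 & 1 & 0 & 1/22 & 1/4 & -1/44 \\ 0 & 3 & -1 & 0 & 1 & 0 \\ 4 & -10 & 0 & 0 & -1 & 1 \end{array}\right)$$

$$\longrightarrow \left(\begin{array}{rrr|rrr} 0 & 1 & 0 & 1/22 & 1/4 & -1/44 \\ 0 & 0 & 1 & 3/22 & -1/4 & -3/44 \\ 4 & 0 & 0 & 10/22 & 3/2 & 34/44 \end{array}\right) \longrightarrow \left(\begin{array}{rrr|rrr} 1 & 0 & 0 & 5/44 & 3/8 & 17/88 \\ 0 & 1 & 0 & 1/22 & 1/4 & -1/44 \\ 0 & 0 & 1 & 3/22 & -1/4 & -3/44 \end{array}\right)$$

The inverse matrix is found,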

$$A^{-1} = \begin{pmatrix} 5/44 & 3/8 & 17/88 \\ 1/22 & 1/4 & -1/44 \\ 3/22 & -1/4 & -3/44 \end{pmatrix}.$$

You can verify the result by multiplying $A^{-1}A$ or $AA^{-1}$. ♦
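The row reduction of Example 12.6 can also be automated. The Python sketch below (`inverse_gauss_jordan` is a hypothetical helper) reduces $(A \mid I)$ to $(I \mid A^{-1})$ with exact fractions. It chooses pivots column by column, so its sequence of row operations differs from the one shown above, but the resulting inverse is the same, and the sketch verifies that $AA^{-1} = I$.

```python
from fractions import Fraction

def inverse_gauss_jordan(A):
    """Hypothetical helper: row-reduce (A | I) to (I | A^{-1}) using exact fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]            # interchange rows
        p = M[col][col]
        M[col] = [v / p for v in M[col]]                # scale the pivot row to get a 1
        for r in range(n):
            if r != col and M[r][col] != 0:             # clear the rest of this column
                factor = M[r][col]
                M[r] = [v - factor * q for v, q in zip(M[r], M[col])]
    return [row[n:] for row in M]

A = [[2, 2, 5], [0, 3, -1], [4, -7, -1]]
A_inv = inverse_gauss_jordan(A)
print([[str(v) for v in row] for row in A_inv])
# [['5/44', '3/8', '17/88'], ['1/22', '1/4', '-1/44'], ['3/22', '-1/4', '-3/44']]

identity = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
product = [[sum(A[i][s] * A_inv[s][j] for s in range(3)) for j in range(3)] for i in range(3)]
assert product == identity
```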

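For a 2 × 2 matrix with $ad - bc \neq 0$, the formula for the inverse is

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$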
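A minimal Python sketch of the 2 × 2 formula with exact fractions (`inverse_2x2` is our own illustrative name). It also shows that the matrix $\begin{pmatrix} 2 & 6 \\ 1 & 3 \end{pmatrix}$ from Example 12.3 has $ad - bc = 0$ and therefore no inverse. This is consistent with that example: if A had an inverse, AB = 0 would force B = 0.

```python
from fractions import Fraction

def inverse_2x2(a, b, c, d):
    """Illustrative helper: inverse of [[a, b], [c, d]] via the formula above; needs ad - bc != 0."""
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("ad - bc = 0, so the matrix has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]

inv = inverse_2x2(2, 1, 7, 4)                     # illustrative matrix with ad - bc = 1
print([[str(v) for v in row] for row in inv])     # [['4', '-1'], ['-7', '2']]

# The matrix [[2, 6], [1, 3]] from Example 12.3 has ad - bc = 6 - 6 = 0.
try:
    inverse_2x2(2, 6, 1, 3)
except ValueError as err:
    print("No inverse:", err)
```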

Matrix operations in R

x <- c(1,8,0,3,3,-3,5,0,-1)   # Define a 1×9 vector and convert it...
A <- matrix(x,3,3)             # ... into a 3×3 matrix column by column
t(A)                           # Transposed matrix
B <- solve(A)                  # Inverse matrix
A + B                          # Addition
A %*% B                        # Matrix multiplication
C <- A * B                     # Multiplying element by element, Cij = Aij*Bij
m <- 2; n <- 3                 # Example dimensions for the next two lines (illustrative)
diag(n)                        # n×n identity matrix
matrix(rep(0,m*n),m,n)         # m×n matrix of 0s
cbind(A,B)                     # Joining matrices side by side (as columns)
rbind(A,B)                     # Joining matrices below each other (as rows)
A[2:3,]                        # Sub-matrix: rows 2-3 and all columns of A
# Calculation of a power of a matrix is a part of R package 'expm'
install.packages("expm")
library(expm)
A %^% 3                        # This calculates A^3 = A·A·A
solve(A %^% 3)                 # The result is A^(-3) = (A^3)^(-1)

Matrix operations in MATLAB

A = [1 3 5; 8 3 0; 0 -3 -1];   % Entering a matrix
B = [ 3 9 8                    % Another way to define a matrix
      0 0 2
      9 2 1 ];
m = 2; n = 3;                  % Example dimensions used below (illustrative)
A+B                            % Addition
A*B                            % Matrix multiplication
C=A.*B                         % Multiplying element by element, Cij = Aij*Bij
A^n                            % Power of a matrix, A^n = A*...*A (n times)
A'                             % Transposed matrix
A(2:3,:)                       % Sub-matrix: rows 2-3 and all columns of A
inv(A)                         % Inverse matrix A^(-1)
eye(n)                         % n×n identity matrix
zeros(m,n)                     % m×n matrix of 0s
[A B]                          % Joining matrices side by side (as columns)
[ A                            % Joining matrices below each other (as rows)
  B ]
rand(m,n)                      % matrix of Uniform(0,1) random numbers
randn(m,n)                     % matrix of Normal(0,1) random numbers