
Appendix A: Matrix-Vector Representation for Signal Transformation

A discrete signal can be represented by a set of numbers. These numbers carry a certain amount of information and are subject to change by various kinds of transformations, called systems. For example, a one-dimensional linear time-invariant system can be expressed by its corresponding impulse response. The output of the system is then determined by the convolution of the impulse response and the input signal. Convolution equations, in general, are too complicated to efficiently express related theories and algorithms. Analysis and representation of signal transformations can be substantially simplified by using matrix-vector representation, where a vector and a matrix, respectively, represent the corresponding signal and transformation.

A.1 One-Dimensional Signals and Systems

Suppose a one-dimensional system has input signal $x(n)$, $n = 0, 1, \ldots, N-1$, and impulse response $h(n)$. The output of the system can be expressed as the one-dimensional convolution:

$$y(n) = \sum_{q=0}^{N-1} h(n-q)\,x(q), \quad \text{for } n = 0, 1, \ldots, N-1. \tag{A.1}$$

By simply rewriting Eq. (A.1), we have

$$\begin{aligned}
y(0) &= h(0)x(0) + h(-1)x(1) + h(-2)x(2) + \cdots \\
y(1) &= h(1)x(0) + h(0)x(1) + h(-1)x(2) + \cdots \\
&\;\;\vdots \\
y(N-1) &= h(N-1)x(0) + h(N-2)x(1) + h(N-3)x(2) + \cdots
\end{aligned} \tag{A.2}$$


If we express both input and output signals as $N \times 1$ vectors, such as

$$x = [x(0)\ \ x(1)\ \cdots\ x(N-1)]^T \quad \text{and} \quad y = [y(0)\ \ y(1)\ \cdots\ y(N-1)]^T, \tag{A.3}$$

then the output vector is obtained by the following matrix-vector multiplication:

$$y = Hx, \tag{A.4}$$

where

$$H = \begin{bmatrix}
h(0) & h(-1) & h(-2) & \cdots & h(-N+1) \\
h(1) & h(0) & h(-1) & \cdots & h(-N+2) \\
h(2) & h(1) & h(0) & \cdots & h(-N+3) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
h(N-1) & h(N-2) & h(N-3) & \cdots & h(0)
\end{bmatrix} \tag{A.5}$$

We note that $H$ is a Toeplitz matrix, having constant elements along the main diagonal and the subdiagonals. If two convolving sequences are periodic with period $N$, their circular convolution is also periodic. In this case, $h(-n) = h(N-n)$, which results in the circulant matrix, which can be expressed as

$$H = \begin{bmatrix}
h(0) & h(N-1) & h(N-2) & \cdots & h(1) \\
h(1) & h(0) & h(N-1) & \cdots & h(2) \\
h(2) & h(1) & h(0) & \cdots & h(3) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
h(N-1) & h(N-2) & h(N-3) & \cdots & h(0)
\end{bmatrix} \tag{A.6}$$

The first column of $H$ is the same as the vector $h = [h(0)\ \ h(1)\ \cdots\ h(N-1)]^T$, and the second column is the rotated version of $h$, shifted down by one element, such as $[h(N-1)\ \ h(0)\ \cdots\ h(N-2)]^T$. The remaining columns are determined in the same manner.

Example A.1: One-Dimensional Shift-Invariant Filtering and the Circulant Matrix

Consider the discrete sequence $\{1\ 2\ 3\ 4\ 5\ 4\ 3\ 2\ 1\}$. Suppose that the corresponding noisy observation is given as $x = [1.10\ \ 1.80\ \ 3.10\ \ 4.20\ \ 5.10\ \ 3.70\ \ 3.20\ \ 2.10\ \ 0.70]^T$. One simple way to remove the noise is to replace each observed sample by the average of the neighboring samples. If we use an averaging filter that replaces each sample by the average of its two neighboring samples, plus the sample itself, we have the output $y = [1.20\ \ 2.00\ \ 3.03\ \ 4.03\ \ 5.00\ \ 4.00\ \ 3.00\ \ 2.00\ \ 1.30]^T$, where the first and the last samples have been computed under the assumption that the input sequence is periodic with period 9, because they are located at a boundary and do not have enough neighboring samples for convolution with the impulse response. The averaging process can be expressed as a one-dimensional time-invariant system whose impulse response is

$$h(n) = \frac{1}{3}\{\delta(n+1) + \delta(n) + \delta(n-1)\}. \tag{A.7}$$

We can make the corresponding circulant matrix by using the impulse response as

$$H = \frac{1}{3}\begin{bmatrix}
1 & 1 & 0 & \cdots & 1 \\
1 & 1 & 1 & \cdots & 0 \\
0 & 1 & 1 & \ddots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
1 & 0 & 0 & \cdots & 1
\end{bmatrix} \tag{A.8}$$

It is straightforward to prove that $y = Hx$.
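A minimal NumPy sketch of Example A.1 (not part of the original text): it builds the circulant matrix of Eq. (A.8) column by column and confirms that $y = Hx$ reproduces the averaged sequence.

```python
import numpy as np

# Noisy observation of the sequence {1 2 3 4 5 4 3 2 1}, Example A.1.
x = np.array([1.10, 1.80, 3.10, 4.20, 5.10, 3.70, 3.20, 2.10, 0.70])
N = len(x)

# Impulse response h(n) = (1/3){delta(n+1) + delta(n) + delta(n-1)}, stored as
# the first column of H: h(0), h(1), ..., h(N-1), with h(-1) wrapped to h(N-1).
h = np.zeros(N)
h[[0, 1, N - 1]] = 1.0 / 3.0

# Column k of the circulant matrix is the first column rotated down by k.
H = np.column_stack([np.roll(h, k) for k in range(N)])

y = H @ x
print(np.round(y, 2))  # [1.2  2.   3.03 4.03 5.   4.   3.   2.   1.3 ]
```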

A.2 Two-Dimensional Signals and Systems

In the previous section we obtained the matrix-vector expression of one-dimensional convolution by mapping an input signal to a vector and the impulse response to a Toeplitz or circulant matrix. In a similar manner, we can also represent two-dimensional convolution as a matrix-vector expression by mapping an input two-dimensional array into a row-ordered vector and the two-dimensional impulse response into a doubly block circulant matrix.

A.2.1 Row-Ordered Vector

Image data is usually represented by two-dimensional rectangular arrays or matrices. Representing two-dimensional image processing systems, however, becomes too complicated to analyze if we use two-dimensional matrices for the input and output signals. Based on the idea that both vectors and matrices can represent the same data, only in different formats, we can represent two-dimensional image data by using a row-ordered vector.

Let the following two-dimensional $M \times N$ array represent an image

$$X = \begin{bmatrix}
x(0,0) & x(0,1) & \cdots & x(0,N-1) \\
x(1,0) & x(1,1) & \cdots & x(1,N-1) \\
\vdots & \vdots & \ddots & \vdots \\
x(M-1,0) & x(M-1,1) & \cdots & x(M-1,N-1)
\end{bmatrix}, \tag{A.9}$$

which can also be represented by the row-ordered $MN \times 1$ vector, such as

$$x = [x(0,0)\ \ x(0,1)\ \cdots\ x(0,N-1)\ \ x(1,0)\ \cdots\ x(1,N-1)\ \cdots\ x(M-1,0)\ \cdots\ x(M-1,N-1)]^T. \tag{A.10}$$

A.2.2 Block Matrices

A space-invariant two-dimensional system is characterized by a two-dimensional impulse response. The output of the system is determined by two-dimensional convolution, expressed as

$$y(m,n) = \sum_{p=0}^{M-1}\sum_{q=0}^{N-1} h(m-p,\,n-q)\,x(p,q), \tag{A.11}$$

where $y(m,n)$, $h(m,n)$, and $x(m,n)$, respectively, represent the two-dimensional output, the impulse response, and the input signals. As in the one-dimensional case, two-dimensional convolution can also be expressed by matrix-vector multiplication.

Example A.2: Two-Dimensional Space-Invariant Filtering and the Block Circulant Matrix

Suppose that an $N \times N$ image $x(m,n)$ is filtered by the two-dimensional lowpass filter with impulse response:

$$h(m,n) = \frac{1}{16}\left\{\begin{aligned}
&\delta(m+1,\,n+1) + 2\delta(m+1,\,n) + \delta(m+1,\,n-1) \\
&+ 2\delta(m,\,n+1) + 4\delta(m,\,n) + 2\delta(m,\,n-1) \\
&+ \delta(m-1,\,n+1) + 2\delta(m-1,\,n) + \delta(m-1,\,n-1)
\end{aligned}\right\} \tag{A.12}$$

The output is obtained by two-dimensional convolution as given in Eq. (A.11). We can also express the two-dimensional convolution by multiplying the block circulant matrix and the row-ordered vector. If we assume that both the impulse response and the input signal are periodic with period $N \times N$, it is straightforward to prove that the matrix-vector multiplication

$$y = Hx \tag{A.13}$$

is equivalent to the two-dimensional convolution, where the row-ordered vector $x$ is obtained as in Eq. (A.10), and the block matrix is obtained as

$$H = \frac{1}{16}\begin{bmatrix}
H_0 & H_{-1} & 0 & \cdots & H_1 \\
H_1 & H_0 & H_{-1} & \cdots & 0 \\
0 & H_1 & H_0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
H_{-1} & 0 & 0 & \cdots & H_0
\end{bmatrix} \tag{A.14}$$

Each element in $H$ is again a matrix, defined as

$$H_0 = \begin{bmatrix}
4 & 2 & 0 & \cdots & 2 \\
2 & 4 & 2 & \cdots & 0 \\
0 & 2 & 4 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
2 & 0 & 0 & \cdots & 4
\end{bmatrix}, \quad \text{and} \quad
H_1 = H_{-1} = \begin{bmatrix}
2 & 1 & 0 & \cdots & 1 \\
1 & 2 & 1 & \cdots & 0 \\
0 & 1 & 2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 0 & 0 & \cdots & 2
\end{bmatrix} \tag{A.15}$$

Any matrix $A$ whose elements are matrices is called a block matrix, such as

$$A = \begin{bmatrix}
A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\
A_{1,0} & A_{1,1} & \cdots & A_{1,N-1} \\
\vdots & \vdots & \ddots & \vdots \\
A_{M-1,0} & A_{M-1,1} & \cdots & A_{M-1,N-1}
\end{bmatrix}, \tag{A.16}$$

where $A_{i,j}$ represents a $p \times q$ matrix. More specifically, the matrix $A$ is called an $M \times N$ block matrix of basic dimension $p \times q$. If the block structure is circulant, that is, $A_{i,j} = A_{i \bmod M,\, j \bmod N}$, $A$ is called block circulant. If each $A_{i,j}$ is a circulant matrix, $A$ is called a circulant block matrix. Finally, if $A$ is both block circulant and circulant block, $A$ is called doubly block circulant.
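A minimal NumPy sketch of Example A.2 for a small $N$ (an illustration added here, not from the original text): it assembles the doubly block circulant matrix directly from the impulse response of Eq. (A.12), with block $(i,j)$ the circulant matrix built from row $(i-j) \bmod N$ of the wrapped kernel, and checks that $y = Hx$ agrees with 2D circular convolution computed via the FFT.

```python
import numpy as np

N = 4
rng = np.random.default_rng(0)
x2d = rng.standard_normal((N, N))

# 2D impulse response of Eq. (A.12): h(0,0) at index (0,0),
# negative indices wrapped modulo N (periodic assumption).
h2d = np.zeros((N, N))
taps = [(-1, -1, 1), (-1, 0, 2), (-1, 1, 1),
        (0, -1, 2), (0, 0, 4), (0, 1, 2),
        (1, -1, 1), (1, 0, 2), (1, 1, 1)]
for dm, dn, w in taps:
    h2d[dm % N, dn % N] += w / 16.0

def circulant(c):
    # Circulant matrix whose first column is c.
    return np.column_stack([np.roll(c, k) for k in range(len(c))])

# Doubly block circulant H, Eq. (A.14), built from h(m, n).
H = np.block([[circulant(h2d[(i - j) % N]) for j in range(N)]
              for i in range(N)])

y_vec = H @ x2d.flatten()                    # row-ordered vector, Eq. (A.10)
y_fft = np.real(np.fft.ifft2(np.fft.fft2(h2d) * np.fft.fft2(x2d)))
print(np.allclose(y_vec, y_fft.flatten()))   # True
```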

A.2.3 Kronecker Products

If $A$ and $B$ are $M_1 \times M_2$ and $N_1 \times N_2$ matrices, respectively, their Kronecker product is defined as

$$A \otimes B = \begin{bmatrix}
a(0,0)B & \cdots & a(0,\,M_2-1)B \\
\vdots & \ddots & \vdots \\
a(M_1-1,\,0)B & \cdots & a(M_1-1,\,M_2-1)B
\end{bmatrix}, \tag{A.17}$$

which is an $M_1 \times M_2$ block matrix of basic dimension $N_1 \times N_2$. Kronecker products are useful in generating high-order matrices from low-order matrices. For an $N \times N$ image $X$, a separable operation, where $A$ operates on the columns of $X$ and $B$ operates on the rows of the result, can be expressed as

$$Y = AXB^T \tag{A.18}$$

or

$$y(m,n) = \sum_{p=0}^{N-1}\sum_{q=0}^{N-1} a(m,p)\,x(p,q)\,b(n,q). \tag{A.19}$$

In addition, Eqs. (A.18) and (A.19) are equivalent to

$$y = (A \otimes B)x, \tag{A.20}$$

where $y$ and $x$, respectively, represent the row-ordered vectors of $y(m,n)$ and $x(m,n)$.

Example A.3: Two-Dimensional Extension Using the Kronecker Product

Consider an $N \times N$ image, denoted by the two-dimensional matrix $X$ as shown in Eq. (A.9). If we apply the one-dimensional averaging filter given in Eq. (A.7) on each column of $X$, and apply the same filter on each row of the resulting matrix, then we obtain a two-dimensional averaging-filtered image. Let $y(m,n)$ be the output of the two-dimensional averaging filter and $h(m,n)$ the element of the matrix $H$, which represents the circulant matrix for the one-dimensional system given in Eq. (A.8). Then we have that

$$y(m,n) = \sum_{p=0}^{N-1}\sum_{q=0}^{N-1} h(m,p)\,x(p,q)\,h(n,q), \tag{A.21}$$

which is equivalent to

$$Y = HXH^T. \tag{A.22}$$

Let x and y respectively represent row-ordered vectors for x(m, n) and y(m, n), as shown in Eq. (A.10). Then Eq. (A.22) is equivalent to the following matrix-vector multiplication:

$$y = (H \otimes H)x. \tag{A.23}$$

Appendix B: Discrete Fourier Transform

Continuous-time Fourier transform methods are well known for analyzing frequency characteristics of continuous-time signals. In addition, the inverse transform provides perfect original signal reconstruction. More specifically, an arbitrary frequency component in the signal can be extracted by forming the inner product of the signal with the corresponding sinusoidal function. Due to the orthogonality property of sinusoidal basis functions, each frequency component, which is called the Fourier transform coefficient, exclusively contains the desired frequency component. At the same time, the inverse transform can be performed using the exact procedure of the forward transform, except that the complex conjugated basis function is used.

Fourier transforms provide a very powerful tool for the mathematical analysis and synthesis of signals. For continuous signals, the work can be performed using paper and pencil calculations. However, in image processing (and many other fields), where signals are digitized and the resulting arrays are large scale, the computations become digital, and thus the discrete Fourier transform (DFT) is used. The advantage here is that the DFT can easily be computed on a digital computer using well-known DFT methods along with FFT algorithms. In this appendix, the basic material needed for applying the DFT and IDFT to discrete signals is presented.

B.1 One-Dimensional Discrete Fourier Transform

The unitary DFT of a sequence $\{u(n),\ n = 0, \ldots, N-1\}$ is defined as

$$v(k) = \frac{1}{\sqrt{N}}\sum_{n=0}^{N-1} u(n)\,W_N^{kn}, \quad k = 0, 1, \ldots, N-1, \tag{B.1}$$

where

$$W_N = \exp\left(-j\frac{2\pi}{N}\right). \tag{B.2}$$

Let the matrix $F_1$ be defined as follows:

$$F_1 = \frac{1}{\sqrt{N}}\begin{bmatrix}
W_N^0 & W_N^0 & W_N^0 & \cdots & W_N^0 \\
W_N^0 & W_N^1 & W_N^2 & \cdots & W_N^{N-1} \\
W_N^0 & W_N^2 & W_N^4 & \cdots & W_N^{2(N-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
W_N^0 & W_N^{N-1} & W_N^{2(N-1)} & \cdots & W_N^{(N-1)^2}
\end{bmatrix} \tag{B.3}$$

Then, Eq. (B.1) can be expressed in matrix-vector representation as

$$v = F_1 u, \tag{B.4}$$

where $v$ and $u$ represent vectors whose elements take the values of the discrete signals $v(k)$ and $u(n)$, respectively. The inverse transform is given by

$$u(n) = \frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} v(k)\,W_N^{-kn}, \quad n = 0, 1, \ldots, N-1. \tag{B.5}$$

Because the complex exponential function $W_N^{nk}$ is orthogonal and $F_1$ is symmetric, we have that

$$F_1^{-1} = F_1^{*T} = F_1^*, \tag{B.6}$$

where the superscript $*$ represents the conjugate of a complex matrix. According to Eq. (B.6), the inverse transform can be expressed in vector-matrix form as

$$u = F_1^* v. \tag{B.7}$$

The DFT of Eq. (B.1) and the inverse expression of Eq. (B.5) form a transform pair. This transform pair has the property of being unitary because the matrix $F_1$ is unitary. In other words, the conjugate of $F_1$ is equal to its inverse, as expressed in Eq. (B.6). In many applications, differently scaled DFTs and their inverses may be used. In such cases, most properties, which will be summarized in the next section, hold with proper adjustment of a scaling factor.
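The unitary pair is easy to check numerically. The following sketch (an added illustration) builds $F_1$ from Eq. (B.3) and compares it against NumPy's FFT, which uses the unscaled convention, hence the $1/\sqrt{N}$ factor:

```python
import numpy as np

# Unitary DFT matrix F1 of Eq. (B.3).
N = 8
n = np.arange(N)
F1 = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

u = np.random.default_rng(1).standard_normal(N)
v = F1 @ u                                           # forward, Eq. (B.4)

print(np.allclose(v, np.fft.fft(u) / np.sqrt(N)))    # True: unitary scaling
print(np.allclose(F1.conj() @ v, u))                 # inverse, Eq. (B.7)
print(np.allclose(F1.conj().T @ F1, np.eye(N)))      # F1 unitary, Eq. (B.6)
```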

B.2 Properties of the DFT

B.2.1 Periodicity

The extensions of the DFT and its inverse transform are periodic with period $N$. In other words, for every $k$,

$$v(k+N) = v(k), \tag{B.8}$$

and for every $n$,

$$u(n+N) = u(n). \tag{B.9}$$

B.2.2 Conjugate Symmetry

The DFT of a real sequence is conjugate symmetric about $N/2$, where we assume $N$ is an even number. By applying the periodicity of the complex exponential, such as $W_N^N = \exp\left(-j\frac{2\pi}{N}\cdot N\right) = 1$, we obtain

$$v^*(N-k) = v(k). \tag{B.10}$$

From Eq. (B.10), we see that, for $k = 0, \ldots, \frac{N}{2}-1$,

$$v\left(\frac{N}{2}-k\right) = v^*\left(\frac{N}{2}+k\right), \tag{B.11}$$

and

$$\left|v\left(\frac{N}{2}-k\right)\right| = \left|v\left(\frac{N}{2}+k\right)\right|. \tag{B.12}$$

According to the conjugate symmetry property, only $N/2$ DFT coefficients completely determine the frequency characteristics of a real sequence of length $N$. More specifically, the $N$ real values

$$v(0),\ \mathrm{Re}\{v(1)\},\ \mathrm{Im}\{v(1)\},\ \ldots,\ \mathrm{Re}\{v(N/2-1)\},\ \mathrm{Im}\{v(N/2-1)\},\ v(N/2) \tag{B.13}$$

completely define the DFT of the real sequence. It is clear from Eq. (B.1) that $v(0)$ is real and from Eq. (B.11) that $v(N/2)$ is real.
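A short numerical check of the conjugate symmetry property (an added sketch):

```python
import numpy as np

# Conjugate symmetry of the DFT of a real sequence, Eq. (B.10).
N = 8
u = np.random.default_rng(4).standard_normal(N)
v = np.fft.fft(u) / np.sqrt(N)            # unitary scaling, as in Eq. (B.1)

k = np.arange(1, N)
print(np.allclose(np.conj(v[N - k]), v[k]))                      # v*(N-k) = v(k)
print(np.isclose(v[0].imag, 0), np.isclose(v[N // 2].imag, 0))   # both real
```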

B.2.3 Relationships Between DFT Basis and Circulant Matrices

The basis vectors of the DFT are the orthonormal eigenvectors of any circulant matrix. The eigenvalues of a circulant matrix are the DFT of its first column. Based on these properties, the DFT is used to diagonalize any circulant matrix. To prove that the basis vectors of the DFT are the orthonormal eigenvectors of any circulant matrix, we must show that

$$H\phi_k = \lambda_k \phi_k, \tag{B.14}$$

where $H$ represents a circulant matrix, $\phi_k$ the $k$-th column of the matrix $F_1^{*T} = F_1^*$, and $\lambda_k$ the corresponding eigenvalue. Since $H$ is circulant, it can be represented as

$$H = \begin{bmatrix}
h_0 & h_{N-1} & h_{N-2} & \cdots & h_1 \\
h_1 & h_0 & h_{N-1} & \cdots & h_2 \\
h_2 & h_1 & h_0 & \cdots & h_3 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
h_{N-1} & h_{N-2} & h_{N-3} & \cdots & h_0
\end{bmatrix} \tag{B.15}$$

The $k$-th column of the matrix $F_1^{*T} = F_1^*$ is represented as

$$\phi_k = \frac{1}{\sqrt{N}}\left[W_N^0\ \ W_N^{-k}\ \cdots\ W_N^{-(N-1)k}\right]^T. \tag{B.16}$$

From Eqs. (B.15) and (B.16), the $m$-th element of $H\phi_k$ in (B.14) is obtained as

$$e_m^T[H\phi_k] = \frac{1}{\sqrt{N}}\sum_{n=0}^{N-1} h(m-n)\,W_N^{-kn}, \tag{B.17}$$

where $e_m$ represents the $m$-th unit vector, for example, $e_1 = [1\ 0\ \ldots\ 0]^T$. By changing variables and using the periodicity of the complex exponential function $W_N$, we can rewrite Eq. (B.17) as

$$e_m^T[H\phi_k] = \left(\sum_{l=0}^{N-1} h(l)\,W_N^{kl}\right)\frac{1}{\sqrt{N}}\,W_N^{-km}, \tag{B.18}$$

which results in Eq. (B.14). The eigenvalues of the circulant matrix $H$ are defined as

$$\lambda_k = \sum_{l=0}^{N-1} h(l)\,W_N^{kl}, \quad \text{for } k = 0, 1, \ldots, N-1. \tag{B.19}$$

From Eq. (B.19), we see that the eigenvalues of a circulant matrix are the DFT of its first column. Furthermore, since Eq. (B.14) holds for $k = 0, 1, \ldots, N-1$, we can write

$$H[\phi_0\ \ \phi_1\ \cdots\ \phi_{N-1}] = [\phi_0\ \ \phi_1\ \cdots\ \phi_{N-1}]\,\Lambda, \tag{B.20}$$

where $\Lambda = \mathrm{diag}\{\lambda_0\ \ \lambda_1\ \cdots\ \lambda_{N-1}\}$. We note that $[\phi_0\ \ \phi_1\ \cdots\ \phi_{N-1}] = F_1^*$. Then Eq. (B.20) reduces to

$$HF_1^* = F_1^*\Lambda. \tag{B.21}$$

By multiplying Eq. (B.21) by $F_1$ on the left-hand side, we obtain

$$F_1 H F_1^* = \Lambda. \tag{B.22}$$

Equation (B.22) shows that any circulant matrix can be diagonalized by multiplying by the DFT matrix and its conjugate on the left- and right-hand sides, respectively.
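The diagonalization of Eq. (B.22) can be verified numerically (an added sketch); the values on the diagonal are the unnormalized DFT of the first column, as in Eq. (B.19):

```python
import numpy as np

N = 6
h = np.random.default_rng(2).standard_normal(N)
H = np.column_stack([np.roll(h, k) for k in range(N)])   # circulant, Eq. (B.15)

n = np.arange(N)
F1 = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

Lam = F1 @ H @ F1.conj()              # Eq. (B.22): should be diagonal
eigvals = np.fft.fft(h)               # DFT of the first column, Eq. (B.19)
print(np.allclose(Lam, np.diag(eigvals)))   # True
```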

B.3 Two-Dimensional Discrete Fourier Transform

The two-dimensional DFT of an $N \times N$ sequence $u(m,n)$ is defined as

$$v(k,l) = \frac{1}{N}\sum_{m=0}^{N-1}\sum_{n=0}^{N-1} u(m,n)\,W_N^{km}W_N^{nl}, \quad \text{for } k, l = 0, 1, \ldots, N-1, \tag{B.23}$$

and the inverse transform is

$$u(m,n) = \frac{1}{N}\sum_{k=0}^{N-1}\sum_{l=0}^{N-1} v(k,l)\,W_N^{-km}W_N^{-nl}, \quad \text{for } m, n = 0, 1, \ldots, N-1. \tag{B.24}$$

There are two major categories of two-dimensional DFT applications: (a) two-dimensional spatial frequency analysis and filtering, and (b) diagonalization of block circulant matrices for efficient computation of two-dimensional convolution.

B.3.1 Basis Images of the Two-Dimensional DFT

In order to analyze the spatial frequency characteristics of an $N \times N$ two-dimensional signal, we consider $N^2$ basis images of the same size $N \times N$. If each basis image exclusively contains unique two-dimensional spatial frequency components, we can compute the desired frequency component in the given image by finding the inner product of the given image and the corresponding basis image. If we form $N^2$ basis images by taking the outer product of two vectors that are permutations from $\{\phi_k,\ k = 0, \ldots, N-1\}$, defined in Eq. (B.16), each inner product of the given image and the corresponding basis image is equal to the two-dimensional DFT. In matrix notation, Eqs. (B.23) and (B.24), respectively, become

$$V = F_1 U F_1 \tag{B.25}$$

and

$$U = F_1^* V F_1^*, \tag{B.26}$$

where $U$ and $V$ represent $N \times N$ matrices whose elements are mapped to the two-dimensional signals $u(m,n)$ and $v(k,l)$, respectively. The two-dimensional DFT basis images are given by

$$B_{kl} = \phi_k \phi_l^T, \quad \text{for } k, l = 0, \ldots, N-1. \tag{B.27}$$

We note that Eq. (B.25) is mathematically equivalent to the set of inner products of the given image and the basis images defined in Eq. (B.27). Given two-dimensional DFT coefficients, which are elements of $V$, the two-dimensional signal can be reconstructed by summation of all basis images weighted by the given DFT coefficients. This reconstruction process is mathematically equivalent to Eq. (B.26).
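A minimal sketch of Eqs. (B.25) and (B.26) (added here for illustration), comparing the separable matrix form against NumPy's 2D FFT:

```python
import numpy as np

N = 8
U = np.random.default_rng(5).standard_normal((N, N))

n = np.arange(N)
F1 = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

V = F1 @ U @ F1                        # Eq. (B.25); F1 is symmetric
print(np.allclose(V, np.fft.fft2(U) / N))          # True: 1/N scaling, Eq. (B.23)
print(np.allclose(F1.conj() @ V @ F1.conj(), U))   # inverse, Eq. (B.26)
```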

B.3.2 Diagonalization of Block Circulant Matrices

Let u and v be the lexicographically ordered vectors for two-dimensional signals u(m, n) and v(k, l), respectively. The two-dimensional DFT matrix is defined as

$$F = F_1 \otimes F_1, \tag{B.28}$$

where $\otimes$ represents the Kronecker product of matrices and $F_1$ represents the one-dimensional DFT matrix. According to the properties of the Kronecker product, we see that the two-dimensional DFT matrix is also symmetric, that is,

$$F^T = (F_1 \otimes F_1)^T = F_1^T \otimes F_1^T = F_1 \otimes F_1 = F. \tag{B.29}$$

The two-dimensional DFT is written in matrix-vector form as

$$v = Fu. \tag{B.30}$$

In order to investigate the unitarity of the two-dimensional DFT matrix, we have

$$F^{*T}F = F^*F = (F_1 \otimes F_1)^*(F_1 \otimes F_1) = (F_1^* \otimes F_1^*)(F_1 \otimes F_1) = (F_1^* F_1) \otimes (F_1^* F_1) = I_N \otimes I_N = I_{N^2 \times N^2}. \tag{B.31}$$

From Eq. (B.31), we know that

$$F^{*T} = F^* = F^{-1}, \tag{B.32}$$

which yields the following inverse transform:

$$u = F^* v. \tag{B.33}$$

We note that the matrix-vector multiplication form of the two-dimensional DFT in Eq. (B.30) is equivalent to both Eqs. (B.23) and (B.25), and that Eq. (B.33) is equivalent to both Eqs. (B.24) and (B.26).

Consider the two-dimensional circular convolution

$$y(m,n) = \sum_{p=0}^{N-1}\sum_{q=0}^{N-1} h(m-p,\,n-q)_C\ x(p,q), \tag{B.34}$$

where

$$h(m,n)_C = h(m \bmod N,\ n \bmod N). \tag{B.35}$$

The reason for using circular convolution in image processing is to deal with boundary problems. In other words, when processing an image with a convolution kernel, boundary pixels do not have a sufficient number of neighboring pixels. In order to compensate for this pixel shortage, we may use one of the following: (1) assign zero values to nonexistent neighbors, (2) replicate the value of the outermost existing pixel to its nonexistent neighbors, or (3) suppose that the input image is periodic with a period that is the same as the size of the image. Although none of the three methods gives us an ideal solution for boundary problems, the third method, which assumes two-dimensional periodicity, allows us to use circular convolution, which can be diagonalized by using a two-dimensional DFT.

Given $(p, q)$, we can obtain the two-dimensional DFT of $h(m-p,\,n-q)_C$ as

$$\mathrm{DFT}\{h(m-p,\,n-q)_C\} = \sum_{m=0}^{N-1}\sum_{n=0}^{N-1} h(m-p,\,n-q)_C\,W_N^{mk+nl}. \tag{B.36}$$

Writing $i = m - p$, $j = n - q$, and using Eq. (B.35), we can rewrite Eq. (B.36) as

$$W_N^{pk+ql}\sum_{i=-p}^{N-1-p}\,\sum_{j=-q}^{N-1-q} h(i,j)_C\,W_N^{ik+jl} = W_N^{pk+ql}\sum_{m=0}^{N-1}\sum_{n=0}^{N-1} h(m,n)\,W_N^{mk+nl} = W_N^{pk+ql}\,\mathrm{DFT}\{h(m,n)\}. \tag{B.37}$$

Since Eq. (B.36) holds for $p, q = 0, \ldots, N-1$, the right-hand side of Eq. (B.36) can be expressed as

$$(F_1 \otimes F_1)\,h_{p,q}, \tag{B.38}$$

where $h_{p,q}$ represents the $N^2 \times 1$ row-ordered vector obtained from the version of the $N \times N$ matrix $\{h(m,n)\}$ rotated by $(p,q)$. By equating Eqs. (B.38) and (B.37), for $p, q = 0, \ldots, N-1$, we obtain

$$(F_1 \otimes F_1)H = D(F_1 \otimes F_1), \tag{B.39}$$

where $H$ represents the $N^2 \times N^2$ doubly block circulant matrix, and $D$ the diagonal matrix whose $N^2$ diagonal elements are equal to the two-dimensional DFT of $h(m,n)$. Equation (B.39) can be rewritten as

$$FH = DF \quad \text{or} \quad FHF^{-1} = FHF^* = D, \tag{B.40}$$

which shows that a doubly block circulant matrix is diagonalized by the two-dimensional unitary DFT. Furthermore, if a doubly block circulant matrix is used to represent the two-dimensional circular convolution, such as

$$y = Hx, \tag{B.41}$$

then the two-dimensional DFT of the output can be obtained by multiplying by the DFT matrix as

$$Fy = FHx = FHF^*Fx = D \cdot Fx. \tag{B.42}$$

Equation (B.42) can be rewritten as

$$\mathrm{DFT}\{y(m,n)\} = \mathrm{DFT}\{h(m,n)\} \cdot \mathrm{DFT}\{x(m,n)\}. \tag{B.43}$$

After obtaining the DFT of the output, its inverse transform can easily be computed by multiplying by the conjugate of the two-dimensional DFT matrix.

Appendix C: 3D Data Acquisition and Geometric Surface Reconstruction

C.1 Introduction

The first step in building the 3D model of a real scene is the acquisition of raw data. For this purpose, a common approach is to acquire depth information from a given point of view. Two major depth acquisition methods include:

• Stereovision, which uses classic photography from two or more viewpoints in order to retrieve the third dimension and build depth maps [ayache97, faugeras93, horn86].

• Direct range acquisition, which directly uses range finding devices, also called 3D scanners. The time-of-flight and active laser triangulation methods fall into this category.

For the experimental input data, a time-of-flight laser range finder (LRF), the Perceptron LASAR P5000 [dorum95, perceptron93], was used. The depth is retrieved by computing the phase shift between an outgoing laser beam and its returned (bounced back) signal. This kind of imaging is also known as light amplitude detection and ranging (LADAR). The Perceptron is an azimuth-elevation scanner, which uses two kinds of rotating mirrors, as shown in Fig. C.1. A faceted mirror controls the horizontal displacement of the laser beam, and a second planar mirror controls the vertical deflection. The LRF is then able to scan a scene point by point, generating a range map in which a pixel represents the distance of a scene point from the scanner. The scanner has some limitations, such as a maximum distance beyond which it cannot differentiate the range, and limited horizontal and vertical fields of view. In addition to the range image, the LRF outputs a reflectance image based on the intensity of the returned laser beam, as shown in Fig. C.2. This image is perfectly registered with the range image, with pixel-by-pixel correspondence, and will be useful later in the registration of LADAR and color data.


[Fig. C.1 diagram: the scanner geometry, showing the laser source, the faceted rotating mirror controlling the horizontal deflection, the nodding mirror controlling the vertical deflection, the scanner coordinate axes, and the complete set of equations converting each range pixel r(i,j), 0 ≤ i < R (rows), 0 ≤ j < C (columns), to Cartesian coordinates (x(i,j), y(i,j), z(i,j)).]

H and V respectively represent the horizontal and vertical fields of view; r0 the standoff distance (length of the laser beam at the point where r = 0), γ the slope of the facets of the rotating mirror with respect to the z-axis, and θ the angle of the nodding mirror with respect to the z-axis when β = 0.

Fig. C.1 Principle of 3D image acquisition using Perceptron [dorum95, perceptron93]

It is important to notice that the range map is a partial 3D representation, since it allows only for surface reconstruction from a given point of view. In order to build a more complete representation of an object, we need more than one range image; thus, this kind of data is known as 2½-dimensional data.

C.2 From Image Pixels to 3D Points

To reconstruct the 3D Cartesian coordinates of the scanned scene points from the range image, we use the Perceptron LRF imaging model [dorum95], which includes the different parameters allowing the retrieval of scene point positions in the LRF coordinate frame. Figure C.1 shows the coordinate systems attached to the scanner, the different model parameters, and the equations allowing for the calculation of the x, y, and z coordinates of a given scene point.

Fig. C.2 Two perfectly registered outputs of Perceptron: (a) range and (b) reflectance images

Fig. C.3 Spherical model for the reconstruction of 3D points from a range image

A simple spherical model, based on an approximation of the complete model, is shown in Fig. C.3. A pixel at the $(i,j)$-th position in a range image has intensity value $r(i,j)$. The coordinates of a point are calculated from the pixel value as

$$R = r_0 + \frac{r(i,j)}{\delta}, \tag{C.1}$$

where $r_0$ represents a standoff distance, or an offset, and $\delta$ represents the range resolution. Both $r_0$ and $\delta$ can be obtained through calibration. The azimuth and elevation angles $\alpha$ and $\beta$ are given by

$$\alpha = \frac{\left(\frac{C-1}{2} - j\right)}{C}\cdot H \quad \text{and} \quad \beta = \frac{\left(\frac{R-1}{2} - i\right)}{R}\cdot V, \tag{C.2}$$

where $H$ and $V$, respectively, represent the horizontal and vertical fields of view of the scanner, and $R$ and $C$ here denote the numbers of rows and columns. The x, y, and z coordinates are finally calculated as

$$x(i,j) = R\sin\alpha, \quad y(i,j) = R\cos\alpha\sin\beta, \quad \text{and} \quad z(i,j) = R\cos\alpha\cos\beta. \tag{C.3}$$
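A minimal sketch of this conversion follows. The calibration values for $r_0$, $\delta$, $H$, and $V$ used here are assumed for illustration, and the function name and signature are hypothetical, not part of the Perceptron model:

```python
import numpy as np

def range_to_points(rng_img, r0, delta, H, V):
    # Convert a range image to 3D points using the spherical model,
    # Eqs. (C.1)-(C.3). H and V are the fields of view in radians.
    R_rows, C_cols = rng_img.shape
    i, j = np.mgrid[0:R_rows, 0:C_cols]

    R = r0 + rng_img / delta                          # Eq. (C.1)
    alpha = ((C_cols - 1) / 2.0 - j) / C_cols * H     # azimuth, Eq. (C.2)
    beta = ((R_rows - 1) / 2.0 - i) / R_rows * V      # elevation, Eq. (C.2)

    x = R * np.sin(alpha)                             # Eq. (C.3)
    y = R * np.cos(alpha) * np.sin(beta)
    z = R * np.cos(alpha) * np.cos(beta)
    return np.stack([x, y, z], axis=-1)               # (rows, cols, 3)

# Hypothetical usage with assumed calibration values:
pts = range_to_points(np.full((480, 640), 2000.0), r0=0.5, delta=1000.0,
                      H=np.deg2rad(60), V=np.deg2rad(45))
print(pts.shape)   # (480, 640, 3)
```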

C.3 Surface Reconstruction

The purpose of surface reconstruction is to fit surfaces to the 3D point cloud reconstructed from the range image. Visualization standards require the use of polygons as the basic elements composing the object surface. The simplest and most common polygon is the triangle; thus, the process of creating the surface model is also called triangulation. Different techniques have been used for triangulation [elhakim98, sequira99]. Figure C.4 shows a simple approach to creating a triangle mesh from a range image, which is treated as a 2D grid. The first step is the creation of a rectangular grid. Every four neighbors in the grid correspond to four neighbors in the range image. The quadrilateral mesh is then transformed into a triangle mesh by dividing the different rectangles into two triangles according to simple rules. The resulting triangle mesh built from range images can be visualized using different software standards, such as VRML and OpenInventor from SGI. Certain hardware capabilities may be required to obtain optimal visualization performance, particularly in the case of large models such as those we are dealing with in this work. The number of model triangles is 2 × (C − 1) × (R − 1); for example, a model built from a 1000 × 1000 range image contains around two million triangles. Hence, it is in many cases desirable to reduce the number of triangles in the mesh [gourley98]. This is done in such a way as to keep a maximum number of triangles in the scene areas with a high level of detail and a minimum number of triangles in areas with large flat surfaces. Other steps in model building from range images include 3D segmentation, smoothing, and other preprocessing. In addition, the 3D models are usually rendered with textures on top of them. The 3D visualization engine also requires the calculation of surface or vertex normals, as shown in Fig. C.5, in order to determine the shading, using the lighting and camera models for a given viewpoint. Figure C.6 shows an example of a model reconstructed from range images. The model is rendered using OpenInventor, and we can see the rendered model with a uniform texture and also the triangle mesh forming the skeleton of the scene model.
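A minimal sketch of the grid triangulation just described (the helper name is illustrative): each quad of four neighboring pixels is split into two triangles, giving 2(R − 1)(C − 1) triangles in total.

```python
import numpy as np

def grid_triangles(R, C):
    # Vertex k of the mesh corresponds to range-image pixel (k // C, k % C).
    tris = []
    for i in range(R - 1):
        for j in range(C - 1):
            v00 = i * C + j          # upper-left pixel of the quad
            v01 = v00 + 1            # upper-right
            v10 = v00 + C            # lower-left
            v11 = v10 + 1            # lower-right
            tris.append((v00, v10, v01))   # first triangle of the quad
            tris.append((v01, v10, v11))   # second triangle
    return np.array(tris)

tris = grid_triangles(4, 5)
print(len(tris))   # 2 * (4-1) * (5-1) = 24 triangles
```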

Fig. C.4 3D triangle mesh reconstructed from a 2D grid representing range data


Fig. C.5 Terminology used for the different features of a triangle mesh: vertex, edge, face, surface normal, vertex normal, and mesh

Fig. C.6 A 3D model built from the previous range image with other views: (a) the model is rendered with a uniform texture and (b) a rendering of the wireframe

References

[ayache97] N. Ayache, Artificial Vision for Mobile Robots: Stereo-Vision and Multi-Sensory Perception (MIT Press, Cambridge, MA, 1997)

[dorum95] O.H. Dørum, A. Hoover, J.P. Jones, Calibration and Control for Range Imaging in Mobile Robot Navigation, in Research in Computer and Robot Vision, ed. by C. Archibald, P. Kwok (World Scientific, Singapore, 1995), pp. 1–18

[elhakim98] S.F. El-Hakim, C. Brenner, G. Roth, An Approach to Creating Virtual Environments Using Range and Texture. ISPRS J. Photogramm. Remote Sens. 53, 379–391 (1998)

[faugeras93] O. Faugeras, Three-Dimensional Computer Vision (MIT Press, Cambridge, MA, 1993)

[gourley98] C. Gourley, Pattern vector based reduction of large multi-modal datasets for fixed rate interactivity during the visualization of multi-resolution models, Ph.D. Thesis, University of Tennessee, Knoxville, 1998

[horn86] B.K.P. Horn, Robot Vision (MIT Press, Cambridge, MA, 1986)

[perceptron93] Perceptron Inc., LASAR Hardware Manual, 23855 Research Drive, Farmington Hills, Michigan 48335, 1993

[sequira99] V. Sequeira, E. Wolfart, J.G.M. Gonçalves, D. Hogg, Automated Reconstruction of 3D Models from Real Environments. ISPRS J. Photogramm. Remote Sens. 54, 1–22 (1999)

Appendix D: Mathematical Appendix

D.1 Functional Analysis

D.1.1 Real Linear Vector Spaces

A real linear vector space is a set $V$ of elements (objects) $x, y, z, \ldots$ for which the operations of addition and multiplication by real numbers are defined, satisfying the following nine axioms:

1. $x + y \in V$.
2. $\alpha \cdot x \in V$, $\alpha \in \mathbb{R}$.
3. $x + y = y + x$.
4. $x + (y + z) = (x + y) + z$.
5. $x + y = x + z$ iff $y = z$.
6. $\alpha(x + y) = \alpha \cdot x + \alpha \cdot y$.
7. $(\alpha + \beta)x = \alpha \cdot x + \beta \cdot x$, $\beta \in \mathbb{R}$.
8. $\alpha \cdot (\beta \cdot x) = (\alpha \cdot \beta)x$.
9. $1 \cdot x = x$,

where $\alpha, \beta \in \mathbb{R}$.

Examples

1. The real numbers themselves with the ordinary operations of arithmetic: $\mathbb{R}$.
2. The set of ordered real $N$-tuples $(x_1, x_2, \ldots, x_N)$, or $N$-dimensional vectors: $\mathbb{R}^N$.
3. The set of all functions continuously differentiable up to order $n$ on the real interval $[a, b]$: $C^n[a, b]$.


D.1.2 Normed Vector Spaces

A real linear vector space equipped with a measure of the size of its elements, $\|x\|$, which satisfies the conditions

1. $\|x\| = 0$ iff $x = 0$,
2. $\|\alpha \cdot x\| = |\alpha| \cdot \|x\|$,
3. $\|x + y\| \le \|x\| + \|y\|$,

is called a normed vector space. A norm of an element is a real number.

Examples

1. The absolute value of a real number, $|x|$, is a norm.
2. The Euclidean norm of a vector: $\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \cdots + x_N^2}$.
3. Functional norms for continuously differentiable functions $x(t)$, $a \le t \le b$:

(a) $\|x\|_\infty = \sup_{a \le t \le b} |x(t)|$.

(b) $\|x(t)\|_1 = \int_a^b |x(t)|\,dt$.

(c) $\|x(t)\|_2 = \left\{\int_a^b |x(t)|^2\,dt\right\}^{1/2}$.

(d) $\|x(t)\|_p = \left\{\int_a^b |x(t)|^p\,dt\right\}^{1/p}$.

D.1.3 Convergence, Cauchy Sequences, Completeness

An infinite sequence of elements $x_1, x_2, x_3, \ldots$ in a normed vector space is said to converge to an element $y$ if $\|x_k - y\| \to 0$ as $k \to \infty$. As the sequence converges, elements of the sequence tend to get closer and closer:

$$\|x_n - x_m\| \to 0 \quad \text{as } n, m \to \infty. \tag{D.1}$$

Such sequences are called Cauchy sequences. Every convergent sequence is Cauchy. If every Cauchy sequence converges to a limit that is an element of the original vector space, then the space is called complete. Not every normed linear space is complete. Complete normed linear spaces are called Banach spaces. A sequence may be convergent under one norm but not under another; hence, a normed vector space may be complete under one norm but not under another. Convergence under norm (a) above is the most important and is called uniform convergence.

D.1.4 Euclidean Spaces and Hilbert Spaces

As a norm is an abstraction of the size of an element in a real linear vector space, the inner (scalar, dot) product is a generalization of the angle between two elements in a real linear vector space. The inner product is a rule $(\cdot,\cdot)$ which assigns to any pair of elements of a real linear vector space a real number. This rule should have the following four properties:

1. $(x, y) = (y, x)$.
2. $(\alpha \cdot x,\, y) = \alpha \cdot (x, y)$.
3. $(x + y,\, z) = (x, z) + (y, z)$.
4. $(x, x) > 0$ if $x \ne 0$.

The inner product induces a natural norm

$$\|x\| = (x, x)^{1/2}. \tag{D.2}$$

A linear vector space equipped with an inner product and the induced norm is called a Euclidean or pre-Hilbert space. A Euclidean space which is complete under the induced inner product norm is called a Hilbert space.

Examples

1. The vector scalar product in $\mathbb{R}^N$: $(x, y) = \sum_{i=1}^{N} x_i \cdot y_i$, with the induced norm $\|x\| = (x,x)^{1/2} = \left[\sum_{i=1}^{N} x_i^2\right]^{1/2}$.

2. The scalar product in $C^n[a,b]$: $(x(t), y(t)) = \int_a^b x(t) \cdot y(t)\,dt$, with the induced norm $\|x(t)\| = \left[\int_a^b [x(t)]^2\,dt\right]^{1/2}$.

D.1.5 Approximations in Hilbert Spaces, Fourier Series

Two vectors $x, y$ in a Hilbert space are said to be orthogonal if $(x, y) = 0$. A system of elements $e_1, e_2, \ldots, e_N, \ldots$ is called an orthonormal system of elements in a Hilbert space if

$$(e_i, e_j) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases} \tag{D.3}$$

[Fig. D.1 diagram: a linear vector space, equipped with a norm ‖·‖, becomes a normed linear vector space, whose completion is a Banach space; equipped with a scalar product (·,·), it becomes a pre-Hilbert (Euclidean) space, whose completion under the scalar product norm is a Hilbert space.]

Fig. D.1 General hierarchy of functional spaces

An element $x$ of a Hilbert space can be represented as a linear combination of the $e_i$ as

$$x = \sum_{i=1}^{\infty} (x, e_i) \cdot e_i,$$

where the $(x, e_i)$ are called the Fourier coefficients of $x$, and the series itself is called the Fourier expansion of $x$. The general hierarchy of functional spaces is presented in Fig. D.1.

D.1.6 Operators and Their Norms

Let $V$ and $U$ be two vector spaces. A transformation $A$ (mapping, rule) which assigns to each element in $V$ a unique element in $U$, $A: V \to U$, is called an operator. An operator $A$ is called linear if

$$A(\alpha \cdot x + \beta \cdot y) = \alpha \cdot Ax + \beta \cdot Ay, \tag{D.4}$$

for all $x, y \in V$ and for all real scalars $\alpha$ and $\beta$.

Examples

1. Differential operators: $\frac{d}{dt}x(t)$, $\frac{d^2}{dt^2}x(t)$, ..., $\frac{d^n}{dt^n}x(t)$.
2. The indefinite integral operator: $\int x(t)\,dt$.
3. The convolution or Fredholm integral operator: $y(t) = \int_a^b K(t, \tau)\,x(\tau)\,d\tau$.

A type 3 operator is of the utmost importance in image processing. Normally, if a mapping carries a function to a function, it is called an operator; if it carries a function to a number, it is called a functional; and if it carries a number to a number, it is called a function. An operator $A$ is called bounded if there is a constant $K$ such that

$$\|Ax\| \le K\|x\|, \quad x \in V. \tag{D.5}$$

The norm of the operator is then defined as

$$\|A\| = \sup_{x \ne 0} \frac{\|Ax\|}{\|x\|}. \tag{D.6}$$

D.1.7 Vector and Matrix Norms

Let $x$ be an $N$-dimensional vector, $x = (x_1, x_2, \ldots, x_N)$. Then the following norms can be defined:

1. $\|x\|_1 = \sum_{i=1}^{N} |x_i|$.
2. $\|x\|_2 = \left[\sum_{i=1}^{N} x_i^2\right]^{1/2}$.
3. $\|x\|_p = \left[\sum_{i=1}^{N} |x_i|^p\right]^{1/p}$.
4. $\|x\|_W = (x \cdot W \cdot x)^{1/2}$.
5. $\|x\|_\infty = \max_i |x_i|$.

Let $A$ be an $m \times n$ matrix. Then the following norms can be defined:

1. $\|A\|_1 = \max_j \left[\sum_{i=1}^{m} |a_{i,j}|\right]$.
2. $\|A\|_\infty = \max_i \left[\sum_{j=1}^{n} |a_{i,j}|\right]$.

3. $\|A\|_2 = \left[\text{maximum eigenvalue of } A^T A\right]^{1/2}$ (the spectral norm).
4. $\|A\|_F = \left[\sum_{i=1}^{m}\sum_{j=1}^{n} a_{i,j}^2\right]^{1/2}$ (the Frobenius norm).
5. $\|A\|_E = \max_{\|x\|=1} \frac{\|A \cdot x\|}{\|x\|}$ (the Euclidean norm).

D.1.8 SVD, Eigenvalues and Eigenvectors, Condition Number

Every $m \times n$ matrix $A$ can be factored into $A = U \cdot \Sigma \cdot V^T$, where $U^T \cdot U = V^T \cdot V = I$ and $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$:

$$\underbrace{A}_{m \times n} = \underbrace{[u_1, u_2, \ldots, u_m]}_{m \times m} \cdot \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{bmatrix} \cdot \underbrace{[v_1, v_2, \ldots, v_n]^T}_{n \times n}. \tag{D.7}$$

The columns of the matrices $U$ and $V$ are called the left and right singular vectors of $A$, and the vector $(\sigma_1, \sigma_2, \ldots, \sigma_n)$ is called the vector of singular values, which appear in nonincreasing order. The condition number of a matrix $A$ with respect to a norm is defined as

$$\mathrm{cond}(A) = \|A\| \cdot \|A^{-1}\|. \tag{D.8}$$

If the spectral norm is used, then the condition number is given by the ratio

$$\mathrm{cond}(A) = \frac{\sigma_{\max}}{\sigma_{\min}}, \tag{D.9}$$

where $\sigma_{\max}$ and $\sigma_{\min}$ are the maximum and minimum singular values of the matrix $A$. For a nonsingular system of linear equations $Ax = b$, the condition number bounds the possible relative change $\Delta x$ in the solution $x$ due to a given relative change $\Delta b$ in the right-hand side vector $b$, namely,

$$\frac{\|\Delta x\|}{\|x\|} \le \mathrm{cond}(A) \cdot \frac{\|\Delta b\|}{\|b\|}. \tag{D.10}$$

Hence, if $\mathrm{cond}(A)$ is large, then the relative change in the solution can be very large even for small changes in the right-hand side of the equation.
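A small numerical illustration of Eqs. (D.9) and (D.10), using an assumed nearly singular matrix:

```python
import numpy as np

rng = np.random.default_rng(6)
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])          # nearly singular, ill-conditioned

x = np.array([1.0, 1.0])
b = A @ x

db = 1e-6 * rng.standard_normal(2)     # small perturbation of the right side
dx = np.linalg.solve(A, b + db) - x

lhs = np.linalg.norm(dx) / np.linalg.norm(x)
rhs = np.linalg.cond(A) * np.linalg.norm(db) / np.linalg.norm(b)
print(np.linalg.cond(A))               # about 4e4: sigma_max / sigma_min
print(lhs <= rhs + 1e-12)              # True: the bound of Eq. (D.10) holds
```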

D.2 Matrix Algebra and the Kronecker Product

For matrices $A$, $B$, and $C$, the following identities hold:

1. $A + B = B + A$.
2. $(A + B) + C = A + (B + C)$.
3. $(AB)C = A(BC)$.
4. $C(A + B) = CA + CB$.
5. $\alpha(A + B) = \alpha A + \alpha B$, $\alpha \in \mathbb{R}$.
6. $(A^T)^T = A$.
7. $(A + B)^T = A^T + B^T$.
8. $(AB)^T = B^T A^T$.
9. $(ABC)^T = C^T B^T A^T$.
10. $(AB)^{-1} = B^{-1} A^{-1}$.
11. $(ABC)^{-1} = C^{-1} B^{-1} A^{-1}$.
12. $(A^{-1})^{-1} = A$.
13. $(A^T)^{-1} = (A^{-1})^T$.

The Kronecker product of two matrices $A$, $(m \times n)$, and $B$, $(k \times l)$, is the block matrix

$$A \otimes B = \begin{bmatrix} a_{11} \cdot B & \cdots & a_{1n} \cdot B \\ \vdots & \ddots & \vdots \\ a_{m1} \cdot B & \cdots & a_{mn} \cdot B \end{bmatrix}, \quad (mk \times nl), \tag{D.11}$$

with the following properties:

1. $(A \otimes B)^T = A^T \otimes B^T$.
2. $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$.
3. $(A \otimes B) \otimes C = A \otimes (B \otimes C)$.
4. $(A + B) \otimes C = A \otimes C + B \otimes C$.
5. $A \otimes B = (U_A \otimes U_B) \cdot (\Sigma_A \otimes \Sigma_B) \cdot (V_A \otimes V_B)^T$.

The operation $\mathrm{vec}(A)$ is defined as

$$\mathrm{vec}(A) = [a_{11}, \ldots, a_{m1}, a_{12}, \ldots, a_{m2}, \ldots, a_{1n}, \ldots, a_{mn}]^T, \tag{D.12}$$

where $\mathrm{vec}(A)$ is an $(mn \times 1)$ vector (the stacked columns of $A$). Let

$$A \cdot X = B \tag{D.13}$$

represent a matrix equation with $A$, $(m \times n)$; $X$, $(n \times k)$; and $B$, $(m \times k)$. Then Eq. (D.13) can be written in the equivalent form

$$(I \otimes A) \cdot \mathrm{vec}(X) = \mathrm{vec}(B), \tag{D.14}$$

and the solution to this equation can be expressed as

$$\mathrm{vec}(X) = (I \otimes A)^{-1} \cdot \mathrm{vec}(B). \tag{D.15}$$

The following important relationship exists between the Kronecker product and the operation vec:

$$(A \otimes B) \cdot \mathrm{vec}(X) = \mathrm{vec}(BXA^T). \tag{D.16}$$

This relationship is extensively used in image processing. The convolution of an image with a separable point spread function can be represented as

$$(K_1 \otimes K_2) \cdot \mathrm{vec}(X) = \mathrm{vec}(B), \tag{D.17}$$

where $X$ is the true image, $K_1$ and $K_2$ are separable kernels, and $B$ is the blurred image.
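The vec identity of Eq. (D.16) is easy to check with NumPy (an added sketch); note that vec() stacks columns, which corresponds to Fortran-order flattening:

```python
import numpy as np

# Check (A kron B) vec(X) = vec(B X A^T), Eq. (D.16).
rng = np.random.default_rng(7)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((4, 4))
X = rng.standard_normal((4, 3))

lhs = np.kron(A, B) @ X.flatten(order='F')   # column stacking = 'F' order
rhs = (B @ X @ A.T).flatten(order='F')
print(np.allclose(lhs, rhs))                 # True
```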

D.2.1 Derivatives and Traces

Let $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ be two vectors and $A$ an $(m \times n)$ matrix. Then

1. $\frac{\partial}{\partial x}(x^T y) = y$.
2. $\frac{\partial}{\partial x}(x^T x) = 2x$.
3. $\frac{\partial}{\partial x}(x^T A y) = Ay$.
4. $\frac{\partial}{\partial x}(y^T A x) = A^T y$.
5. $\frac{\partial}{\partial x}(x^T A x) = (A + A^T)x$.
6. $\frac{\partial}{\partial A}(x^T A y) = xy^T$.
7. $\frac{\partial}{\partial A}(x^T A x) = 2xx^T - \mathrm{diag}(xx^T)$.
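A finite-difference check of identity 5, as a sketch:

```python
import numpy as np

# Verify d(x^T A x)/dx = (A + A^T) x numerically by central differences.
rng = np.random.default_rng(8)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

f = lambda v: v @ A @ v
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])
print(np.allclose(grad_fd, (A + A.T) @ x, atol=1e-6))   # True
```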

Assume $A$ and $B$ are $(n \times n)$ matrices, and $\alpha$ and $\beta$ are two scalars. Then

$$\mathrm{trace}(A) = \sum_{i=1}^{n} a_{ii}, \tag{D.18}$$

with the following properties:

1. $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.
2. $\mathrm{tr}(A) = \sum_{i=1}^{n} \lambda_i$, the sum of the eigenvalues of $A$.

3. $\mathrm{tr}(\alpha A + \beta B) = \alpha\,\mathrm{tr}(A) + \beta\,\mathrm{tr}(B)$.
4. $\mathrm{tr}(A^T) = \mathrm{tr}(A)$.

D.3 Probability and Statistics

A random variable is a real-valued function defined on the space of random events (outcomes of experiments). The same space of random events may define an infinite number of random variables.

Examples

1. In the die experiment, we can assign the six outcomes their corresponding values; thus $f(1) = 1$, $f(2) = 2$, ..., $f(6) = 6$.
2. In the same experiment, we can assign the number $10 \cdot i$ to the $i$-th outcome; thus $f(1) = 10$, $f(2) = 20$, ..., $f(6) = 60$.
3. Finally, in the same experiment, we can assign the number 1 to every even outcome and 0 to every odd outcome; thus $f(1) = f(3) = f(5) = 0$ and $f(2) = f(4) = f(6) = 1$.

All three of these random variables are defined on the same sample space. A random variable can assume discrete values, continuous values, or both. The probability distribution function of a random variable $x$ is the function defined for every $X$ such that

$$F_x(X) = P(x \le X). \tag{D.19}$$

The derivative of a probability distribution function is called a probability density function and is denoted by

$$f(x) = \frac{dF(x)}{dx}. \tag{D.20}$$

The Gaussian or normal probability density function can be written as

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \tag{D.21}$$

where $\sigma$ and $\mu$ are two parameters which define the shape of the distribution. The expected or mean value of a continuous random variable $x$ is by definition

$$E(x) = \int_{-\infty}^{\infty} x f(x)\,dx. \tag{D.22}$$

Sometimes this value is called the mathematical expectation. It has the following important properties:

1. $E(c) = c$, where $c$ is a constant.
2. $E(cx) = cE(x)$.
3. $E(x + y) = E(x) + E(y)$.
4. $|E(x)| \le E(|x|)$.

Another very important numerical characteristic of a random variable is the variance, which in the continuous case is defined as

$$\mathrm{Var}(x) = \int_{-\infty}^{\infty} (x - E(x))^2 f(x)\,dx, \tag{D.23}$$

with the following properties:

1. $\mathrm{Var}(c) = 0$.
2. $\mathrm{Var}(cx) = c^2\,\mathrm{Var}(x)$.
3. $\mathrm{Var}(x + y) = \mathrm{Var}(x) + \mathrm{Var}(y)$ if $x$ and $y$ are independent random variables.

Two random variables are called independent if their joint probability density function can be factored into their individual probability densities, as in

$$f_{x,y}(X, Y) = f_x(X)\,f_y(Y). \tag{D.24}$$

By definition, probability theory is concerned with the statements that can be made about a random variable if its probability density function or probability distribution function is known. The theory of estimation, or statistical theory, is concerned with the statements that can be made about a probability density function with only a limited number of samples drawn from that probability density function. If the general form of the underlying distribution is known, and the problem is to evaluate its parameters from the available data, then the estimation problem is called parametric. Otherwise, the estimation problem is called nonparametric. Having a set of experimental observations $x_1, x_2, \ldots, x_n$ drawn from an underlying probability density function $f(x|\theta)$, the goal of statistical inference is to obtain a reliable estimate $\hat{\theta}$ of the parameter $\theta$. Any function of the observations $f(x_1, x_2, \ldots, x_n)$ is called a statistic or estimator. An estimator $\hat{\theta}$ of the parameter $\theta$ is called unbiased if

$$E[\hat{\theta}] = \theta, \tag{D.25}$$

that is, the mathematical expectation of $\hat{\theta}$ equals the true parameter $\theta$. An estimator $\hat{\theta}$ of the parameter $\theta$ is called consistent if

$$\lim_{n \to \infty} P\left(\left|\hat{\theta} - \theta\right| > \varepsilon\right) = 0. \tag{D.26}$$

In other words, a consistent estimator converges in probability to the true parameter as the number of samples grows. An estimator $\hat{\theta}$ of the parameter $\theta$ is called efficient if it has the smallest variance within a class of estimators:

$$\mathrm{Var}(\hat{\theta}) \le \mathrm{Var}(\tilde{\theta}). \tag{D.27}$$

Let $x_1, x_2, \ldots, x_n$ be a sample of independent observations drawn from an unknown parametric family of probability density functions $f(x|\theta)$. Then, by definition, the likelihood function is

$$L(x_1, x_2, \ldots, x_n | \theta) = f(x_1|\theta)\,f(x_2|\theta)\cdots f(x_n|\theta). \tag{D.28}$$

The likelihood function represents an "inverse" probability, since it is a function of the parameter. The most important principle of statistical inference is the maximum likelihood principle, which can be stated as follows: find an estimate for $\theta$ that maximizes the likelihood of observing the data that actually have been observed. In other words, find the parameter values that make the observed data most likely.

D.3.1 Maximum Likelihood and Least Squares

Suppose we have a set of measurements $(y_1, x_1), (y_2, x_2), \ldots, (y_n, x_n)$, and we believe that the independent variable $x$ and the dependent variable $y$ are linked through the following linear relationship:

$$y = xb + \varepsilon, \tag{D.29}$$

with $\varepsilon \sim N(0, \sigma)$ being normally distributed noise. The functional relationship can be written as

$$\varepsilon = y - xb. \tag{D.30}$$

The functional form of the relation is known; however, the parameter b needs to be estimated from the given data sample. The likelihood function for the observed error terms is

$$L(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n | b, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{\varepsilon_i^2}{2\sigma^2}} = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(y_i - x_i b)^2}{2\sigma^2}}. \tag{D.31}$$

It is computationally more convenient to work with sums rather than with products, and, since the logarithm is a monotonic transformation and densities are nonnegative, $\log L(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n | b, \sigma^2)$ is usually considered and maximized. Taking the logarithm of the likelihood function and maximizing it, we obtain

$$\max_b\ \ln L(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n | b, \sigma^2) = -n\ln\sigma - \frac{n}{2}\ln 2\pi - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - x_i b)^2. \tag{D.32}$$

Since the first two terms do not depend on the parameter of maximization $b$, the last expression amounts to the maximization of

$$\max_b\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - x_i b)^2\right) \tag{D.33}$$

or

$$\min_b\left(\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - x_i b)^2\right), \tag{D.34}$$

which is exactly the least squares cost function. Thus, the least squares solution to a linear estimation problem is equivalent to the maximum likelihood solution under the assumption of Gaussian uncorrelated noise. The variance of the maximum likelihood estimator satisfies

$$\mathrm{var}(\hat{\theta}) \approx \left(E\left[-\frac{\partial^2 L(\hat{\theta})}{\partial\hat{\theta}^2}\right]\right)^{-1}, \tag{D.35}$$

where the matrix

$$J = E\left[-\frac{\partial^2 L(\hat{\theta})}{\partial\hat{\theta}^2}\right] \tag{D.36}$$

is called the Fisher information matrix. The variance of any unbiased estimator is lower bounded as

$$\mathrm{var}(\hat{\theta}) \ge J^{-1}, \tag{D.37}$$

which is the celebrated Cramer-Rao inequality.

D.3.2 Bias-Variance

Though the maximum likelihood estimator has the smallest variance among unbiased estimators, it is not the estimator with the smallest variance in general. If we are willing to introduce bias into an estimator, then in general we can obtain an estimator with a much smaller variance than the maximum likelihood estimator. This is called the bias-variance decomposition of the mean squared error and can be derived as follows:

$$\begin{aligned}
E_D\big[(\hat{\theta} - \theta)^2\big]
&= E_D\Big[\big((\hat{\theta} - E_D(\hat{\theta})) + (E_D(\hat{\theta}) - \theta)\big)^2\Big] \\
&= E_D\Big[(\hat{\theta} - E_D(\hat{\theta}))^2 + (E_D(\hat{\theta}) - \theta)^2 + 2(\hat{\theta} - E_D(\hat{\theta}))(E_D(\hat{\theta}) - \theta)\Big] \\
&= E_D\Big[(\hat{\theta} - E_D(\hat{\theta}))^2\Big] + E_D\Big[(E_D(\hat{\theta}) - \theta)^2\Big] + 2E_D\Big[(\hat{\theta} - E_D(\hat{\theta}))(E_D(\hat{\theta}) - \theta)\Big] \\
&= E_D\Big[(\hat{\theta} - E_D(\hat{\theta}))^2\Big] + E_D\Big[(E_D(\hat{\theta}) - \theta)^2\Big] \\
&= \mathrm{Var} + \mathrm{Bias}^2,
\end{aligned} \tag{D.38}$$

where $D$ denotes that the averaging is performed over the available data set. To arrive at the final formula, we use the fact that

$$2E_D\Big[(\hat{\theta} - E_D(\hat{\theta}))(E_D(\hat{\theta}) - \theta)\Big] = 0, \tag{D.39}$$

due to the fact that

$$E_D\big[\hat{\theta} - E_D(\hat{\theta})\big] = E_D(\hat{\theta}) - E_D\big(E_D(\hat{\theta})\big) = E_D(\hat{\theta}) - E_D(\hat{\theta}) = 0. \tag{D.40}$$

The last transformation is made because $E_D(E_D(\hat{\theta})) = E_D(\hat{\theta})$, since the mathematical expectation of a mathematical expectation is just the mathematical expectation. For the same reason,

$$E_D\Big[(E_D(\hat{\theta}) - \theta)^2\Big] = \big(E_D(\hat{\theta}) - \theta\big)^2. \tag{D.41}$$

The bias-variance decomposition shows that the mean squared error between the estimate and the true parameter consists of two parts: the variance of the estimate plus the squared bias of the estimate. One term can be increased or decreased at the expense of the other; hence, we can trade a little bit of bias for a smaller variance. The idea of regularization exploits this fact, as almost any regularized solution is biased but has a smaller variance than the maximum likelihood solution.

D.3.3 Bayes Theorem

Historically, maximum likelihood is not the oldest method to "invert" probabilities. In this respect, the Bayesian approach came first. The Bayesian approach makes use of Bayes theorem to recover unknown parameters or models from the data:

$$P(\mathrm{Model}|\mathrm{Data}) = \frac{P(\mathrm{Data}|\mathrm{Model})\,P(\mathrm{Model})}{P(\mathrm{Data})}. \tag{D.42}$$

Sometimes the Bayes formula is written in the other form:

$$\mathrm{Posterior} = \frac{\mathrm{Likelihood} \cdot \mathrm{Prior}}{\mathrm{Evidence}}. \tag{D.43}$$

Notice that, in addition to the likelihood, the Bayesian approach requires the a priori probability of the model to be specified, which is the most vexing and controversial issue of Bayesian inference.

D.3.4 Bayesian Interpretation of Regularization

Let us reformulate our maximum likelihood example using the Bayesian interpretation. Suppose that the prior distribution of the parameter $b$ is Gaussian with zero mean and unknown standard deviation $\sigma_b$, as in

$$p(b) = \frac{1}{\sqrt{2\pi}\,\sigma_b}\,e^{-\frac{b^2}{2\sigma_b^2}}. \tag{D.44}$$

Then, combining the likelihood and the prior distribution using Bayes theorem, we obtain

$$P(b|\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n) \propto \left[\prod_{i=1}^{n} \frac{1}{\sigma_\varepsilon\sqrt{2\pi}}\,e^{-\frac{(y_i - x_i b)^2}{2\sigma_\varepsilon^2}}\right] \cdot \frac{1}{\sqrt{2\pi}\,\sigma_b}\,e^{-\frac{b^2}{2\sigma_b^2}}. \tag{D.45}$$

Taking the logarithm of both sides and performing arithmetical manipulations, we obtain

$$\ln P(b|\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n) \propto -\frac{1}{2\sigma_\varepsilon^2}\sum_{i=1}^{n}(y_i - x_i b)^2 - \frac{1}{2\sigma_b^2}\,b^2. \tag{D.46}$$

Maximization of the last expression amounts to minimization of

$$E(b) = \sum_{i=1}^{n}(y_i - x_i b)^2 + \frac{\sigma_\varepsilon^2}{\sigma_b^2}\,b^2 = \sum_{i=1}^{n}(y_i - x_i b)^2 + \lambda^2 b^2, \tag{D.47}$$

with

$$\lambda^2 = \frac{\sigma_\varepsilon^2}{\sigma_b^2}, \tag{D.48}$$

which is exactly the expression for zero-order Tikhonov regularization. Thus, Tikhonov regularization can be interpreted as Bayesian inference with a Gaussian likelihood and a Gaussian prior on the parameters. In statistics, this solution is known as the maximum penalized likelihood, or MPL, solution.
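A minimal sketch of the MPL (ridge) estimator of Eq. (D.47) for the scalar model, with $\lambda^2 = \sigma_\varepsilon^2/\sigma_b^2$ assumed known; the data here are synthetic:

```python
import numpy as np

rng = np.random.default_rng(9)
n, b_true, sigma_eps, sigma_b = 50, 2.0, 1.0, 0.5
x = rng.standard_normal(n)
y = x * b_true + sigma_eps * rng.standard_normal(n)

lam2 = sigma_eps**2 / sigma_b**2
b_ml = np.sum(x * y) / np.sum(x * x)              # maximum likelihood (LS)
b_mpl = np.sum(x * y) / (np.sum(x * x) + lam2)    # minimizer of Eq. (D.47)
print(b_ml, b_mpl)   # the regularized estimate is shrunk toward zero
```

Setting the derivative of Eq. (D.47) to zero gives $b = \sum x_i y_i / (\sum x_i^2 + \lambda^2)$, which is the closed form computed above.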

D.4 Multivariable Analysis

Let the function $F(x_1, x_2, \ldots, x_n)$ be a scalar real-valued function of a real vector $x$. The gradient of $F$ is defined as the column vector

$$\nabla F(x) = \left[\frac{\partial F}{\partial x_1},\ \frac{\partial F}{\partial x_2},\ \ldots,\ \frac{\partial F}{\partial x_n}\right]^T. \tag{D.49}$$

For the same function $F(x_1, x_2, \ldots, x_n)$, the Hessian is defined as the symmetric $(n \times n)$ matrix of second derivatives:

$$\nabla^2 F(x) = \begin{bmatrix}
\frac{\partial^2 F}{\partial x_1^2} & \frac{\partial^2 F}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 F}{\partial x_1 \partial x_n} \\
\frac{\partial^2 F}{\partial x_2 \partial x_1} & \frac{\partial^2 F}{\partial x_2^2} & \cdots & \frac{\partial^2 F}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 F}{\partial x_n \partial x_1} & \frac{\partial^2 F}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 F}{\partial x_n^2}
\end{bmatrix} \tag{D.50}$$

The Hessian is widely used in linear as well as nonlinear optimization. The Hessian also appears in the multivariable Taylor series expansion of $F(x)$ around the point $x_0$:

$$F(x_0 + \Delta x) = F(x_0) + \Delta x^T \nabla F(x_0) + \frac{1}{2}\Delta x^T \nabla^2 F(x_0)\,\Delta x + \text{higher order terms}. \tag{D.51}$$

D.5 Convolution and Fourier Transform

The following integral is called the convolution of two functions:

$$g(x) = \int_{-\infty}^{\infty} K(x - y)\,f(y)\,dy, \tag{D.52}$$

where $K(x-y)$ is known under different names in different fields, such as the kernel function in the theory of integral equations, the impulse response function in engineering, the point spread function in imaging, Green's function in physics, and the fundamental solution in mathematics. The Fourier transform of a function $f(x)$ is a function depending on the frequency $w$:

$$\hat{f}(w) = \int_{-\infty}^{\infty} f(x)\,e^{-iwx}\,dx. \tag{D.53}$$

The fundamental relation linking the convolution of two functions and their Fourier transforms is as follows:

$$g(x) = \int_{-\infty}^{\infty} K(x - y)\,f(y)\,dy, \tag{D.54}$$

$$\hat{g}(w) = \hat{K}(w)\,\hat{f}(w). \tag{D.55}$$

D.6 The Trace Result

Consider a random $m$-vector $b$, normally distributed as $\sqrt{n}\,b \sim N(0, \Sigma)$. The expected value of $b^T A b$ is given by

$$E(b^T A b) = \frac{1}{n}\,\mathrm{trace}(A\Sigma). \tag{D.56}$$

Indeed, the expected value can be calculated using the properties of the expectation operator as

$$\begin{aligned}
E(b^T A b) &= E\left((b_1 \cdots b_m)\begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mm} \end{pmatrix}\begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}\right) \\
&= E\left(b_1\sum_{i=1}^{m} b_i a_{i1} + b_2\sum_{i=1}^{m} b_i a_{i2} + \cdots + b_m\sum_{i=1}^{m} b_i a_{im}\right) \\
&= \sum_{k=1}^{m}\sum_{i=1}^{m} a_{ik}\,E(b_k b_i) = \frac{1}{n}\sum_{k=1}^{m}\sum_{i=1}^{m} a_{ik}\,\sigma_{ki} = \frac{1}{n}\,\mathrm{trace}(A\Sigma).
\end{aligned} \tag{D.57}$$

This proves the result.
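A Monte Carlo sketch of the trace result (the matrix, covariance, sample sizes, and seed are arbitrary assumptions for illustration):

```python
import numpy as np

# Check E[b^T A b] = (1/n) trace(A @ Sigma) for sqrt(n) * b ~ N(0, Sigma).
rng = np.random.default_rng(3)
m, n, trials = 3, 5, 200_000
A = rng.standard_normal((m, m))
L = rng.standard_normal((m, m))
Sigma = L @ L.T                                  # a valid covariance matrix

b = rng.multivariate_normal(np.zeros(m), Sigma / n, size=trials)
empirical = np.mean(np.einsum('ti,ij,tj->t', b, A, b))
print(empirical, np.trace(A @ Sigma) / n)        # the two values are close
```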
