TOPIC 0 Introduction
Total Page:16
File Type:pdf, Size:1020Kb
TOPIC 0 Introduction 1 Review Course: Math 535 (http://www.math.wisc.edu/~roch/mmids/) - Mathematical Methods in Data Science (MMiDS) Author: Sebastien Roch (http://www.math.wisc.edu/~roch/), Department of Mathematics, University of Wisconsin-Madison Updated: Sep 21, 2020 Copyright: © 2020 Sebastien Roch We first review a few basic concepts. 1.1 Vectors and norms For a vector ⎡ 푥 ⎤ ⎢ 1 ⎥ 푥 ⎢ 2 ⎥ 푑 퐱 = ⎢ ⎥ ∈ ℝ ⎢ ⋮ ⎥ ⎣ 푥푑 ⎦ the Euclidean norm of 퐱 is defined as ‾‾푑‾‾‾‾ ‖퐱‖ = 푥2 = √퐱‾‾푇‾퐱‾ = ‾⟨퐱‾‾, ‾퐱‾⟩ 2 ∑ 푖 √ ⎷푖=1 푇 where 퐱 denotes the transpose of 퐱 (seen as a single-column matrix) and 푑 ⟨퐮, 퐯⟩ = 푢 푣 ∑ 푖 푖 푖=1 is the inner product (https://en.wikipedia.org/wiki/Inner_product_space) of 퐮 and 퐯. This is also known as the 2- norm. More generally, for 푝 ≥ 1, the 푝-norm (https://en.wikipedia.org/wiki/Lp_space#The_p- norm_in_countably_infinite_dimensions_and_ℓ_p_spaces) of 퐱 is given by 푑 1/푝 ‖퐱‖ = |푥 |푝 . 푝 (∑ 푖 ) 푖=1 Here (https://commons.wikimedia.org/wiki/File:Lp_space_animation.gif#/media/File:Lp_space_animation.gif) is a nice visualization of the unit ball, that is, the set {퐱 : ‖푥‖푝 ≤ 1}, under varying 푝. There exist many more norms. Formally: 푑 푑 Definition (Norm): A norm is a function ℓ from ℝ to ℝ+ that satisfies for all 푎 ∈ ℝ, 퐮, 퐯 ∈ ℝ (Homogeneity): ℓ(푎퐮) = |푎|ℓ(퐮) (Triangle inequality): ℓ(퐮 + 퐯) ≤ ℓ(퐮) + ℓ(퐯) (Point-separating): ℓ(푢) = 0 implies 퐮 = 0. ⊲ The triangle inequality for the 2-norm follows (https://en.wikipedia.org/wiki/Cauchy– Schwarz_inequality#Analysis) from the Cauchy–Schwarz inequality (https://en.wikipedia.org/wiki/Cauchy– Schwarz_inequality). 푑 Theorem (Cauchy–Schwarz): For all 퐮, 퐯 ∈ ℝ |⟨퐮, 퐯⟩| ≤ ‖퐮‖2‖퐯‖2. 푑 The Euclidean distance (https://en.wikipedia.org/wiki/Euclidean_distance) between two vectors 퐮 and 퐯 in ℝ is the 2-norm of their difference 푑(퐮, 퐯) = ‖퐮 − 퐯‖2. Throughout we use the notation ‖퐱‖ = ‖퐱‖2 to indicate the 2-norm of 퐱 unless specified otherwise. 푑 We will often work with collections of 푛 vectors 퐱1 , … , 퐱푛 in ℝ and it will be convenient to stack them up into a matrix ⎡ 퐱푇 ⎤ ⎡ 푥 푥 ⋯ 푥 ⎤ ⎢ 1 ⎥ 11 12 1푑 푇 ⎢ ⎥ ⎢ 퐱2 ⎥ ⎢ 푥21 푥22 ⋯ 푥2푑 ⎥ 푋 = ⎢ ⎥ = ⎢ ⎥ , ⎢ ⋮ ⎥ ⎢ ⋮ ⋮ ⋱ ⋮ ⎥ 푇 ⎣ ⎦ ⎣ 퐱푛 ⎦ 푥푛1 푥푛2 ⋯ 푥푛푑 where 푇 indicates the transpose (https://en.wikipedia.org/wiki/Transpose): (Source (https://commons.wikimedia.org/wiki/File:Matrix_transpose.gif)) NUMERICAL CORNER In Julia, a vector can be obtained in different ways. The following method gives a row vector as a two-dimensional array. In [1]: # Julia version: 1.5.1 using LinearAlgebra, Statistics, Plots In [2]: t = [1. 3. 5.] Out[2]: 1×3 Array{Float64,2}: 1.0 3.0 5.0 To turn it into a one-dimensional array, use vec (https://docs.julialang.org/en/v1/base/arrays/#Base.vec). In [3]: vec(t) Out[3]: 3-element Array{Float64,1}: 1.0 3.0 5.0 To construct a one-dimensional array directly, use commas to separate the entries. In [4]: u = [1., 3., 5., 7.] Out[4]: 4-element Array{Float64,1}: 1.0 3.0 5.0 7.0 To obtain the norm of a vector, we can use the function norm (https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.norm) (which requires the LinearAlgebra package): In [5]: norm(u) Out[5]: 9.16515138991168 which we can check "by hand" In [6]: sqrt(sum(u.^2)) Out[6]: 9.16515138991168 The . above is called broadcasting (https://docs.julialang.org/en/v1.2/manual/mathematical-operations/#man- dot-operators-1). It applies the operator following it (in this case taking a square) element-wise. Exercise: Compute the inner product of 푢 = (1, 2, 3, 4) and 푣 = (5, 4, 3, 2) without using the function dot (https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.dot). Hint: The product of two real numbers 푎 and 푏 is a * b . In [7]: # Try it! u = [1., 2., 3., 4.]; # EDIT THIS LINE: define v # EDIT THIS LINE: compute the inner product between u and v ⊲ To create a matrix out of two vectors, we use the function hcat (https://docs.julialang.org/en/v1.2/base/arrays/#Base.hcat) and transpose. In [8]: u = [1., 3., 5., 7.]; v = [2., 4., 6., 8.]; X = hcat(u,v)' Out[8]: 2×4 Adjoint{Float64,Array{Float64,2}}: 1.0 3.0 5.0 7.0 2.0 4.0 6.0 8.0 With more than two vectors, we can use the reduce (https://docs.julialang.org/en/v1/base/collections/#Base.reduce-Tuple{Any,Any}) function. In [9]: u = [1., 3., 5., 7.]; v = [2., 4., 6., 8.]; w = [9., 8., 7., 6.]; X = reduce(hcat, [u, v, w])' Out[9]: 3×4 Adjoint{Float64,Array{Float64,2}}: 1.0 3.0 5.0 7.0 2.0 4.0 6.0 8.0 9.0 8.0 7.0 6.0 1.2 Multivariable calculus 1.2.1 Limits and continuity ‾‾‾푑‾‾‾‾‾2 푇 푑 Throughout this section, we use the Euclidean norm ‖퐱‖ = √∑푖=1 푥푖 for 퐱 = (푥1, … , 푥푑 ) ∈ ℝ . 푑 The open 푟-ball around 퐱 ∈ ℝ is the set of points within Euclidean distance 푟 of 퐱, that is, 푑 퐵푟(퐱) = {퐲 ∈ ℝ : ‖퐲 − 퐱‖ < 푟}. 푑 푑 A point 퐱 ∈ ℝ is a limit point (or accumulation point) of a set 퐴 ⊆ ℝ if every open ball around 퐱 contains an element 퐚 of 퐴 such that 퐚 ≠ 퐱. A set 퐴 is closed if every limit point of 퐴 belongs to 퐴. (Source (https://www.math212.com/2017/12/limit-point-closure.html)) 푑 푑 A point 퐱 ∈ ℝ is an interior point of a set 퐴 ⊆ ℝ if there exists an 푟 > 0 such that 퐵푟(퐱) ⊆ 퐴. A set 퐴 is open if it consists entirely of interior points. (Source (https://commons.wikimedia.org/wiki/File:Open_set_-_example.png)) 푑 푇 A set 퐴 ⊆ ℝ is bounded if there exists an 푟 > 0 such that 퐴 ⊆ 퐵푟(0), where 0 = (0, … , 0) . 푑 Definition (Limits of a Function): Let 푓 : 퐷 → ℝ be a real-valued function on 퐷 ⊆ ℝ . Then 푓 is said to have a limit 퐿 ∈ ℝ as 퐱 approaches 퐚 if: for any 휖 > 0, there exists a 훿 > 0 such that |푓(퐱) − 퐿| < 휖 for all 퐱 ∈ 퐷 ∩ 퐵훿(퐚) ∖ {퐚}. This is written as lim 푓(퐱) = 퐿. 퐱→퐚 ⊲ Note that we explicitly exclude 퐚 itself from having to satisfy the condition |푓(퐱) − 퐿| < 휖. In particular, we may have 푓(퐚) ≠ 퐿. We also do not restrict 퐚 to be in 퐷. 푑 Definition (Continuous Function): Let 푓 : 퐷 → ℝ be a real-valued function on 퐷 ⊆ ℝ . Then 푓 is said to be continuous at 퐚 ∈ 퐷 if lim 푓(퐱) = 푓(퐚). 퐱→퐚 ⊲ (Source (https://commons.wikimedia.org/wiki/File:Example_of_continuous_function.svg)) We will not prove the following fundamental analysis result, which will be useful below. See e.g. Wikipedia 푑 (https://en.wikipedia.org/wiki/Extreme_value_theorem). Suppose 푓 : 퐷 → ℝ is defined on a set 퐷 ⊆ ℝ . We ∗ ∗ say that 푓 attains a maximum value 푀 at 퐳 if 푓(퐳 ) = 푀 and 푀 ≥ 푓(퐱) for all 퐱 ∈ 퐷. Similarly, we say 푓 attains a minimum value 푚 at 퐳∗ if 푓(퐳∗ ) = 푚 and 푚 ≥ 푓(퐱) for all 퐱 ∈ 퐷. Theorem (Extreme Value): Let 푓 : 퐷 → ℝ be a real-valued, continuous function on a nonempty, closed, 푑 bounded set 퐷 ⊆ ℝ . Then 푓 attains a maximum and a minimum on 퐷. 1.2.2 Derivatives We begin by reviewing the single-variable case. Recall that the derivative of a function of a real variable is the rate of change of the function with respect to the change in the variable. Formally: Definition (Derivative): Let 푓 : 퐷 → ℝ where 퐷 ⊆ ℝ and let 푥0 ∈ 퐷 be an interior point of 퐷. The derivative of 푓 at 푥0 is ′ d푓(푥0) 푓(푥0 + ℎ) − 푓(푥0) 푓 (푥0) = = lim d푥 ℎ→0 ℎ provided the limit exists. ⊲ (Source (https://commons.wikimedia.org/wiki/File:Tangent_to_a_curve.svg)) Exercise: Let 푓 and 푔 have derivatives at 푥 and let 훼 and 훽 be constants. Show that [훼푓(푥) + 훽푔(푥)]′ = 훼푓 ′ (푥) + 훽푔′ (푥). ⊲ The following lemma encapsulates a key insight about the derivative of 푓 at 푥0 : it tells us where to find smaller values. Lemma (Descent Direction): Let 푓 : 퐷 → ℝ with 퐷 ⊆ ℝ and let 푥0 ∈ 퐷 be an interior point of 퐷 where ′ ′ 푓 (푥0) exists. If 푓 (푥0) > 0, then there is an open ball 퐵훿(푥0) ⊆ 퐷 around 푥0 such that for each 푥 in 퐵훿(푥0): (a) 푓(푥) > 푓(푥0) if 푥 > 푥0 , (b) 푓(푥) < 푓(푥0) if 푥 < 푥0 . ′ If instead 푓 (푥0) < 0, the opposite holds. ′ Proof idea: Follows from the definition of the derivative by taking 휖 small enough that 푓 (푥0) − 휖 > 0. ′ Proof: Take 휖 = 푓 (푥0)/2. By definition of the derivative, there is 훿 > 0 such that 푓(푥 + ℎ) − 푓(푥 ) 푓 ′ (푥 ) − 0 0 < 휖 0 ℎ for all 0 < ℎ < 훿. Rearranging gives ′ 푓(푥0 + ℎ) > 푓(푥0) + [푓 (푥0) − 휖]ℎ > 푓(푥0) by our choice of 휖. The other direction is similar. ◻ 푑 For functions of several variables, we have the following generalization. As before, we let 퐞푖 ∈ ℝ be the 푖-th standard basis vector. 푑 Definition (Partial Derivative): Let 푓 : 퐷 → ℝ where 퐷 ⊆ ℝ and let 퐱0 ∈ 퐷 be an interior point of 퐷.