L. Vandenberghe ECE133A (Spring 2021) 2. Norm, distance, angle
Outline: norm • distance • k-means algorithm • angle • complex vectors
Euclidean norm
the (Euclidean) norm of a vector $a \in \mathbf{R}^n$ is

\[ \|a\| = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2} = \sqrt{a^T a} \]
• if $n = 1$, $\|a\|$ reduces to the absolute value $|a|$
• $\|a\|$ measures the magnitude of $a$
• sometimes written as $\|a\|_2$ to distinguish it from other norms, e.g., the 1-norm

\[ \|a\|_1 = |a_1| + |a_2| + \cdots + |a_n| \]
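As a quick numerical illustration, the two norms above translate directly into plain Python (a sketch; the helper names `norm` and `norm1` are ours, not from the lecture):

```python
import math

def norm(a):
    # Euclidean norm: sqrt(a_1^2 + ... + a_n^2)
    return math.sqrt(sum(x * x for x in a))

def norm1(a):
    # 1-norm: |a_1| + ... + |a_n|
    return sum(abs(x) for x in a)

a = [3.0, -4.0]
print(norm(a))   # 5.0
print(norm1(a))  # 7.0
```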
Properties
Positive definiteness

• $\|a\| \geq 0$ for all $a$, and $\|a\| = 0$ only if $a = 0$

Homogeneity

• $\|\beta a\| = |\beta| \, \|a\|$ for all vectors $a$ and scalars $\beta$

Triangle inequality (proved below from the Cauchy–Schwarz inequality)

• $\|a + b\| \leq \|a\| + \|b\|$ for all vectors $a$ and $b$ of equal length

Norm of block vector: if $a$, $b$ are vectors,

\[ \|(a, b)\| = \sqrt{\|a\|^2 + \|b\|^2} \]
Cauchy–Schwarz inequality
\[ |a^T b| \leq \|a\| \, \|b\| \quad \text{for all } a, b \in \mathbf{R}^n \]

moreover, equality $|a^T b| = \|a\| \, \|b\|$ holds if:

• $a = 0$ or $b = 0$; in this case $a^T b = 0 = \|a\| \, \|b\|$
• $a \neq 0$ and $b \neq 0$, and $b = \gamma a$ for some $\gamma > 0$; in this case
\[ 0 < a^T b = \gamma \|a\|^2 = \|a\| \, \|b\| \]
• $a \neq 0$ and $b \neq 0$, and $b = -\gamma a$ for some $\gamma > 0$; in this case
\[ 0 > a^T b = -\gamma \|a\|^2 = -\|a\| \, \|b\| \]
Proof of Cauchy–Schwarz inequality
1. trivial if $a = 0$ or $b = 0$

2. assume $\|a\| = \|b\| = 1$; we show that $-1 \leq a^T b \leq 1$:

\[ 0 \leq \|a - b\|^2 = (a - b)^T (a - b) = \|a\|^2 - 2 a^T b + \|b\|^2 = 2 (1 - a^T b), \]

with equality only if $a = b$, and

\[ 0 \leq \|a + b\|^2 = (a + b)^T (a + b) = \|a\|^2 + 2 a^T b + \|b\|^2 = 2 (1 + a^T b), \]

with equality only if $a = -b$

3. for general nonzero $a$, $b$, apply case 2 to the unit-norm vectors

\[ \frac{1}{\|a\|} a, \qquad \frac{1}{\|b\|} b \]
Average and RMS value

let $a$ be a real $n$-vector
• the average of the elements of $a$ is

\[ \operatorname{avg}(a) = \frac{a_1 + a_2 + \cdots + a_n}{n} = \frac{\mathbf{1}^T a}{n} \]

• the root-mean-square value is the square root of the average squared entry:

\[ \operatorname{rms}(a) = \sqrt{\frac{a_1^2 + a_2^2 + \cdots + a_n^2}{n}} = \frac{\|a\|}{\sqrt{n}} \]
Exercise: show that $|\operatorname{avg}(a)| \leq \operatorname{rms}(a)$
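Before proving the exercise, it can be spot-checked numerically (a plain-Python sketch; the helper names `avg` and `rms` are ours):

```python
import math
import random

def avg(a):
    # average of the entries
    return sum(a) / len(a)

def rms(a):
    # root-mean-square value
    return math.sqrt(sum(x * x for x in a) / len(a))

# spot-check |avg(a)| <= rms(a) on random vectors
random.seed(0)
for _ in range(1000):
    a = [random.uniform(-10.0, 10.0) for _ in range(5)]
    assert abs(avg(a)) <= rms(a) + 1e-12

print(avg([1.0, 2.0, 6.0]), rms([1.0, 2.0, 6.0]))  # 3.0 and approx. 3.70
```

A random search of course proves nothing; one route to the proof is to apply the Cauchy–Schwarz inequality to $\mathbf{1}$ and $a$, since $|\operatorname{avg}(a)| = |\mathbf{1}^T a| / n$ and $\|\mathbf{1}\| = \sqrt{n}$.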
Triangle inequality from Cauchy–Schwarz inequality

for vectors $a$, $b$ of equal size:
\[
\begin{aligned}
\|a + b\|^2 &= (a + b)^T (a + b) \\
&= a^T a + b^T a + a^T b + b^T b \\
&= \|a\|^2 + 2 a^T b + \|b\|^2 \\
&\leq \|a\|^2 + 2 \|a\| \, \|b\| + \|b\|^2 \qquad \text{(by Cauchy–Schwarz)} \\
&= (\|a\| + \|b\|)^2
\end{aligned}
\]

• taking square roots gives the triangle inequality
• the triangle inequality is an equality if and only if $a^T b = \|a\| \, \|b\|$ (see the equality conditions for the Cauchy–Schwarz inequality)
• also note from line 3 that $\|a + b\|^2 = \|a\|^2 + \|b\|^2$ if $a^T b = 0$
Distance

the (Euclidean) distance between vectors $a$ and $b$ is defined as $\|a - b\|$
• $\|a - b\| \geq 0$ for all $a$, $b$, and $\|a - b\| = 0$ only if $a = b$
• triangle inequality:
\[ \|a - c\| \leq \|a - b\| + \|b - c\| \quad \text{for all } a, b, c \]
(figure: triangle with vertices $a$, $b$, $c$ and side lengths $\|a - b\|$, $\|b - c\|$, $\|a - c\|$)
• the RMS deviation between $n$-vectors $a$ and $b$ is
\[ \operatorname{rms}(a - b) = \frac{\|a - b\|}{\sqrt{n}} \]
Standard deviation

let $a$ be a real $n$-vector
• the de-meaned vector is the vector of deviations from the average:

\[ \tilde a = a - \operatorname{avg}(a) \mathbf{1} = \begin{bmatrix} a_1 - \mathbf{1}^T a / n \\ a_2 - \mathbf{1}^T a / n \\ \vdots \\ a_n - \mathbf{1}^T a / n \end{bmatrix} \]

• the standard deviation is the RMS deviation from the average:

\[ \operatorname{std}(a) = \operatorname{rms}(a - \operatorname{avg}(a) \mathbf{1}) = \frac{\|a - (\mathbf{1}^T a / n) \mathbf{1}\|}{\sqrt{n}} \]

• the de-meaned vector in standard units is

\[ \frac{1}{\operatorname{std}(a)} (a - \operatorname{avg}(a) \mathbf{1}) \]
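These definitions translate directly into code (a plain-Python sketch; the function names `demean` and `std` are ours):

```python
import math

def avg(a):
    return sum(a) / len(a)

def rms(a):
    return math.sqrt(sum(x * x for x in a) / len(a))

def demean(a):
    # vector of deviations from the average
    m = avg(a)
    return [x - m for x in a]

def std(a):
    # standard deviation = RMS deviation from the average
    return rms(demean(a))

a = [1.0, 2.0, 3.0, 4.0]
print(std(a))  # sqrt(1.25), approx. 1.118
```

On this example one can also check the identity proved in the exercise below: $\operatorname{avg}(a)^2 + \operatorname{std}(a)^2 = 6.25 + 1.25 = 7.5 = \operatorname{rms}(a)^2$.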
Mean return and risk of investment
• vectors represent time series of returns on an investment (as a percentage)
• the average value is the (mean) return of the investment
• the standard deviation measures the variation around the mean, i.e., the risk

(figures: time series of four return vectors $a_k$, $b_k$, $c_k$, $d_k$, and a scatter plot of mean return versus risk for the four investments $a$, $b$, $c$, $d$)
Exercise

show that $\operatorname{avg}(a)^2 + \operatorname{std}(a)^2 = \operatorname{rms}(a)^2$
Solution:

\[
\begin{aligned}
\operatorname{std}(a)^2 &= \frac{1}{n} \|a - \operatorname{avg}(a) \mathbf{1}\|^2 \\
&= \frac{1}{n} \Bigl( a - \frac{\mathbf{1}^T a}{n} \mathbf{1} \Bigr)^T \Bigl( a - \frac{\mathbf{1}^T a}{n} \mathbf{1} \Bigr) \\
&= \frac{1}{n} \Bigl( a^T a - 2 \frac{(\mathbf{1}^T a)^2}{n} + \frac{(\mathbf{1}^T a)^2}{n} \Bigr) \\
&= \frac{a^T a}{n} - \Bigl( \frac{\mathbf{1}^T a}{n} \Bigr)^2 \\
&= \operatorname{rms}(a)^2 - \operatorname{avg}(a)^2
\end{aligned}
\]
Exercise: nearest scalar multiple
given two vectors $a, b \in \mathbf{R}^n$, with $a \neq 0$, find the scalar multiple $t a$ closest to $b$

(figure: point $b$, the line $\{ t a \mid t \in \mathbf{R} \}$, and the nearest point $\hat t a$ on the line)
Solution

• the squared distance between $t a$ and $b$ is
\[ \|t a - b\|^2 = (t a - b)^T (t a - b) = t^2 a^T a - 2 t \, a^T b + b^T b, \]
a quadratic function of $t$ with positive leading coefficient $a^T a$

• the derivative with respect to $t$ is zero for
\[ \hat t = \frac{a^T b}{a^T a} = \frac{a^T b}{\|a\|^2} \]

Exercise: average of collection of vectors
given $N$ vectors $x_1, \ldots, x_N \in \mathbf{R}^n$, find the $n$-vector $z$ that minimizes

\[ \|z - x_1\|^2 + \|z - x_2\|^2 + \cdots + \|z - x_N\|^2 \]
(figure: five points $x_1, \ldots, x_5$ and the minimizing vector $z$)

$z$ is also known as the centroid of the points $x_1, \ldots, x_N$
Solution

the sum of squared distances is

\[
\begin{aligned}
\|z - x_1\|^2 + \|z - x_2\|^2 + \cdots + \|z - x_N\|^2
&= \sum_{i=1}^n \bigl( (z_i - (x_1)_i)^2 + (z_i - (x_2)_i)^2 + \cdots + (z_i - (x_N)_i)^2 \bigr) \\
&= \sum_{i=1}^n \bigl( N z_i^2 - 2 z_i ((x_1)_i + (x_2)_i + \cdots + (x_N)_i) + (x_1)_i^2 + \cdots + (x_N)_i^2 \bigr)
\end{aligned}
\]

here $(x_j)_i$ is the $i$th element of the vector $x_j$

• term $i$ in the sum is minimized by
\[ z_i = \frac{1}{N} \bigl( (x_1)_i + (x_2)_i + \cdots + (x_N)_i \bigr) \]

• the solution $z$ is the component-wise average of the points $x_1, \ldots, x_N$:
\[ z = \frac{1}{N} (x_1 + x_2 + \cdots + x_N) \]
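The component-wise average is a one-liner per coordinate (a plain-Python sketch; the name `centroid` is ours):

```python
def centroid(points):
    # component-wise average of N vectors of equal size n
    N = len(points)
    n = len(points[0])
    return [sum(p[i] for p in points) / N for i in range(n)]

pts = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
print(centroid(pts))  # [1.0, 1.0]
```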
k-means clustering

a popular iterative algorithm for partitioning $N$ vectors $x_1, \ldots, x_N$ into $k$ clusters
Algorithm
choose initial ‘representatives’ $z_1, \ldots, z_k$ for the $k$ groups and repeat:

1. assign each vector $x_i$ to the nearest group representative $z_j$
2. set each representative $z_j$ to the mean of the vectors assigned to it

• initial representatives are often chosen randomly
• as a variation, choose a random initial partition and start with step 2
• the solution depends on the choice of initial representatives or partition
• the algorithm can be shown to converge in a finite number of iterations
• in practice, it is often restarted a few times, with different starting points
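The two steps above can be sketched in plain Python (a toy implementation for illustration only; the names `kmeans`, `sqdist`, and `mean` are ours, ties are broken arbitrarily, and an empty group keeps its old representative):

```python
import random

def sqdist(a, b):
    # squared Euclidean distance between two vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    # component-wise average of a nonempty list of vectors
    n = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(n)]

def kmeans(xs, k, iters=100, seed=0):
    rng = random.Random(seed)
    reps = [list(x) for x in rng.sample(xs, k)]  # random initial representatives
    assign = None
    for _ in range(iters):
        # step 1: assign each vector to the nearest group representative
        new_assign = [min(range(k), key=lambda j: sqdist(x, reps[j])) for x in xs]
        if new_assign == assign:  # converged: assignment no longer changes
            break
        assign = new_assign
        # step 2: set each representative to the mean of the vectors assigned to it
        for j in range(k):
            members = [x for x, g in zip(xs, assign) if g == j]
            if members:
                reps[j] = mean(members)
    return reps, assign

xs = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
reps, assign = kmeans(xs, 2)
print(assign)  # the two left points share one group, the two right points the other
```

On this well-separated toy data the algorithm recovers the two clusters for any choice of initial representatives; in general, different starting points can give different partitions, as noted above.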
Example

(figures: for the first iteration and for iterations 2, 3, 11, 12, 13, and 14, each slide shows the assignment to groups and the updated representatives; a final slide shows the resulting clustering)
Image clustering
• MNIST dataset of handwritten digits
• $N = 60000$ grayscale images of size $28 \times 28$ (vectors $x_i$ of size $28^2 = 784$)
• (figure: 25 example images)
Group representatives ($k = 20$)

• $k$-means algorithm, with $k = 20$ and a randomly chosen initial partition
• (figures: the 20 group representatives, and the result for another initial partition)
Document topic discovery
• $N = 500$ Wikipedia articles, from weekly most popular lists (9/2015–6/2016)
• dictionary of 4423 words
• each article represented by a word histogram vector of size 4423
• result of the $k$-means algorithm with $k = 9$ and a randomly chosen initial partition
Cluster 1

• largest coefficients in cluster representative $z_1$:

  word         fight    win      event    champion    fighter    ...
  coefficient  0.038    0.022    0.019    0.015       0.015      ...

• documents in cluster 1 closest to the representative: “Floyd Mayweather, Jr”, “Kimbo Slice”, “Ronda Rousey”, “José Aldo”, “Joe Frazier”, ...
Cluster 2

• largest coefficients in cluster representative $z_2$:

  word         holiday    celebrate    festival    celebration    calendar    ...
  coefficient  0.012      0.009        0.007       0.006          0.006       ...

• documents in cluster 2 closest to the representative: “Halloween”, “Guy Fawkes Night”, “Diwali”, “Hanukkah”, “Groundhog Day”, ...
Cluster 3

• largest coefficients in cluster representative $z_3$:

  word         united    family    party    president    government    ...
  coefficient  0.004     0.003     0.003    0.003        0.003         ...

• documents in cluster 3 closest to the representative: “Mahatma Gandhi”, “Sigmund Freud”, “Carly Fiorina”, “Frederick Douglass”, “Marco Rubio”, ...
Cluster 4

• largest coefficients in cluster representative $z_4$:

  word         album    release    song    music    single    ...
  coefficient  0.031    0.016      0.015   0.014    0.011     ...

• documents in cluster 4 closest to the representative: “David Bowie”, “Kanye West”, “Celine Dion”, “Kesha”, “Ariana Grande”, ...
Cluster 5

• largest coefficients in cluster representative $z_5$:

  word         game     season    team     win      player    ...
  coefficient  0.023    0.020     0.018    0.017    0.014     ...

• documents in cluster 5 closest to the representative: “Kobe Bryant”, “Lamar Odom”, “Johan Cruyff”, “Yogi Berra”, “José Mourinho”, ...
Cluster 6

• largest coefficients in representative $z_6$:

  word         series    season    episode    character    film    ...
  coefficient  0.029     0.027     0.013      0.011        0.008   ...

• documents in cluster 6 closest to the cluster representative: “The X-Files”, “Game of Thrones”, “House of Cards”, “Daredevil”, “Supergirl”, ...
Cluster 7

• largest coefficients in representative $z_7$:

  word         match    win      championship    team     event    ...
  coefficient  0.065    0.018    0.016           0.015    0.015    ...

• documents in cluster 7 closest to the cluster representative: “Wrestlemania 32”, “Payback (2016)”, “Survivor Series (2015)”, “Royal Rumble (2016)”, “Night of Champions (2015)”, ...
Cluster 8

• largest coefficients in representative $z_8$:

  word         film     star     role     play     series    ...
  coefficient  0.036    0.014    0.014    0.010    0.009     ...

• documents in cluster 8 closest to the cluster representative: “Ben Affleck”, “Johnny Depp”, “Maureen O’Hara”, “Kate Beckinsale”, “Leonardo DiCaprio”, ...
Cluster 9

• largest coefficients in representative $z_9$:

  word         film     million    release    star     character    ...
  coefficient  0.061    0.019      0.013      0.010    0.006        ...

• documents in cluster 9 closest to the cluster representative: “Star Wars: The Force Awakens”, “Star Wars Episode I: The Phantom Menace”, “The Martian (film)”, “The Revenant (2015 film)”, “The Hateful Eight”, ...
Angle between vectors

the angle between nonzero real vectors $a$, $b$ is defined as

\[ \theta = \arccos \frac{a^T b}{\|a\| \, \|b\|} \]

• this is the unique value of $\theta \in [0, \pi]$ that satisfies $a^T b = \|a\| \, \|b\| \cos\theta$

(figure: two vectors $a$ and $b$ with angle $\theta$ between them)
• the Cauchy–Schwarz inequality guarantees that

\[ -1 \leq \frac{a^T b}{\|a\| \, \|b\|} \leq 1 \]
Terminology
• $\theta = 0$: $a^T b = \|a\| \, \|b\|$; the vectors are aligned or parallel
• $0 \leq \theta < \pi/2$: $a^T b > 0$; the vectors make an acute angle
• $\theta = \pi/2$: $a^T b = 0$; the vectors are orthogonal ($a \perp b$)
• $\pi/2 < \theta \leq \pi$: $a^T b < 0$; the vectors make an obtuse angle
• $\theta = \pi$: $a^T b = -\|a\| \, \|b\|$; the vectors are anti-aligned or opposed
Correlation coefficient

the correlation coefficient between non-constant vectors $a$, $b$ is

\[ \rho_{ab} = \frac{\tilde a^T \tilde b}{\|\tilde a\| \, \|\tilde b\|} \]

where $\tilde a = a - \operatorname{avg}(a) \mathbf{1}$ and $\tilde b = b - \operatorname{avg}(b) \mathbf{1}$ are the de-meaned vectors
• only defined when $a$ and $b$ are not constant ($\tilde a \neq 0$ and $\tilde b \neq 0$)
• $\rho_{ab}$ is the cosine of the angle between the de-meaned vectors
• a number between $-1$ and $1$
• $\rho_{ab}$ is the average product of the deviations from the mean in standard units:
\[ \rho_{ab} = \frac{1}{n} \sum_{i=1}^n \frac{(a_i - \operatorname{avg}(a))(b_i - \operatorname{avg}(b))}{\operatorname{std}(a) \operatorname{std}(b)} \]
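As a sketch in plain Python (the name `corr` is ours), the correlation coefficient is just the cosine formula applied to the de-meaned vectors:

```python
import math

def avg(a):
    return sum(a) / len(a)

def corr(a, b):
    # cosine of the angle between the de-meaned vectors
    at = [x - avg(a) for x in a]
    bt = [y - avg(b) for y in b]
    num = sum(x * y for x, y in zip(at, bt))
    den = math.sqrt(sum(x * x for x in at)) * math.sqrt(sum(y * y for y in bt))
    return num / den

a = [1.0, 2.0, 3.0]
print(corr(a, [2.0, 4.0, 6.0]))  # 1.0  (b = 2a: perfectly correlated)
print(corr(a, [3.0, 2.0, 1.0]))  # -1.0 (order reversed)
```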
Examples

(figures: three scatter plots of points $(a_k, b_k)$, with $\rho_{ab} = 0.968$, $\rho_{ab} = -0.988$, and $\rho_{ab} = 0.004$)
Regression line
• a scatter plot shows two $n$-vectors $a$, $b$ as $n$ points $(a_k, b_k)$
• a straight line shows the affine function $f(x) = c_1 + c_2 x$ with
\[ f(a_k) \approx b_k, \quad k = 1, \ldots, n \]

(figure: scatter plot of the points $(a_k, b_k)$ and the regression line $f(x)$)
Least squares regression

use the coefficients $c_1$, $c_2$ that minimize

\[ J = \frac{1}{n} \sum_{k=1}^n (f(a_k) - b_k)^2 \]
• $J$ is a quadratic function of $c_1$ and $c_2$:
\[
\begin{aligned}
J &= \frac{1}{n} \sum_{k=1}^n (c_1 + c_2 a_k - b_k)^2 \\
&= \bigl( n c_1^2 + 2 n \operatorname{avg}(a) \, c_1 c_2 + \|a\|^2 c_2^2 - 2 n \operatorname{avg}(b) \, c_1 - 2 a^T b \, c_2 + \|b\|^2 \bigr) / n
\end{aligned}
\]

• to minimize $J$, set the derivatives with respect to $c_1$, $c_2$ to zero:
\[ c_1 + \operatorname{avg}(a) c_2 = \operatorname{avg}(b), \qquad \operatorname{avg}(a) c_1 + \frac{\|a\|^2}{n} c_2 = \frac{a^T b}{n} \]

• the solution is
\[ c_2 = \frac{a^T b / n - \operatorname{avg}(a) \operatorname{avg}(b)}{\|a\|^2 / n - \operatorname{avg}(a)^2}, \qquad c_1 = \operatorname{avg}(b) - \operatorname{avg}(a) c_2 \]
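The closed-form solution can be coded in a few lines (a plain-Python sketch; the name `lsq_line` is ours):

```python
def avg(a):
    return sum(a) / len(a)

def lsq_line(a, b):
    # least squares coefficients c1, c2 of the line f(x) = c1 + c2*x
    n = len(a)
    atb = sum(x * y for x, y in zip(a, b))   # a^T b
    a2 = sum(x * x for x in a)               # ||a||^2
    c2 = (atb / n - avg(a) * avg(b)) / (a2 / n - avg(a) ** 2)
    c1 = avg(b) - avg(a) * c2
    return c1, c2

# points lying exactly on the line b = 1 + 2a are recovered exactly
a = [0.0, 1.0, 2.0, 3.0]
b = [1.0, 3.0, 5.0, 7.0]
print(lsq_line(a, b))  # (1.0, 2.0)
```

On noisy data the same formulas give the least squares fit rather than an exact interpolation.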
Interpretation
the slope $c_2$ can be written in terms of the correlation coefficient of $a$ and $b$:

\[ c_2 = \frac{(a - \operatorname{avg}(a) \mathbf{1})^T (b - \operatorname{avg}(b) \mathbf{1})}{\|a - \operatorname{avg}(a) \mathbf{1}\|^2} = \rho_{ab} \frac{\operatorname{std}(b)}{\operatorname{std}(a)} \]

• hence, the expression for the regression line can be written as
\[ f(x) = \operatorname{avg}(b) + \frac{\rho_{ab} \operatorname{std}(b)}{\operatorname{std}(a)} (x - \operatorname{avg}(a)) \]

• the correlation coefficient $\rho_{ab}$ is the slope after converting to standard units:
\[ \frac{f(x) - \operatorname{avg}(b)}{\operatorname{std}(b)} = \rho_{ab} \frac{x - \operatorname{avg}(a)}{\operatorname{std}(a)} \]
Examples

(figures: three regression examples, with $\rho_{ab} = 0.91$, $\rho_{ab} = -0.89$, and $\rho_{ab} = 0.25$)

• dashed lines in the top row show the average $\pm$ the standard deviation
• the bottom row shows the scatter plots of the top row in standard units
Norm

the norm of a vector $a \in \mathbf{C}^n$ is

\[ \|a\| = \sqrt{|a_1|^2 + |a_2|^2 + \cdots + |a_n|^2} = \sqrt{a^H a} \]
• positive definite: $\|a\| \geq 0$ for all $a$, and $\|a\| = 0$ only if $a = 0$
• homogeneous: $\|\beta a\| = |\beta| \, \|a\|$ for all vectors $a$ and complex scalars $\beta$
• triangle inequality: $\|a + b\| \leq \|a\| + \|b\|$ for all vectors $a$, $b$ of equal size
Cauchy–Schwarz inequality for complex vectors
\[ |a^H b| \leq \|a\| \, \|b\| \quad \text{for all } a, b \in \mathbf{C}^n \]

moreover, equality $|a^H b| = \|a\| \, \|b\|$ holds if:

• $a = 0$ or $b = 0$
• $a \neq 0$ and $b \neq 0$, and $b = \gamma a$ for some (complex) scalar $\gamma$
• exercise: generalize the proof given earlier for real vectors
• we say $a$ and $b$ are orthogonal if $a^H b = 0$
• we will not need the definition of angle, correlation coefficient, ... in $\mathbf{C}^n$
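Python's built-in complex numbers make the complex norm and inner product easy to check numerically (a sketch; the names `cnorm` and `inner` are ours, and note that $a^H b$ conjugates the first argument):

```python
import math

def cnorm(a):
    # norm of a complex vector: sqrt(|a_1|^2 + ... + |a_n|^2)
    return math.sqrt(sum(abs(x) ** 2 for x in a))

def inner(a, b):
    # a^H b: conjugate the first argument
    return sum(x.conjugate() * y for x, y in zip(a, b))

a = [1 + 1j, 2 - 1j]
b = [1j * x for x in a]  # b = (i)a, a complex scalar multiple of a
print(cnorm(a))          # sqrt(7), approx. 2.6458
print(abs(inner(a, b)))  # 7.0 = cnorm(a) * cnorm(b): Cauchy-Schwarz equality
```

Since $b = \gamma a$ with $\gamma = i$, the second equality condition above is met, and $|a^H b| = \|a\| \, \|b\|$ holds exactly.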