L. Vandenberghe ECE133A (Spring 2021) 2. Norm, distance, angle

• norm
• distance
• k-means algorithm
• angle
• complex vectors

Norm, distance, angle 2.1

Euclidean norm

(Euclidean) norm of vector $a \in \mathbf{R}^n$:

$$\|a\| = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2} = \sqrt{a^T a}$$

• if $n = 1$, $\|a\|$ reduces to the absolute value $|a|$
• $\|a\|$ measures the magnitude of $a$
• sometimes written as $\|a\|_2$ to distinguish it from other norms, e.g., the 1-norm

  $$\|a\|_1 = |a_1| + |a_2| + \cdots + |a_n|$$
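As a quick numerical illustration of the two norms above (a NumPy sketch; the example vector is made up):

```python
import numpy as np

a = np.array([2.0, -1.0, 2.0])

# Euclidean norm: sqrt(a_1^2 + ... + a_n^2) = sqrt(a^T a)
norm2 = np.sqrt(a @ a)
assert np.isclose(norm2, np.linalg.norm(a))   # NumPy's default is the 2-norm

# 1-norm: |a_1| + ... + |a_n|
norm1 = np.abs(a).sum()
assert np.isclose(norm1, np.linalg.norm(a, 1))
```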

Norm, distance, angle 2.2

Properties

Positive definiteness

$\|a\| \geq 0$ for all $a$, and $\|a\| = 0$ only if $a = 0$

Homogeneity

$\|\beta a\| = |\beta| \, \|a\|$ for all vectors $a$ and scalars $\beta$

Triangle inequality (proved on page 2.7)

$\|a + b\| \leq \|a\| + \|b\|$ for all vectors $a$ and $b$ of equal length

Norm of block vector: if $a$, $b$ are vectors,

$$\left\| \begin{bmatrix} a \\ b \end{bmatrix} \right\| = \sqrt{\|a\|^2 + \|b\|^2}$$

Norm, distance, angle 2.3

Cauchy–Schwarz inequality

$$|a^T b| \leq \|a\| \, \|b\| \quad \text{for all } a, b \in \mathbf{R}^n$$

moreover, equality $|a^T b| = \|a\|\|b\|$ holds if:

• $a = 0$ or $b = 0$; in this case $a^T b = 0 = \|a\|\|b\|$
• $a \neq 0$ and $b \neq 0$, and $b = \gamma a$ for some $\gamma > 0$; in this case
  $$0 < a^T b = \gamma \|a\|^2 = \|a\|\|b\|$$
• $a \neq 0$ and $b \neq 0$, and $b = -\gamma a$ for some $\gamma > 0$; in this case
  $$0 > a^T b = -\gamma \|a\|^2 = -\|a\|\|b\|$$
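The inequality and its equality case can be checked numerically (an illustrative NumPy sketch with random data, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(5)
b = rng.standard_normal(5)

# |a^T b| <= ||a|| ||b|| holds for any pair of vectors
assert abs(a @ b) <= np.linalg.norm(a) * np.linalg.norm(b)

# equality when b = gamma * a with gamma > 0: then a^T b = ||a|| ||b||
c = 3.0 * a
assert np.isclose(a @ c, np.linalg.norm(a) * np.linalg.norm(c))
```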

Norm, distance, angle 2.4

Proof of Cauchy–Schwarz inequality

1. trivial if $a = 0$ or $b = 0$

2. assume $\|a\| = \|b\| = 1$; we show that $-1 \leq a^T b \leq 1$:

$$0 \leq \|a - b\|^2 = (a - b)^T (a - b) = \|a\|^2 - 2 a^T b + \|b\|^2 = 2 (1 - a^T b)$$

with equality only if $a = b$

$$0 \leq \|a + b\|^2 = (a + b)^T (a + b) = \|a\|^2 + 2 a^T b + \|b\|^2 = 2 (1 + a^T b)$$

with equality only if $a = -b$

3. for general nonzero $a$, $b$, apply case 2 to the unit-norm vectors

$$\frac{1}{\|a\|} a, \qquad \frac{1}{\|b\|} b$$

Norm, distance, angle 2.5

Average and RMS value

let $a$ be a real $n$-vector

• the average of the elements of $a$ is
  $$\operatorname{avg}(a) = \frac{a_1 + a_2 + \cdots + a_n}{n} = \frac{\mathbf{1}^T a}{n}$$

• the root-mean-square value is the square root of the average squared entry:
  $$\operatorname{rms}(a) = \sqrt{\frac{a_1^2 + a_2^2 + \cdots + a_n^2}{n}} = \frac{\|a\|}{\sqrt{n}}$$

Exercise: show that $|\operatorname{avg}(a)| \leq \operatorname{rms}(a)$
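The two quantities are easy to compute directly (a NumPy sketch; the example vector is made up):

```python
import numpy as np

def avg(a):
    return np.sum(a) / len(a)                   # 1^T a / n

def rms(a):
    return np.linalg.norm(a) / np.sqrt(len(a))  # ||a|| / sqrt(n)

a = np.array([1.0, -3.0, 2.0, 4.0])
assert abs(avg(a)) <= rms(a)    # the inequality in the exercise
```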

Norm, distance, angle 2.6

Triangle inequality from Cauchy–Schwarz inequality

for vectors $a$, $b$ of equal size

$$\begin{aligned}
\|a + b\|^2 &= (a + b)^T (a + b) \\
&= a^T a + b^T a + a^T b + b^T b \\
&= \|a\|^2 + 2 a^T b + \|b\|^2 \\
&\leq \|a\|^2 + 2 \|a\| \|b\| + \|b\|^2 \qquad \text{(by Cauchy–Schwarz)} \\
&= (\|a\| + \|b\|)^2
\end{aligned}$$

• taking square roots gives the triangle inequality
• the triangle inequality is an equality if and only if $a^T b = \|a\|\|b\|$ (see page 2.4)
• also note from line 3 that $\|a + b\|^2 = \|a\|^2 + \|b\|^2$ if $a^T b = 0$

Norm, distance, angle 2.7

Outline

• norm
• distance
• k-means algorithm
• angle
• complex vectors

Distance

the (Euclidean) distance between vectors $a$ and $b$ is defined as $\|a - b\|$

• $\|a - b\| \geq 0$ for all $a$, $b$, and $\|a - b\| = 0$ only if $a = b$
• triangle inequality:
  $$\|a - c\| \leq \|a - b\| + \|b - c\| \quad \text{for all } a, b, c$$

[figure: triangle with vertices $a$, $b$, $c$; sides labeled $\|a - b\|$, $\|a - c\|$, $\|b - c\|$]

• RMS deviation between $n$-vectors $a$ and $b$ is $\operatorname{rms}(a - b) = \dfrac{\|a - b\|}{\sqrt{n}}$

Norm, distance, angle 2.8

Standard deviation

let $a$ be a real $n$-vector

• the de-meaned vector is the vector of deviations from the average:
  $$\tilde{a} = a - \operatorname{avg}(a)\mathbf{1} = \begin{bmatrix} a_1 - \operatorname{avg}(a) \\ a_2 - \operatorname{avg}(a) \\ \vdots \\ a_n - \operatorname{avg}(a) \end{bmatrix} = \begin{bmatrix} a_1 - \mathbf{1}^T a / n \\ a_2 - \mathbf{1}^T a / n \\ \vdots \\ a_n - \mathbf{1}^T a / n \end{bmatrix}$$

• the standard deviation is the RMS deviation from the average:
  $$\operatorname{std}(a) = \operatorname{rms}(a - \operatorname{avg}(a)\mathbf{1}) = \frac{\|a - (\mathbf{1}^T a / n)\mathbf{1}\|}{\sqrt{n}}$$

• the de-meaned vector in standard units is
  $$\frac{1}{\operatorname{std}(a)} (a - \operatorname{avg}(a)\mathbf{1})$$
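These definitions translate directly to code (a NumPy sketch with a made-up vector; note that NumPy's `np.std` is exactly the RMS of the de-meaned vector):

```python
import numpy as np

a = np.array([1.0, -2.0, 3.0, 2.0])
n = len(a)

a_tilde = a - np.mean(a)                    # de-meaned vector a - avg(a)*1
std = np.linalg.norm(a_tilde) / np.sqrt(n)  # rms of the de-meaned vector
assert np.isclose(std, np.std(a))           # matches NumPy's population std

z = a_tilde / std                           # de-meaned vector in standard units
assert np.isclose(np.mean(z), 0.0) and np.isclose(np.std(z), 1.0)
```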

Norm, distance, angle 2.9

Mean return and risk of investment

• vectors represent time series of returns on an investment (as a percentage)
• average value is the (mean) return of the investment
• standard deviation measures variation around the mean, i.e., risk

[figures: four return time series $a_k$, $b_k$, $c_k$, $d_k$, and a scatter plot of (mean) return versus risk for the four investments $a$, $b$, $c$, $d$]

Norm, distance, angle 2.10

Exercise

show that $\operatorname{avg}(a)^2 + \operatorname{std}(a)^2 = \operatorname{rms}(a)^2$

Solution

$$\begin{aligned}
\operatorname{std}(a)^2 &= \frac{\|a - \operatorname{avg}(a)\mathbf{1}\|^2}{n} \\
&= \frac{1}{n} \left( a - \frac{\mathbf{1}^T a}{n}\mathbf{1} \right)^T \left( a - \frac{\mathbf{1}^T a}{n}\mathbf{1} \right) \\
&= \frac{1}{n} \left( a^T a - \frac{2 (\mathbf{1}^T a)^2}{n} + n \left( \frac{\mathbf{1}^T a}{n} \right)^2 \right) \\
&= \frac{1}{n} a^T a - \left( \frac{\mathbf{1}^T a}{n} \right)^2 \\
&= \operatorname{rms}(a)^2 - \operatorname{avg}(a)^2
\end{aligned}$$

Norm, distance, angle 2.11

Exercise: nearest scalar multiple

given two vectors $a, b \in \mathbf{R}^n$, with $a \neq 0$, find the scalar multiple $ta$ closest to $b$

[figure: point $b$, the line $\{ta \mid t \in \mathbf{R}\}$, and the closest point $\hat{t}a$ on the line]

Solution

• squared distance between $ta$ and $b$ is
  $$\|ta - b\|^2 = (ta - b)^T (ta - b) = t^2 a^T a - 2 t \, a^T b + b^T b,$$
  a quadratic function of $t$ with positive leading coefficient $a^T a$

• derivative with respect to $t$ is zero for
  $$\hat{t} = \frac{a^T b}{a^T a} = \frac{a^T b}{\|a\|^2}$$

Norm, distance, angle 2.12

Exercise: average of collection of vectors

given $N$ vectors $x_1, \ldots, x_N \in \mathbf{R}^n$, find the $n$-vector $z$ that minimizes

$$\|z - x_1\|^2 + \|z - x_2\|^2 + \cdots + \|z - x_N\|^2$$

[figure: points $x_1, \ldots, x_5$ and their average $z$]

$z$ is also known as the centroid of the points $x_1, \ldots, x_N$

Norm, distance, angle 2.13

Solution

sum of squared distances is

$$\begin{aligned}
\|z - x_1\|^2 + \|z - x_2\|^2 + \cdots + \|z - x_N\|^2
&= \sum_{i=1}^n \left( (z_i - (x_1)_i)^2 + (z_i - (x_2)_i)^2 + \cdots + (z_i - (x_N)_i)^2 \right) \\
&= \sum_{i=1}^n \left( N z_i^2 - 2 z_i ((x_1)_i + (x_2)_i + \cdots + (x_N)_i) + (x_1)_i^2 + \cdots + (x_N)_i^2 \right)
\end{aligned}$$

here $(x_j)_i$ is the $i$th element of the vector $x_j$

• term $i$ in the sum is minimized by
  $$z_i = \frac{1}{N} \left( (x_1)_i + (x_2)_i + \cdots + (x_N)_i \right)$$

• solution $z$ is the component-wise average of the points $x_1, \ldots, x_N$:
  $$z = \frac{1}{N} (x_1 + x_2 + \cdots + x_N)$$
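The optimality of the centroid can be checked numerically (an illustrative NumPy sketch with random points; perturbing the centroid can only increase the sum of squared distances):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))   # N = 5 points in R^3, one per row

z = X.mean(axis=0)                # component-wise average (the centroid)

def cost(w):
    return sum(np.linalg.norm(w - x) ** 2 for x in X)

# any perturbation of the centroid increases the sum of squared distances
for _ in range(20):
    assert cost(z) <= cost(z + 0.1 * rng.standard_normal(3))
```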

Norm, distance, angle 2.14

Outline

• norm
• distance
• k-means algorithm
• angle
• complex vectors

k-means clustering

a popular iterative algorithm for partitioning $N$ vectors $x_1, \ldots, x_N$ into $k$ clusters

Norm, distance, angle 2.15

Algorithm

choose initial ‘representatives’ $z_1, \ldots, z_k$ for the $k$ groups and repeat:

1. assign each vector $x_i$ to the nearest group representative $z_j$
2. set the representative $z_j$ to the mean of the vectors assigned to it

• initial representatives are often chosen randomly
• as a variation, choose a random initial partition and start with step 2
• solution depends on the choice of initial representatives or partition
• can be shown to converge in a finite number of iterations
• in practice, often restarted a few times, with different starting points
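The two-step iteration above can be sketched in a few lines of NumPy (illustrative only; the random data, seeds, and the choice to leave an empty group's representative in place are my own assumptions, not from the slides):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Basic k-means; X is an N x n array with one vector per row."""
    rng = np.random.default_rng(seed)
    # initial representatives: k randomly chosen data points
    reps = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # step 1: assign each vector x_i to the nearest representative z_j
        dist = np.linalg.norm(X[:, None, :] - reps[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # step 2: set each representative to the mean of its group
        new_reps = np.array([X[labels == j].mean(axis=0)
                             if np.any(labels == j) else reps[j]
                             for j in range(k)])
        if np.allclose(new_reps, reps):   # converged
            break
        reps = new_reps
    return labels, reps

# usage: two tight, well-separated groups of points are recovered
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.05, size=(10, 2)),
               rng.normal(10.0, 0.05, size=(10, 2))])
labels, reps = kmeans(X, 2)
```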

Norm, distance, angle 2.16

Example

[figures, pages 2.17–2.23: iterations 1, 2, 3, 11, 12, 13, and 14 of the k-means algorithm; each page shows the assignment to groups (left) and the updated representatives (right), followed by the final clustering]

Norm, distance, angle 2.24

Image clustering

• MNIST dataset of handwritten digits
• $N = 60000$ grayscale images of size $28 \times 28$ (vectors $x_i$ of size $28^2 = 784$)
• 25 examples:

[figure: 25 example digit images]

Norm, distance, angle 2.25

Group representatives ($k = 20$)

• k-means algorithm, with $k = 20$ and randomly chosen initial partition
• 20 group representatives:

[figure: the 20 group representatives]

Norm, distance, angle 2.26

Group representatives ($k = 20$)

result for another initial partition

Norm, distance, angle 2.27

Document topic discovery

• $N = 500$ Wikipedia articles, from weekly most popular lists (9/2015–6/2016)
• dictionary of 4423 words
• each article represented by a word histogram vector of size 4423
• result of k-means algorithm with $k = 9$ and randomly chosen initial partition

Cluster 1

• largest coefficients in cluster representative $z_1$:

  word         fight    win      event    champion   fighter   . . .
  coefficient  0.038    0.022    0.019    0.015      0.015     . . .

• documents in cluster 1 closest to representative: “Floyd Mayweather, Jr”, “Kimbo Slice”, “Ronda Rousey”, “José Aldo”, “Joe Frazier”, . . .

Norm, distance, angle 2.28

Cluster 2

• largest coefficients in cluster representative $z_2$:

  word         holiday   celebrate   festival   celebration   calendar   . . .
  coefficient  0.012     0.009       0.007      0.006         0.006      . . .

• documents in cluster 2 closest to representative: “Halloween”, “Guy Fawkes Night”, “Diwali”, “Hannukah”, “Groundhog Day”, . . .

Cluster 3

• largest coefficients in cluster representative $z_3$:

  word         united   family   party   president   government   . . .
  coefficient  0.004    0.003    0.003   0.003       0.003        . . .

• documents in cluster 3 closest to representative: “Mahatma Gandhi”, “Sigmund Freud”, “Carly Fiorina”, “Frederick Douglass”, “Marco Rubio”, . . .

Norm, distance, angle 2.29

Cluster 4

• largest coefficients in cluster representative $z_4$:

  word         album    release   song     music    single   . . .
  coefficient  0.031    0.016     0.015    0.014    0.011    . . .

• documents in cluster 4 closest to representative: “David Bowie”, “Kanye West”, “Celine Dion”, “Kesha”, “Ariana Grande”, . . .

Cluster 5

• largest coefficients in cluster representative $z_5$:

  word         game     season   team     win      player   . . .
  coefficient  0.023    0.020    0.018    0.017    0.014    . . .

• documents in cluster 5 closest to representative: “Kobe Bryant”, “Lamar Odom”, “Johan Cruyff”, “Yogi Berra”, “José Mourinho”, . . .

Norm, distance, angle 2.30

Cluster 6

• largest coefficients in representative $z_6$:

  word         series   season   episode   character   film    . . .
  coefficient  0.029    0.027    0.013     0.011       0.008   . . .

• documents in cluster 6 closest to cluster representative: “The X-Files”, “Game of Thrones”, “House of Cards”, “Daredevil”, “Supergirl”, . . .

Cluster 7

• largest coefficients in representative $z_7$:

  word         match    win      championship   team     event   . . .
  coefficient  0.065    0.018    0.016          0.015    0.015   . . .

• documents in cluster 7 closest to cluster representative: “Wrestlemania 32”, “ (2016)”, “ (2015)”, “ (2016)”, “Night of Champions (2015)”, . . .

Norm, distance, angle 2.31

Cluster 8

• largest coefficients in representative $z_8$:

  word         film     star     role     play     series   . . .
  coefficient  0.036    0.014    0.014    0.010    0.009    . . .

• documents in cluster 8 closest to cluster representative: “Ben Affleck”, “Johnny Depp”, “Maureen O’Hara”, “Kate Beckinsale”, “Leonardo DiCaprio”, . . .

Cluster 9

• largest coefficients in representative $z_9$:

  word         film     million   release   star     character   . . .
  coefficient  0.061    0.019     0.013     0.010    0.006       . . .

• documents in cluster 9 closest to cluster representative: “Star Wars: The Force Awakens”, “Star Wars Episode I: The Phantom Menace”, “The Martian (film)”, “The Revenant (2015 film)”, “The Hateful Eight”, . . .

Norm, distance, angle 2.32

Outline

• norm
• distance
• k-means algorithm
• angle
• complex vectors

Angle between vectors

the angle between nonzero real vectors $a$, $b$ is defined as

$$\theta = \arccos \left( \frac{a^T b}{\|a\| \, \|b\|} \right)$$

• this is the unique value of $\theta \in [0, \pi]$ that satisfies $a^T b = \|a\| \|b\| \cos\theta$

[figure: vectors $a$ and $b$ with angle $\theta$ between them]

• the Cauchy–Schwarz inequality guarantees that
  $$-1 \leq \frac{a^T b}{\|a\| \, \|b\|} \leq 1$$

Norm, distance, angle 2.33

Terminology

• $\theta = 0$: $a^T b = \|a\|\|b\|$; vectors are aligned or parallel
• $0 \leq \theta < \pi/2$: $a^T b > 0$; vectors make an acute angle
• $\theta = \pi/2$: $a^T b = 0$; vectors are orthogonal ($a \perp b$)
• $\pi/2 < \theta \leq \pi$: $a^T b < 0$; vectors make an obtuse angle
• $\theta = \pi$: $a^T b = -\|a\|\|b\|$; vectors are anti-aligned or opposed
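The definition of the angle is direct to implement (a NumPy sketch; clipping the cosine is my own guard against floating-point roundoff, not part of the slides):

```python
import numpy as np

def angle(a, b):
    """Angle (radians) between nonzero vectors a and b."""
    c = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(c, -1.0, 1.0))   # clip guards against roundoff

e1 = np.array([1.0, 0.0])
assert np.isclose(angle(e1, np.array([0.0, 2.0])), np.pi / 2)   # orthogonal
assert np.isclose(angle(e1, np.array([3.0, 0.0])), 0.0)         # aligned
assert np.isclose(angle(e1, np.array([-1.0, 0.0])), np.pi)      # opposed
```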

Norm, distance, angle 2.34

Correlation coefficient

the correlation coefficient between non-constant vectors $a$, $b$ is

$$\rho_{ab} = \frac{\tilde{a}^T \tilde{b}}{\|\tilde{a}\| \, \|\tilde{b}\|}$$

where $\tilde{a} = a - \operatorname{avg}(a)\mathbf{1}$ and $\tilde{b} = b - \operatorname{avg}(b)\mathbf{1}$ are the de-meaned vectors

• only defined when $a$ and $b$ are not constant ($\tilde{a} \neq 0$ and $\tilde{b} \neq 0$)
• $\rho_{ab}$ is the cosine of the angle between the de-meaned vectors
• a number between $-1$ and $1$
• $\rho_{ab}$ is the average product of the deviations from the mean in standard units:
  $$\rho_{ab} = \frac{1}{n} \sum_{i=1}^n \frac{(a_i - \operatorname{avg}(a)) (b_i - \operatorname{avg}(b))}{\operatorname{std}(a) \operatorname{std}(b)}$$
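A direct implementation of the definition (a NumPy sketch; the example vectors are made up, and the last assertion checks agreement with NumPy's built-in `np.corrcoef`):

```python
import numpy as np

def rho(a, b):
    """Correlation coefficient of non-constant vectors a and b."""
    at = a - np.mean(a)    # de-meaned vectors
    bt = b - np.mean(b)
    return (at @ bt) / (np.linalg.norm(at) * np.linalg.norm(bt))

a = np.array([1.0, 2.0, 3.0, 4.0])
assert np.isclose(rho(a, 2.0 * a), 1.0)     # perfectly correlated
assert np.isclose(rho(a, -2.0 * a), -1.0)   # perfectly anti-correlated

b = np.array([1.0, 3.0, 2.0, 5.0])
assert np.isclose(rho(a, b), np.corrcoef(a, b)[0, 1])
```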

Norm, distance, angle 2.35

Examples

[figures: three pairs of time series $a_k$, $b_k$ with the corresponding scatter plots of $b_k$ versus $a_k$; the correlation coefficients are $\rho_{ab} = 0.968$, $\rho_{ab} = -0.988$, and $\rho_{ab} = 0.004$]

Norm, distance, angle 2.36

Regression line

• scatter plot shows two $n$-vectors $a$, $b$ as $n$ points $(a_k, b_k)$
• straight line shows the affine function $f(x) = c_1 + c_2 x$ with
  $$f(a_k) \approx b_k, \quad k = 1, \ldots, n$$

[figure: scatter plot of the points $(a_k, b_k)$ with the regression line $f(x)$]

Norm, distance, angle 2.37

Least squares regression

use coefficients $c_1$, $c_2$ that minimize $J = \dfrac{1}{n} \displaystyle\sum_{k=1}^n (f(a_k) - b_k)^2$

• $J$ is a quadratic function of $c_1$ and $c_2$:
  $$J = \frac{1}{n} \sum_{k=1}^n (c_1 + c_2 a_k - b_k)^2 = \left( n c_1^2 + 2 n \operatorname{avg}(a) c_1 c_2 + \|a\|^2 c_2^2 - 2 n \operatorname{avg}(b) c_1 - 2 a^T b \, c_2 + \|b\|^2 \right) / n$$

• to minimize $J$, set the derivatives with respect to $c_1$, $c_2$ to zero:
  $$c_1 + \operatorname{avg}(a) c_2 = \operatorname{avg}(b), \qquad n \operatorname{avg}(a) c_1 + \|a\|^2 c_2 = a^T b$$

• solution is
  $$c_2 = \frac{a^T b / n - \operatorname{avg}(a) \operatorname{avg}(b)}{\|a\|^2 / n - \operatorname{avg}(a)^2}, \qquad c_1 = \operatorname{avg}(b) - \operatorname{avg}(a) c_2$$
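The closed-form solution can be sketched directly (a NumPy illustration with made-up data that lies exactly on a line; note that $\|a\|^2/n$ is the mean of the squared entries):

```python
import numpy as np

def regression_line(a, b):
    """Least squares coefficients c1, c2 of the line f(x) = c1 + c2*x."""
    c2 = ((a @ b) / len(a) - np.mean(a) * np.mean(b)) / \
         (np.mean(a ** 2) - np.mean(a) ** 2)   # ||a||^2 / n = mean(a^2)
    c1 = np.mean(b) - np.mean(a) * c2
    return c1, c2

a = np.array([0.0, 1.0, 2.0, 3.0])
b = 1.0 + 2.0 * a                    # data exactly on the line c1 = 1, c2 = 2
c1, c2 = regression_line(a, b)
assert np.isclose(c1, 1.0) and np.isclose(c2, 2.0)
```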

Norm, distance, angle 2.38

Interpretation

the slope $c_2$ can be written in terms of the correlation coefficient of $a$ and $b$:

$$c_2 = \frac{(a - \operatorname{avg}(a)\mathbf{1})^T (b - \operatorname{avg}(b)\mathbf{1})}{\|a - \operatorname{avg}(a)\mathbf{1}\|^2} = \rho_{ab} \frac{\operatorname{std}(b)}{\operatorname{std}(a)}$$

• hence, the expression for the regression line can be written as
  $$f(x) = \operatorname{avg}(b) + \rho_{ab} \frac{\operatorname{std}(b)}{\operatorname{std}(a)} (x - \operatorname{avg}(a))$$

• the correlation coefficient $\rho_{ab}$ is the slope after converting to standard units:
  $$\frac{f(x) - \operatorname{avg}(b)}{\operatorname{std}(b)} = \rho_{ab} \frac{x - \operatorname{avg}(a)}{\operatorname{std}(a)}$$

Norm, distance, angle 2.39

Examples

[figures: three regression examples with $\rho_{ab} = 0.91$, $\rho_{ab} = -0.89$, and $\rho_{ab} = 0.25$]

• dashed lines in the top row show the average $\pm$ standard deviation
• bottom row shows the scatter plots of the top row in standard units

Norm, distance, angle 2.40

Outline

• norm
• distance
• k-means algorithm
• angle
• complex vectors

Norm

norm of vector $a \in \mathbf{C}^n$:

$$\|a\| = \sqrt{|a_1|^2 + |a_2|^2 + \cdots + |a_n|^2} = \sqrt{a^H a}$$

• positive definite: $\|a\| \geq 0$ for all $a$, and $\|a\| = 0$ only if $a = 0$
• homogeneous: $\|\beta a\| = |\beta| \, \|a\|$ for all vectors $a$ and complex scalars $\beta$
• triangle inequality: $\|a + b\| \leq \|a\| + \|b\|$ for all vectors $a$, $b$ of equal size
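The complex norm works the same way in code, with $a^H a$ in place of $a^T a$ (a NumPy sketch; the example vector is made up):

```python
import numpy as np

a = np.array([1 + 1j, 2.0, -1j])

# ||a|| = sqrt(|a_1|^2 + ... + |a_n|^2) = sqrt(a^H a)
norm = np.sqrt((np.conj(a) @ a).real)   # a^H a is real and nonnegative
assert np.isclose(norm, np.linalg.norm(a))
assert np.isclose(norm, np.sqrt(7.0))   # here the |a_i|^2 are 2, 4, 1
```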

Norm, distance, angle 2.41

Cauchy–Schwarz inequality for complex vectors

$$|a^H b| \leq \|a\| \, \|b\| \quad \text{for all } a, b \in \mathbf{C}^n$$

moreover, equality $|a^H b| = \|a\|\|b\|$ holds if:

• $a = 0$ or $b = 0$
• $a \neq 0$ and $b \neq 0$, and $b = \gamma a$ for some (complex) scalar $\gamma$

• exercise: generalize the proof for real vectors on page 2.4
• we say $a$ and $b$ are orthogonal if $a^H b = 0$
• we will not need the definition of angle, correlation coefficient, . . . in $\mathbf{C}^n$

Norm, distance, angle 2.42