Computational Analysis (Tutorial) Part II

Understanding Many Particle Systems with Machine Learning

Tutorials

Matthew Hirn
Michigan State University
Department of Computational Mathematics, Science & Engineering
Department of Mathematics

Wavelet Transform

• A wavelet $\psi \in L^2(\mathbb{R})$ satisfies:
  – Zero average: $\int_{-\infty}^{+\infty} \psi(t)\, dt = 0$
  – Normalized: $\|\psi\|_2 = 1$
  – Centered around $t = 0$
  – Localized in time and frequency
  – Can be either real or complex valued

• Wavelet dictionary obtained by scaling and translating $\psi$:
$$\mathcal{D} = \{\psi_{u,s}\}_{u \in \mathbb{R},\, s \in \mathbb{R}^+}, \qquad \psi_{u,s}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-u}{s}\right)$$

• Wavelet transform:
$$Wf(u,s) = \langle f, \psi_{u,s} \rangle = \int_{-\infty}^{+\infty} f(t)\, s^{-1/2}\, \psi^*\!\left(s^{-1}(t-u)\right) dt = f \ast \tilde{\psi}_s(u), \qquad \tilde{\psi}_s(t) = s^{-1/2}\, \psi^*(-s^{-1}t)$$

[Fig. 4.7, A Wavelet Tour of Signal Processing, 3rd ed.: Real wavelet transform $Wf(u,s)$ of a signal $f(t)$, computed with a Mexican hat wavelet. The vertical axis represents $\log_2 s$. Black, grey and white points correspond respectively to positive, zero and negative wavelet coefficients.]
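A minimal numerical check of the properties listed above, assuming the Mexican hat wavelet of Fig. 4.7 in the unit-norm form $\psi(t) = \frac{2}{\pi^{1/4}\sqrt{3\sigma}}\left(\frac{t^2}{\sigma^2} - 1\right)e^{-t^2/(2\sigma^2)}$ and plain Riemann sums on a truncated grid. The helper names `mexican_hat` and `wavelet_coefficient` are illustrative, not from a library; since this wavelet is real, the complex conjugate in $Wf(u,s)$ can be dropped.

```python
import numpy as np

def mexican_hat(t, sigma=1.0):
    """Mexican hat wavelet, scaled so that ||psi||_2 = 1."""
    c = 2.0 / (np.pi ** 0.25 * np.sqrt(3.0 * sigma))
    x = t / sigma
    return c * (x**2 - 1.0) * np.exp(-x**2 / 2.0)

def wavelet_coefficient(f_vals, t, u, s, psi=mexican_hat):
    """Riemann-sum approximation of Wf(u, s) = <f, psi_{u,s}> for a real wavelet psi."""
    dt = t[1] - t[0]
    psi_us = psi((t - u) / s) / np.sqrt(s)      # psi_{u,s}(t) = s^{-1/2} psi((t - u) / s)
    return np.sum(f_vals * psi_us) * dt

t = np.linspace(-20.0, 20.0, 40001)             # dt = 1e-3
dt = t[1] - t[0]
psi = mexican_hat(t)
average = np.sum(psi) * dt                      # should be ~0 (zero average)
norm = np.sqrt(np.sum(psi**2) * dt)             # should be ~1 (normalized)
print(f"integral of psi = {average:.1e}, ||psi||_2 = {norm:.8f}")

f = np.exp(-t**2 / 2.0)                         # a Gaussian test signal
for s in (0.5, 1.0, 2.0):
    print(f"Wf(0, {s}) = {wavelet_coefficient(f, t, 0.0, s):.6f}")
```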

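• Note: $\hat{\tilde{\psi}}_s(\omega) = \sqrt{s}\,\hat{\psi}^*(s\omega)$. Thus, since
$$\widehat{f \ast \tilde{\psi}_s}(\omega) = \hat{f}(\omega)\,\hat{\tilde{\psi}}_s(\omega),$$
the wavelet transform $Wf(u,s)$ captures the frequency information of $f$ organized by the frequency bands of $\tilde{\psi}_s$.

[Fig. 5.1, A Wavelet Tour of Signal Processing, 3rd ed.: Scaled Fourier transforms $|\hat{\psi}(2^j\omega)|^2$, for $1 \le j \le 5$ and $\omega \in [-\pi, \pi]$.]

Real Wavelet Reconstruction

• Theorem (Calderón, Grossmann and Morlet): Let $\psi \in L^2(\mathbb{R})$ be a real function such that
$$C_\psi = \int_0^{+\infty} \frac{|\hat{\psi}(\omega)|^2}{\omega}\, d\omega < +\infty.$$
Then, for any $f \in L^2(\mathbb{R})$:
$$f(t) = \frac{1}{C_\psi} \int_0^{+\infty} \int_{-\infty}^{+\infty} Wf(u,s)\, s^{-1/2}\, \psi\!\left(s^{-1}(t-u)\right) du\, \frac{ds}{s^2},$$
$$\|f\|_2^2 = \frac{1}{C_\psi} \int_0^{+\infty} \int_{-\infty}^{+\infty} |Wf(u,s)|^2\, du\, \frac{ds}{s^2}.$$

• $C_\psi < +\infty$ is called the wavelet admissibility condition.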
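A sketch of how the admissibility constant can be evaluated numerically, again assuming the unit-norm Mexican hat with $\sigma = 1$. The Fourier transform is approximated by a Riemann sum of $\int \psi(t)\cos(\omega t)\,dt$ (valid because this $\psi$ is real and even); for this wavelet a short calculation gives $|\hat{\psi}(\omega)|^2 = \frac{8\sqrt{\pi}}{3}\,\omega^4 e^{-\omega^2}$, hence $C_\psi = 4\sqrt{\pi}/3$, which the numerical value should match. It also prints $\hat{\psi}(0)$, which must vanish for $C_\psi$ to be finite.

```python
import numpy as np

def mexican_hat(t, sigma=1.0):
    """Mexican hat wavelet with ||psi||_2 = 1 (real and even)."""
    c = 2.0 / (np.pi ** 0.25 * np.sqrt(3.0 * sigma))
    x = t / sigma
    return c * (x**2 - 1.0) * np.exp(-x**2 / 2.0)

def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid depending on a specific NumPy version."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

t = np.linspace(-15.0, 15.0, 1501)              # time grid, dt = 0.02
dt = t[1] - t[0]
omega = np.linspace(1e-6, 12.0, 1200)           # positive frequencies only
# psi is real and even, so psi_hat(omega) = int psi(t) cos(omega t) dt is real
psi_hat = np.cos(np.outer(omega, t)) @ mexican_hat(t) * dt
C_psi = trapezoid(psi_hat**2 / omega, omega)
print(f"C_psi ~ {C_psi:.6f}, closed form 4*sqrt(pi)/3 = {4 * np.sqrt(np.pi) / 3:.6f}")
print(f"psi_hat(0) ~ {np.sum(mexican_hat(t)) * dt:.1e}")
```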

• $C_\psi < +\infty \Rightarrow \hat{\psi}(0) = 0$. This condition is almost sufficient.

• If additionally $\hat{\psi} \in C^1$, then $C_\psi < +\infty$. This can be ensured with sufficient time decay:
$$|\psi(t)| \le \frac{K}{1 + |t|^{2+\epsilon}}$$

Scaling Function

• Numerically, the wavelet transform is only computed up to scales $s < s_0$, so the information carried by $Wf(u,s)$ for $s > s_0$ is missing.

• The scaling function $\phi$ captures this information. It is defined by:
$$|\hat{\phi}(\omega)|^2 = \int_1^{+\infty} |\hat{\psi}(s\omega)|^2\, \frac{ds}{s}$$

• Denote:
$$\phi_s(t) = \frac{1}{\sqrt{s}}\,\phi\!\left(\frac{t}{s}\right) \quad\text{and}\quad \tilde{\phi}_s(t) = \phi_s^*(-t)$$

• The low frequency approximation of $f$ at scale $s$ is:
$$Af(u,s) = \langle f, \phi_{u,s} \rangle = f \ast \tilde{\phi}_s(u)$$

• Reconstruction still holds:
$$f(t) = \frac{1}{C_\psi} \int_0^{s_0} Wf(\cdot,s) \ast \psi_s(t)\, \frac{ds}{s^2} + \frac{1}{C_\psi\, s_0}\, Af(\cdot,s_0) \ast \phi_{s_0}(t)$$

[Fig. 4.6, A Wavelet Tour of Signal Processing, 3rd ed.: Mexican hat wavelet for σ = 1 and its Fourier transform.]
[Fig. 4.8, A Wavelet Tour of Signal Processing, 3rd ed.: Scaling function associated to a Mexican hat wavelet and its Fourier transform.]

Analytic Wavelets

• Complex-valued analytic wavelets admit a time-frequency analysis, like the windowed Fourier transform.

• The wavelet $\psi$ is analytic if $\hat{\psi}(\omega) = 0$ for all $\omega < 0$.

• The wavelet transform $Wf(u,s)$ of an analytic wavelet satisfies reconstruction and energy preservation formulas very similar to those of the real wavelet transform.

Analytic Wavelet Construction

• Let $g$ be a real, symmetric window.

• Define a wavelet as:
$$\psi(t) = g(t)\, e^{i\eta t} \;\Longrightarrow\; \hat{\psi}(\omega) = \hat{g}(\omega - \eta)$$

[Fig. 4.10, A Wavelet Tour of Signal Processing, 3rd ed.: Fourier transform $\hat{\psi}(\omega)$ of a wavelet $\psi(t) = g(t)\exp(i\eta t)$.]

• Thus if $\hat{g}(\omega) = 0$ for $|\omega| > \eta$, then $\hat{\psi}(\omega) = 0$ for $\omega < 0$, and $\psi$ is analytic.

• $\psi$ is centered in time at $t = 0$ and in frequency at $\omega = \eta$.
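With a Gaussian window, $\hat{g}$ never vanishes exactly, so the condition $\hat{g}(\omega) = 0$ for $|\omega| > \eta$ only holds approximately. The sketch below (an illustration under stated assumptions, not taken from the slides) measures the fraction of the energy of $g(t)e^{i\eta t}$ at negative frequencies for $g(t) = e^{-t^2/(2\sigma^2)}$; for this window the exact fraction is $\tfrac{1}{2}\operatorname{erfc}(\sigma\eta)$, which shrinks quickly as $\eta$ grows. Grid size and time span are arbitrary choices.

```python
import math
import numpy as np

def negative_frequency_fraction(eta, sigma=1.0, n=2**14, T=400.0):
    """Fraction of the energy of psi(t) = g(t) exp(i eta t) carried by omega < 0.

    g is the Gaussian window exp(-t^2 / (2 sigma^2)); the spectrum is sampled with an FFT
    (only |psi_hat| is needed, so the offset of the time grid does not matter).
    """
    dt = T / n
    t = (np.arange(n) - n // 2) * dt
    psi = np.exp(-t**2 / (2.0 * sigma**2)) * np.exp(1j * eta * t)
    power = np.abs(np.fft.fft(psi)) ** 2
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=dt)
    # half weight for the omega = 0 bin, which sits on the boundary of the half-line
    negative = power[omega < 0].sum() + 0.5 * power[omega == 0].sum()
    return negative / power.sum()

for eta in (1.0, 2.0, 3.0):
    # |psi_hat(omega)|^2 is proportional to exp(-sigma^2 (omega - eta)^2) for this window,
    # so the exact fraction is erfc(sigma * eta) / 2 (here sigma = 1)
    print(f"eta = {eta}: {negative_frequency_fraction(eta):.3e} "
          f"(exact {0.5 * math.erfc(eta):.3e})")
```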

• Gabor wavelets use a Gaussian window, and so are not strictly analytic and do not have precisely zero average. However, $\hat{\psi}(\omega) \approx 0$ for $\omega \le 0$.

• Morlet wavelets also use a Gaussian window, but subtract a constant in order to have zero average:

$$\psi(t) = g(t)\left(e^{i\eta t} - C\right)$$

Analytic Wavelet Heisenberg Boxes

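• Suppose $\psi$ is centered at $t = 0$ with central frequency $\omega = \eta$.

• The time variance $\sigma_t^2$ and frequency variance $\sigma_\omega^2$ of $\psi$ are:
$$\sigma_t^2 = \int_{-\infty}^{+\infty} t^2\, |\psi(t)|^2\, dt, \qquad \sigma_\omega^2 = \frac{1}{2\pi} \int_0^{+\infty} (\omega - \eta)^2\, |\hat{\psi}(\omega)|^2\, d\omega$$

• The Heisenberg box of $\psi_{u,s}$ is therefore centered at $(u, \eta/s)$, with time width $s\sigma_t$ and frequency width $\sigma_\omega/s$.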
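The scaling of the Heisenberg box with $s$ can be checked numerically. This sketch assumes a Gabor wavelet with a unit-norm Gaussian window ($\sigma = 1$, so $\sigma_t = \sigma_\omega = 1/\sqrt{2}$) and $\eta = 5$; the frequency moments are computed over the whole FFT grid rather than over $\omega > 0$ only, a difference that is negligible for $\eta = 5$. The expected values are a time spread $s/\sqrt{2}$, a centre $\eta/s$, a frequency spread $1/(s\sqrt{2})$, and a constant product $1/2$.

```python
import numpy as np

def gabor(t, eta=5.0, sigma=1.0):
    """Gabor wavelet g(t) exp(i eta t) with a unit-norm Gaussian window g."""
    g = (np.pi * sigma**2) ** -0.25 * np.exp(-t**2 / (2.0 * sigma**2))
    return g * np.exp(1j * eta * t)

def heisenberg_box(s, eta=5.0, n=2**14, T=400.0):
    """Time spread, frequency centre and frequency spread of psi_{0,s}(t) = s^{-1/2} psi(t/s)."""
    dt = T / n
    t = (np.arange(n) - n // 2) * dt
    psi = gabor(t / s, eta) / np.sqrt(s)
    p_t = np.abs(psi) ** 2
    spread_t = np.sqrt(np.sum(t**2 * p_t) / np.sum(p_t))
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=dt)
    p_w = np.abs(np.fft.fft(psi)) ** 2             # proportional to |psi_hat(omega)|^2
    centre = np.sum(omega * p_w) / np.sum(p_w)
    spread_w = np.sqrt(np.sum((omega - centre) ** 2 * p_w) / np.sum(p_w))
    return spread_t, centre, spread_w

for s in (0.5, 1.0, 2.0, 4.0):
    st, c, sw = heisenberg_box(s)
    print(f"s = {s}: time spread {st:.4f}, centre {c:.4f}, "
          f"frequency spread {sw:.4f}, product {st * sw:.4f}")
```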

• Scalogram:
$$P_W f(u, \eta/s) = |Wf(u,s)|^2$$

[Fig. 4.9, A Wavelet Tour of Signal Processing, 3rd ed.: Heisenberg boxes of two wavelets $\psi_{u,s}$ and $\psi_{u_0,s_0}$. Smaller scales decrease the time spread but increase the frequency support, which is shifted towards higher frequencies.]

Time-Frequency Plane: Wavelets vs. Windowed Fourier

Comparison of time-frequency tilings:

[Figure panels: Windowed Fourier Transform | Wavelet Transform]

Hyperbolic Chirp Revisited

• $f(t) = a_1 \cos\!\left(\frac{\alpha_1}{\beta_1 - t}\right) + a_2 \cos\!\left(\frac{\alpha_2}{\beta_2 - t}\right)$
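Scalograms like those compared on this slide can be computed with the FFT, using $\hat{\tilde{\psi}}_s(\omega) = \sqrt{s}\,\hat{\psi}^*(s\omega)$. The sketch below assumes a Gabor wavelet with $\eta = 6$ (an arbitrary choice, approximately analytic) and periodic boundary conditions; `cwt_fft` and `scalogram` are hypothetical helper names. It is checked on a pure tone rather than on the chirps of the figure, whose parameters $a_k, \alpha_k, \beta_k$ are not given here.

```python
import numpy as np

def gabor_hat(omega, eta=6.0, sigma=1.0):
    """Fourier transform of psi(t) = (pi sigma^2)^(-1/4) exp(-t^2/(2 sigma^2)) exp(i eta t)."""
    return (4.0 * np.pi * sigma**2) ** 0.25 * np.exp(-(sigma**2) * (omega - eta) ** 2 / 2.0)

def cwt_fft(f, dt, scales, eta=6.0):
    """Wf(u, s) = f * psi~_s(u) for every scale, as periodic convolutions computed with the FFT.

    Uses the Fourier transform of psi~_s, sqrt(s) * conj(psi_hat(s * omega)).
    Rows of the result are scales, columns are positions u.
    """
    omega = 2.0 * np.pi * np.fft.fftfreq(len(f), d=dt)
    f_hat = np.fft.fft(f)
    W = np.empty((len(scales), len(f)), dtype=complex)
    for i, s in enumerate(scales):
        W[i] = np.fft.ifft(f_hat * np.sqrt(s) * np.conj(gabor_hat(s * omega, eta)))
    return W

def scalogram(f, dt, scales, eta=6.0):
    """P_W f(u, eta/s) = |Wf(u, s)|^2, together with the frequency eta/s of each row."""
    W = cwt_fft(f, dt, scales, eta)
    return np.abs(W) ** 2, eta / np.asarray(scales)

# Sanity check on a pure tone: the scalogram ridge should sit near its frequency.
n = 1024
dt = 1.0 / n
t = np.arange(n) * dt
f = np.cos(2.0 * np.pi * 50.0 * t)
scales = 2.0 ** np.linspace(-9.0, -2.0, 200)
P, freqs = scalogram(f, dt, scales)
ridge = freqs[np.argmax(P[:, n // 2])]
print(f"ridge at {ridge / (2 * np.pi):.1f} Hz (tone at 50 Hz)")
```

For a pure tone the ridge sits slightly below the tone frequency, because $|Wf(u,s)|^2$ carries a factor $s$ that shifts its maximum in $s$; a finer scale grid does not remove this small bias.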

• Spectrogram $P_S f(u,\xi)$ of the windowed Fourier transform

• Scalogram $P_W f(u,\eta/s)$ of the analytic wavelet transform

[Fig. 4.14, A Wavelet Tour of Signal Processing, 3rd ed.: Sum of two hyperbolic chirps. (a) Spectrogram $P_S f(u,\xi)$. (b) Ridge support calculated from the spectrogram. Axes: $u$ from 0 to 1 (horizontal), $\xi/2\pi$ (vertical).]

[Fig. 4.17, A Wavelet Tour of Signal Processing, 3rd ed.: (a) Normalized scalogram $\eta^{-1}\xi\, P_W f(u,\xi)$ of two hyperbolic chirps. (b) Wavelet ridges.]

Hyperbolic Chirp Revisited

• $f(t) = a_1 \cos\!\left(\frac{\alpha_1}{\beta_1 - t}\right) + a_2 \cos\!\left(\frac{\alpha_2}{\beta_2 - t}\right)$
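Ridge points of a time-frequency energy are its local maxima along the frequency axis at each time $u$, and counting them gives a crude measure of sparsity. A minimal sketch, assuming the energy is stored as an array `P[frequency, u]` (for a scalogram, one row per scale); the function name and the relative threshold used to discard numerical noise are hypothetical choices. It is demonstrated on synthetic data with two ridges rather than on the chirps of the figure.

```python
import numpy as np

def frequency_local_maxima(P, rel_threshold=0.05):
    """Mask of points of P[frequency, u] that are local maxima along the frequency axis.

    A point is kept if it is strictly above its lower neighbour, not below its upper
    neighbour, and above rel_threshold * P.max() (to ignore numerical noise).
    """
    P = np.asarray(P, dtype=float)
    mask = np.zeros(P.shape, dtype=bool)
    inner = P[1:-1]
    mask[1:-1] = (inner > P[:-2]) & (inner >= P[2:]) & (inner > rel_threshold * P.max())
    return mask

# Synthetic energy with two ridges per time column, at 30 and 70 (arbitrary units).
freqs = np.linspace(0.0, 100.0, 201)
u = np.linspace(0.0, 1.0, 50)
P = (np.exp(-(freqs[:, None] - 30.0) ** 2 / 4.0)
     + np.exp(-(freqs[:, None] - 70.0) ** 2 / 4.0)) * np.ones_like(u)[None, :]
mask = frequency_local_maxima(P)
print("local maxima per column:", mask.sum(axis=0)[:5], "... total:", mask.sum())
```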

• Local maxima of the spectrogram $P_S f(u,\xi)$

• Local maxima of the scalogram $P_W f(u,\eta/s)$

[Fig. 4.14, A Wavelet Tour of Signal Processing, 3rd ed.: Sum of two hyperbolic chirps. (a) Spectrogram $P_S f(u,\xi)$. (b) Ridge support calculated from the spectrogram.]

Parallel Linear Chirps

• $f(t) = a_1 \cos(bt^2 + ct) + a_2 \cos(bt^2)$
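For the spectrogram side of the comparison, a minimal sketch assuming the windowed Fourier transform $Sf(u,\xi) = \int f(t)\,g(t-u)\,e^{-i\xi t}\,dt$ with a Gaussian window $g$ and periodic boundaries. The window width, the hop between window positions, and the chirp parameters ($a_1 = a_2 = 1$ and the values of $b$, $c$) are hypothetical choices, not those of Figs. 4.13 and 4.16.

```python
import numpy as np

def spectrogram(f, dt, window_std, hop=8):
    """P_S f(u, xi) = |Sf(u, xi)|^2 with a Gaussian window g (an illustrative choice).

    Sf(u, xi) = int f(t) g(t - u) exp(-i xi t) dt is approximated, for each window
    position u, by an FFT of f(t) g(t - u) over the whole periodic grid.
    Returns P[frequency, position], the frequencies xi (rad/s) and the positions u.
    """
    n = len(f)
    period = n * dt
    t = np.arange(n) * dt
    xi = 2.0 * np.pi * np.fft.rfftfreq(n, d=dt)          # nonnegative frequencies
    positions = t[::hop]
    P = np.empty((len(xi), len(positions)))
    for k, u in enumerate(positions):
        d = (t - u + period / 2) % period - period / 2    # periodic distance t - u
        g = np.exp(-d**2 / (2.0 * window_std**2))
        P[:, k] = np.abs(np.fft.rfft(f * g) * dt) ** 2
    return P, xi, positions

# Two parallel linear chirps cos(b t^2 + c t) + cos(b t^2), with hypothetical b and c:
# their instantaneous frequencies 2bt + c and 2bt stay a constant distance c apart.
n = 1024
dt = 1.0 / n
t = np.arange(n) * dt
b, c = 2.0 * np.pi * 100.0, 2.0 * np.pi * 60.0
f = np.cos(b * t**2 + c * t) + np.cos(b * t**2)
P, xi, u = spectrogram(f, dt, window_std=0.02)
```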

• Spectrogram: $P_S f(u,\xi)$

• Scalogram: $P_W f(u,\eta/s)$

[Fig. 4.13, A Wavelet Tour of Signal Processing, 3rd ed.: Sum of two parallel linear chirps. (a) Spectrogram $P_S f(u,\xi) = |Sf(u,\xi)|^2$. (b) Ridge support calculated from the spectrogram.]

[Fig. 4.16, A Wavelet Tour of Signal Processing, 3rd ed.: (a) Normalized scalogram $\eta^{-1}\xi\, P_W f(u,\xi)$ of two parallel linear chirps. (b) Wavelet ridges.]

Sparsity and Time-Frequency Resolution

• Lesson: the best transform depends on the time-frequency properties of the signal $f$.

• A transform that is adapted to the time-frequency properties of the signal has fewer local maxima, and is thus sparser.

• Transforms that are not adapted to the signal diffuse its energy over many atoms, leading to more local maxima and a less sparse representation.

• Thus sparsity is a natural criterion to guide the construction of time-frequency transforms.

Wavelet Zoom

[Fig. 4.7, A Wavelet Tour of Signal Processing, 3rd ed.: Real wavelet transform $Wf(u,s)$ of a signal $f(t)$, computed with a Mexican hat wavelet. The vertical axis represents $\log_2 s$. Black, grey and white points correspond respectively to positive, zero and negative wavelet coefficients.]

Taylor's Theorem

• We now turn to measuring the local regularity of $f$ at a point $v$.

• Suppose $f$ is $m$ times differentiable in $[v-h, v+h]$.

• Let $p_v$ be the Taylor polynomial of $f$ in the neighborhood of $v$:
$$p_v(t) = \sum_{k=0}^{m-1} \frac{f^{(k)}(v)}{k!}\,(t-v)^k$$

• Taylor's Theorem: the residual $\varepsilon_v(t) = f(t) - p_v(t)$ satisfies, for all $t \in [v-h, v+h]$:
$$|\varepsilon_v(t)| \le \frac{|t-v|^m}{m!} \sup_{u \in [v-h,\, v+h]} |f^{(m)}(u)|$$

Lipschitz Regularity

• Lipschitz regularity: a function $f$ is pointwise Lipschitz (Hölder) $\alpha \ge 0$ at $v$ if there exist $K > 0$ and a polynomial $p_v$ of degree $m = \lfloor \alpha \rfloor$ such that
$$\forall t \in \mathbb{R}, \quad |f(t) - p_v(t)| \le K\, |t - v|^\alpha$$

• $f$ is uniformly Lipschitz $\alpha$ over $[a,b]$ if it satisfies the above for all $v \in [a,b]$, with a $K$ independent of $v$.
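A rough numerical illustration of the definition, not a robust estimator: if $|f(t) - p_v(t)| \approx K|t - v|^\alpha$ near $v$, the slope of $\log|f(t) - p_v(t)|$ against $\log|t - v|$ recovers $\alpha$. The test function $f(t) = \sin t + |t|^{1.5}$ is an arbitrary example; it is Lipschitz $1.5$ at $v = 0$, with $m = \lfloor 1.5 \rfloor = 1$ and $p_0(t) = t$. The helper `lipschitz_exponent` is hypothetical.

```python
import numpy as np

def lipschitz_exponent(f, p_v, v, h_min=1e-4, h_max=1e-2, num=50):
    """Least-squares slope of log|f(t) - p_v(t)| against log|t - v| for t = v + h, h small.

    If |f(t) - p_v(t)| behaves like K |t - v|^alpha near v, the slope estimates alpha.
    Only the right side of v is probed; repeat with t = v - h for the left side.
    """
    h = np.logspace(np.log10(h_min), np.log10(h_max), num)
    residual = np.abs(f(v + h) - p_v(v + h))
    slope, _ = np.polyfit(np.log(h), np.log(residual), 1)
    return slope

# f is Lipschitz 1.5 at v = 0; its Taylor polynomial of degree floor(1.5) = 1 at 0 is p_0(t) = t.
f = lambda t: np.sin(t) + np.abs(t) ** 1.5
p_0 = lambda t: t
alpha = lipschitz_exponent(f, p_0, v=0.0)
print(f"estimated alpha = {alpha:.4f}")
```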

• Global Lipschitz regularity and the Fourier transform: a function $f$ is bounded and uniformly Lipschitz $\alpha$ over $\mathbb{R}$ if
$$\int_{-\infty}^{+\infty} |\hat{f}(\omega)|\,(1 + |\omega|^\alpha)\, d\omega < +\infty$$

Wavelet Vanishing Moments

• A wavelet $\psi$ has $n$ vanishing moments if:
$$\int_{-\infty}^{+\infty} t^k\, \psi(t)\, dt = 0 \quad \text{for } 0 \le k < n$$

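• If $f = p_v + \varepsilon_v$ with $\deg p_v < n$ (for instance, $f$ Lipschitz $\alpha < n$ at $v$), the vanishing moments give $Wp_v(u,s) = 0$, so that:
$$Wf(u,s) = W\varepsilon_v(u,s)$$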
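The sketch below counts vanishing moments numerically for two wavelets built from the Gaussian $\theta(t) = e^{-t^2/2}$ (an assumed choice): $\psi_1 = -\theta'$ and $\psi_2 = \theta''$, an unnormalized Mexican hat. It then checks that the transform of a polynomial of degree $1 < 2$, the role played by $p_v$ above, vanishes for $\psi_2$. Grid, tolerance, and the point $(u, s)$ are arbitrary.

```python
import numpy as np

t = np.linspace(-20.0, 20.0, 40001)
dt = t[1] - t[0]

def theta_2(x):
    """theta'' for theta(x) = exp(-x^2/2): an unnormalized Mexican hat."""
    return (x**2 - 1.0) * np.exp(-x**2 / 2.0)

psi_1 = t * np.exp(-t**2 / 2.0)      # -theta'
psi_2 = theta_2(t)                   # theta''

def vanishing_moments(psi, tol=1e-8, kmax=6):
    """Largest n such that int t^k psi(t) dt = 0 (up to tol) for all 0 <= k < n."""
    for k in range(kmax + 1):
        if abs(np.sum(t**k * psi) * dt) > tol:
            return k
    return kmax + 1

print(vanishing_moments(psi_1), vanishing_moments(psi_2))

# With two vanishing moments, the wavelet transform of a degree-1 polynomial vanishes.
u, s = 0.7, 0.5
poly = 3.0 + 2.0 * t
W_poly = np.sum(poly * theta_2((t - u) / s)) * dt / np.sqrt(s)
print(f"W p(u, s) = {W_poly:.1e}")
```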

• We are going to measure $\alpha$ from $|Wf(u,s)|$, with $u$ close to $v$.

Multiscale Differential Operator

• Wavelet transform $Wf(u,s)$ with a wavelet that has one vanishing moment:
  – Black: positive
  – Grey: zero
  – White: negative

[Fig. 6.1, A Wavelet Tour of Signal Processing, 3rd ed.: Wavelet transform $Wf(u,s)$ calculated with $\psi = -\theta'$ where $\theta$ is a Gaussian, for the signal $f$ shown above. The position parameter $u$ and the scale $s$ vary respectively along the horizontal and vertical axes. Black, grey and white points correspond respectively to positive, zero and negative wavelet coefficients. Singularities create large coefficients in their cone of influence.]

• Singularities create large amplitude wavelet coefficients.

• Notice that the coefficients give information regarding the derivative of $f$; this is not an accident!

Multiscale Differential Operator

• Theorem: a wavelet $\psi$ with a fast decay has $n$ vanishing moments if and only if there exists $\theta$ with a fast decay such that
$$\psi(t) = (-1)^n\, \frac{d^n \theta(t)}{dt^n}.$$
As a consequence:
$$Wf(u,s) = s^n\, \frac{d^n}{du^n}\,(f \ast \tilde{\theta}_s)(u), \qquad \tilde{\theta}_s(t) = s^{-1/2}\,\theta(-t/s)$$
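The identity for $n = 1$ can be checked numerically with $\theta$ a Gaussian, so that $\psi = -\theta'$. In this sketch the test signal (a step plus a sinusoid), the scale $s$, and the periodic grid are arbitrary choices, and $d/du$ is approximated by a central difference, so the two sides agree only up to an $O(dt^2)$ error.

```python
import numpy as np

n = 4096
dt = 1.0 / n
t = np.arange(n) * dt
f = (t >= 0.3).astype(float) + np.sin(2.0 * np.pi * 3.0 * t)   # a step plus a smooth part
s = 0.01                                                     # scale, in the units of t
lags = np.fft.fftfreq(n, d=1.0 / n) * dt                     # 0, dt, 2dt, ..., then negative lags

def periodic_convolution(x, kernel):
    """(x * k)(u) = int x(t) k(u - t) dt on a periodic grid; kernel is sampled at `lags`."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(kernel))) * dt

theta = lambda x: np.exp(-x**2 / 2.0)          # Gaussian theta
psi = lambda x: x * np.exp(-x**2 / 2.0)        # psi = -theta' (one vanishing moment)

theta_s = theta(-lags / s) / np.sqrt(s)        # theta~_s(t) = s^{-1/2} theta(-t/s)
psi_s = psi(-lags / s) / np.sqrt(s)            # psi~_s(t)   = s^{-1/2} psi(-t/s)

Wf = periodic_convolution(f, psi_s)            # Wf(u, s) = f * psi~_s(u)
smoothed = periodic_convolution(f, theta_s)    # f * theta~_s
# s * d/du (f * theta~_s), with a periodic central difference for d/du
rhs = s * (np.roll(smoothed, -1) - np.roll(smoothed, 1)) / (2.0 * dt)
rel_err = np.max(np.abs(Wf - rhs)) / np.max(np.abs(Wf))
print(f"relative difference: {rel_err:.1e}")
```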

[Figure panels: $\psi = -\theta'$ and $\psi = \theta''$]

Wavelet Zoom on an Interval

• Let $\psi \in C^n(\mathbb{R})$ have $n$ vanishing moments and derivatives that have fast decay.

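• Theorem:
  – If $f \in L^2(\mathbb{R})$ is uniformly Lipschitz $\alpha \le n$ over $[a,b]$, then there exists $A > 0$ such that for all $(u,s) \in [a,b] \times \mathbb{R}^+$:
$$|Wf(u,s)| \le A\, s^{\alpha + 1/2}$$
  – Conversely, suppose $f$ is bounded and that
$$|Wf(u,s)| \le A\, s^{\alpha + 1/2} \quad \forall (u,s) \in [a,b] \times \mathbb{R}^+$$
for an $\alpha < n$ that is not an integer. Then $f$ is uniformly Lipschitz $\alpha$ on $[a+\epsilon, b-\epsilon]$, for any $\epsilon > 0$.

Wavelet Zoom at a Point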

• Let $\psi \in C^n(\mathbb{R})$ have $n$ vanishing moments and derivatives that have fast decay.

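• Theorem (Jaffard):
  – If $f \in L^2(\mathbb{R})$ is Lipschitz $\alpha \le n$ at $v$, then there exists $A > 0$ such that for all $(u,s) \in \mathbb{R} \times \mathbb{R}^+$:
$$|Wf(u,s)| \le A\, s^{\alpha + 1/2} \left(1 + \left|\frac{u - v}{s}\right|^{\alpha}\right)$$
  – Conversely, if $\alpha < n$ is not an integer and there exist $A > 0$ and $\alpha' < \alpha$ such that for all $(u,s) \in \mathbb{R} \times \mathbb{R}^+$:
$$|Wf(u,s)| \le A\, s^{\alpha + 1/2} \left(1 + \left|\frac{u - v}{s}\right|^{\alpha'}\right)$$
then $f$ is Lipschitz $\alpha$ at $v$.

Wavelet Modulus Maxima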

• The previous two theorems show that the local Lipschitz regularity of $f$ at $v$ depends on the decay of $|Wf(u,s)|$ as $s \to 0$.

• In fact, we only need to look at the local maxima of $|Wf(u,s)|$ to detect and characterize the singularities of $f$.

• A wavelet modulus maximum is a point $(u_0, s_0)$ such that $|Wf(u, s_0)|$ is locally maximum at $u = u_0$.

Maxima Propagation

[Figure: wavelet modulus maxima]

• Theorem (Hwang, Mallat): $f$ is singular at a point $v$ only if there is a sequence of wavelet modulus maxima $(u_p, s_p)$ that converges to $v$ at fine scales:

$$\lim_{p \to +\infty} (u_p, s_p) = (v, 0)$$

• Theorem (Hummel, Poggio, Yuille): if $\psi = (-1)^n \theta^{(n)}$ for $\theta$ a Gaussian, then the wavelet modulus maxima belong to connected curves that are never interrupted as $s \to 0$.

• The maximum slope of $\log_2 |Wf(u,s)|$ as a function of $\log_2 s$ along the maxima line converging to $v$ is $\alpha + 1/2$.
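For a cusp $f(t) = |t - v|^\alpha$, the change of variables $t = u + sx$ gives $Wf(u,s) = s^{\alpha+1/2}F\!\left(\frac{u-v}{s}\right)$ for a fixed function $F$, so $|Wf|$ follows the same power law along any line $u - v = cs$, including the maxima lines and $u = v$. The sketch below (assuming $\alpha = 0.5$, an unnormalized Mexican hat, and Riemann sums) fits the slope of $\log_2|Wf(v,s)|$ against $\log_2 s$, which should be close to $\alpha + 1/2 = 1$.

```python
import numpy as np

def mexican_hat(x):
    """Unnormalized Mexican hat (x^2 - 1) exp(-x^2/2); constants do not change log-log slopes."""
    return (x**2 - 1.0) * np.exp(-x**2 / 2.0)

t = np.linspace(-4.0, 4.0, 80001)                 # dt = 1e-4
dt = t[1] - t[0]
v, alpha = 0.0, 0.5
f = np.abs(t - v) ** alpha                        # a cusp, Lipschitz 0.5 at v

scales = 2.0 ** np.arange(-6.0, -1.0, 0.25)
# Wf(v, s) = int f(t) s^{-1/2} psi((t - v)/s) dt, by a Riemann sum
W_v = np.array([np.sum(f * mexican_hat((t - v) / s)) * dt / np.sqrt(s) for s in scales])
slope, _ = np.polyfit(np.log2(scales), np.log2(np.abs(W_v)), 1)
print(f"slope = {slope:.3f}  ->  alpha ~ {slope - 0.5:.3f}")
```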

• Full line: decay of $\log_2 |Wf(u,s)|$ along the maxima line converging to $t = 0.05$.

• Dashed line: decay of $\log_2 |Wf(u,s)|$ along the maxima line converging to $t = 0.42$.

Dyadic Wavelet Transform and Maxima

• Dyadic wavelet transform:
$$Wf(u, 2^j) = f \ast \tilde{\psi}_{2^j}(u)$$
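A minimal sketch of the dyadic transform and of its modulus maxima, assuming $\psi = -\theta'$ with $\theta$ a Gaussian (one vanishing moment, as in Fig. 6.1), periodic boundaries, and scales $2^j$ measured in samples. `dyadic_wavelet_transform` and `modulus_maxima` are hypothetical helpers; the sign of $Wf$ is reported at each maximum. For a step, each scale should show a positive maximum at the upward jump and a negative one at the periodic wrap-around, where the signal jumps back down.

```python
import numpy as np

def dyadic_wavelet_transform(f, js):
    """Wf(u, 2^j) = f * psi~_{2^j}(u) on a periodic grid, for psi = -theta' with theta Gaussian.

    Positions and scales are in samples; psi~_{2^j}(t) = 2^{-j} psi(-2^{-j} t).
    """
    n = len(f)
    lags = np.fft.fftfreq(n, d=1.0 / n)               # integer lags 0, 1, ..., -2, -1
    f_hat = np.fft.fft(f)
    W = []
    for j in js:
        x = -lags / 2.0**j
        kernel = 2.0**-j * x * np.exp(-x**2 / 2.0)   # 2^{-j} psi(-2^{-j} t), psi(x) = x e^{-x^2/2}
        W.append(np.real(np.fft.ifft(f_hat * np.fft.fft(kernel))))
    return np.array(W)

def modulus_maxima(w, rel_threshold=0.01):
    """Positions u where |w(u)| is a (periodic) local maximum above rel_threshold * max|w|."""
    a = np.abs(w)
    is_max = (a > np.roll(a, 1)) & (a >= np.roll(a, -1)) & (a > rel_threshold * a.max())
    return np.flatnonzero(is_max)

n = 1024
f = np.zeros(n)
f[n // 2:] = 1.0                                      # step at u = n/2 (and, periodically, at u = 0)
js = range(1, 6)
W = dyadic_wavelet_transform(f, js)
for j, w in zip(js, W):
    idx = modulus_maxima(w)
    print(f"j = {j}: maxima at {idx.tolist()}, signs {np.sign(w[idx]).astype(int).tolist()}")
```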

• Wavelet maxima (keeping the sign)

Wavelet Maxima Approximation in 1D

[Figure panels, from analysis to synthesis:]
• $f(t)$
• Approximation of $f(t)$ with 100% of the wavelet maxima
• Approximation of $f(t)$ with 50% of the wavelet maxima

Wavelet Transform and Modulus Maxima in 2D


[Figure panels, by increasing scale:]
(a) Wavelet transform in horizontal direction
(b) Wavelet transform in vertical direction
(c) Wavelet transform modulus
(d) Angles
(e) Wavelet modulus maxima
(f) Wavelet modulus maxima above a threshold

Wavelet Maxima Approximation in 2D

[Figure panels:]
(a) Original image
(b) Approximation from 100% of the wavelet maxima (e)
(c) Approximation from the thresholded wavelet maxima (f)

Dyadic Wavelet Frames

Translation Invariant Frames

• Recall the translation invariant dictionary:

$$\mathcal{D} = \{\phi_{u,\gamma}\}_{u \in \mathbb{R},\, \gamma \in \Gamma}, \qquad \phi_{u,\gamma}(t) = \phi_\gamma(t - u)$$
and the (frame) operator:

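$$\Phi f(u,\gamma) = \langle f, \phi_{u,\gamma} \rangle = f \ast \tilde{\phi}_\gamma(u), \qquad \tilde{\phi}_\gamma(t) = \phi_\gamma^*(-t)$$

• A translation invariant dictionary is a frame for $L^2(\mathbb{R})$ if there exist $B \ge A > 0$ such that for all $f \in L^2(\mathbb{R})$:
$$A\,\|f\|_2^2 \le \sum_{\gamma \in \Gamma} \|\Phi f(\cdot,\gamma)\|_2^2 \le B\,\|f\|_2^2,$$
where
$$\|\Phi f(\cdot,\gamma)\|_2^2 = \int_{-\infty}^{+\infty} |\Phi f(u,\gamma)|^2\, du = \int_{-\infty}^{+\infty} |f \ast \tilde{\phi}_\gamma(u)|^2\, du.$$

• When $A = B$ the frame is tight.

• Frames are redundant.

Translation Invariant Frames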

• Theorem: if there exist $B \ge A > 0$ such that for almost every $\omega \in \mathbb{R}$:
$$A \le \sum_{\gamma \in \Gamma} |\hat{\phi}_\gamma(\omega)|^2 \le B,$$
then

$$\mathcal{D} = \{\phi_{u,\gamma}(t) = \phi_\gamma(t - u)\}_{u \in \mathbb{R},\, \gamma \in \Gamma}$$
is a frame for $L^2(\mathbb{R})$.
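The bounds $A$ and $B$ can be estimated by evaluating $\sum_\gamma |\hat{\phi}_\gamma(\omega)|^2$ on a frequency grid. As an illustration, this sketch assumes dilations of the unit-norm Mexican hat, $\hat{\phi}_\gamma(\omega) = \hat{\psi}(2^j\omega)$ with $\gamma = j \in \mathbb{Z}$ truncated to $|j| \le 40$, and uses the fact that the full sum is unchanged under $\omega \to 2\omega$, so one octave suffices. The numbers are estimates on a grid, not exact frame bounds.

```python
import numpy as np

def mexican_hat_hat_sq(omega):
    """|psi_hat(omega)|^2 for the unit-norm Mexican hat with sigma = 1."""
    return (8.0 * np.sqrt(np.pi) / 3.0) * omega**4 * np.exp(-omega**2)

def dyadic_frame_sum(omega, jmin=-40, jmax=40):
    """Sum over jmin <= j <= jmax of |psi_hat(2^j omega)|^2 (a truncation of the sum over Z)."""
    j = np.arange(jmin, jmax + 1)[:, None]
    return mexican_hat_hat_sq(2.0**j * omega[None, :]).sum(axis=0)

# The full sum is unchanged under omega -> 2 omega, so one octave [1, 2] is enough.
omega = np.linspace(1.0, 2.0, 1001)
S = dyadic_frame_sum(omega)
A, B = S.min(), S.max()
print(f"A ~ {A:.4f}, B ~ {B:.4f}, B/A ~ {B / A:.4f}")
```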

• Define the generators $\{\varphi_\gamma\}_{\gamma \in \Gamma}$ of the dual frame via:

$$\hat{\varphi}_\gamma(\omega) = \frac{\hat{\phi}_\gamma(\omega)}{\sum_{\gamma' \in \Gamma} |\hat{\phi}_{\gamma'}(\omega)|^2}$$

• We then have the following reconstruction formula:

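$$f(t) = \sum_{\gamma \in \Gamma} \Phi f(\cdot,\gamma) \ast \varphi_\gamma(t) = \sum_{\gamma \in \Gamma} f \ast \tilde{\phi}_\gamma \ast \varphi_\gamma(t)$$

Dyadic Wavelet Frame

[Fig. 5.1, A Wavelet Tour of Signal Processing, 3rd ed.: Scaled Fourier transforms $|\hat{\psi}(2^j\omega)|^2$, for $1 \le j \le 5$ and $\omega \in [-\pi, \pi]$.]

• A translation invariant dyadic wavelet dictionary is defined as:
$$\mathcal{D} = \left\{ \psi_{u,2^j}(t) = \frac{1}{2^j}\,\psi\!\left(\frac{t-u}{2^j}\right) \right\}_{u \in \mathbb{R},\, j \in \mathbb{Z}}$$

• Dyadic wavelet transform:
$$Wf(u,2^j) = f \ast \tilde{\psi}_{2^j}(u), \qquad \tilde{\psi}_{2^j}(t) = 2^{-j}\,\psi^*(-2^{-j}t)$$

• Corollary: if there exist $B \ge A > 0$ such that for all $\omega \in \mathbb{R} \setminus \{0\}$:
$$A \le \sum_{j=-\infty}^{+\infty} |\hat{\psi}(2^j\omega)|^2 \le B,$$
then the dyadic wavelet dictionary is a frame.

[Figure panels: $\mathrm{supp}(\hat{\psi}_{j,\theta})$, $\mathrm{real}(\psi_{j,\theta})$, $\mathrm{imag}(\psi_{j,\theta})$]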
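A discrete sketch of the dual-frame reconstruction, assuming filters defined directly on the DFT frequency grid of a periodic signal: Mexican hat dilations $\hat{\psi}(2^j\omega)$ for a few $j$, plus a Gaussian low-pass (an added assumption) so that $\sum_\gamma |\hat{\phi}_\gamma(\omega)|^2 > 0$ also at $\omega = 0$. The dual generators are $\hat{\varphi}_\gamma = \hat{\phi}_\gamma / \sum_{\gamma'}|\hat{\phi}_{\gamma'}|^2$, and $f$ is recovered from $\sum_\gamma f \ast \tilde{\phi}_\gamma \ast \varphi_\gamma$ up to rounding.

```python
import numpy as np

def psi_hat(omega):
    """Fourier transform of the unit-norm Mexican hat (sigma = 1); real-valued."""
    c = 2.0 / (np.pi ** 0.25 * np.sqrt(3.0))
    return -c * np.sqrt(2.0 * np.pi) * omega**2 * np.exp(-omega**2 / 2.0)

n, J = 512, 6
omega = 2.0 * np.pi * np.fft.fftfreq(n)          # DFT frequencies in radians per sample
# Generators phi_gamma on the DFT grid: psi_hat(2^j omega) for j = -1, ..., J-1, plus a
# Gaussian low-pass covering omega = 0, where every dyadic wavelet vanishes.
gens = np.array([psi_hat(2.0**j * omega) for j in range(-1, J)]
                + [np.exp(-(2.0**J * omega) ** 2 / 2.0)])

frame_sum = np.sum(np.abs(gens) ** 2, axis=0)    # sum_gamma |phi_hat_gamma(omega)|^2
A, B = frame_sum.min(), frame_sum.max()          # frame bounds on this grid
duals = gens / frame_sum                         # varphi_hat_gamma

rng = np.random.default_rng(0)
f = rng.standard_normal(n)
f_hat = np.fft.fft(f)
coeffs = [np.fft.ifft(f_hat * np.conj(g)) for g in gens]   # Phi f(., gamma) = f * phi~_gamma
f_rec = np.real(sum(np.fft.ifft(np.fft.fft(c) * d) for c, d in zip(coeffs, duals)))
print(f"A = {A:.3g}, B = {B:.3g}, max |f - f_rec| = {np.max(np.abs(f - f_rec)):.1e}")
```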

• If $A = B = 1$, then reconstruction is particularly simple:
$$f(t) = \sum_{j=-\infty}^{+\infty} f \ast \tilde{\psi}_{2^j} \ast \psi_{2^j}(t)$$

1D Wavelet Transform at Different Scales

• $Wf(u,2^j) = f \ast \tilde{\psi}_{2^j}(u)$ captures the details of $f$ at the scale $2^j$.

2D Wavelet Transform at Different Scales

[Figure panels: image $\rho(u)$ and moduli $|\rho \ast \psi_{j,\theta}(u)|$ across rotations $\theta$ and scales $j$]
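A minimal sketch of the moduli $|\rho \ast \psi_{j,\theta}(u)|$ computed with 2D FFTs. It assumes a Gabor-type wavelet defined directly in the frequency domain, a Gaussian centred at the wave vector $(\eta/2^j)(\sin\theta, \cos\theta)$; $\eta = 3\pi/4$, $\sigma = 0.8$ and the orientation convention are hypothetical choices, not the wavelets of the figure. For a test image with a straight edge, the energy should concentrate on the rotation whose wave vector is perpendicular to the edge, here $\theta = 0$.

```python
import numpy as np

def gabor_2d_hat(w0, w1, j, theta, eta=3.0 * np.pi / 4.0, sigma=0.8):
    """Fourier transform of a 2D Gabor-type wavelet psi_{j,theta}, defined on the DFT grid.

    A Gaussian of width 1 / (2^j sigma) centred at the wave vector
    (eta / 2^j) * (sin theta, cos theta) (components along axis 0 and axis 1).
    """
    c0 = eta / 2.0**j * np.sin(theta)
    c1 = eta / 2.0**j * np.cos(theta)
    return np.exp(-((2.0**j * sigma) ** 2) * ((w0 - c0) ** 2 + (w1 - c1) ** 2) / 2.0)

def wavelet_modulus_2d(rho, js, thetas):
    """|rho * psi_{j,theta}|(u) for every scale j and rotation theta (periodic boundaries)."""
    n0, n1 = rho.shape
    w0, w1 = np.meshgrid(2.0 * np.pi * np.fft.fftfreq(n0),
                         2.0 * np.pi * np.fft.fftfreq(n1), indexing="ij")
    rho_hat = np.fft.fft2(rho)
    out = np.empty((len(js), len(thetas), n0, n1))
    for a, j in enumerate(js):
        for b, theta in enumerate(thetas):
            out[a, b] = np.abs(np.fft.ifft2(rho_hat * gabor_2d_hat(w0, w1, j, theta)))
    return out

# Zero-mean test image with a straight edge: rho varies along axis 1 only.
rho = -np.ones((64, 64))
rho[:, 32:] = 1.0
thetas = np.arange(8) * np.pi / 8.0
U = wavelet_modulus_2d(rho, js=[1, 2, 3], thetas=thetas)
energy = (U**2).sum(axis=(2, 3))         # energy of |rho * psi_{j,theta}| per (j, theta)
print("strongest rotation index per scale:", np.argmax(energy, axis=1))
```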