Wavelet Methods for Time Series Analysis Implications of Orthonormality: I

Part II: Introduction to the Discrete • WT W − let j• denote the jth row of , where j =0, 1,...,N 1 • W • will give precise definition of DWT in Part III note that j• itself is a column vector T • let W denote element of W in row j and column l • let X =[X0,X1,...,XN−1] be a vector of N time series j,l • W W values (note: ‘T ’ denotes ; i.e., X is a column vector) note that j,l is also lth element of j• • J • W W need to assume N =2 for some positive integer J (restrictive!) let’s consider two vectors, say, j• and k• • DWT is a linear transform of X yielding N DWT coefficients • orthonormality says • notation: W = WX N−1 1, when j = k, − Wj•, W •≡ W W = W is vector of DWT coefficients (jth component is Wj) k j,l k,l 0, when j = k − W is N × N orthonormal transform ; i.e., l=0 T − W • W W W = IN, where IN is N × N identity matrix j , k• is inner product of jth & kth rows T − W •2 ≡W•, W • is squared (energy) for W • • inverse of W is just its transpose, so WW = IN also j j j j

II–1 II–2

Implications of Orthonormality: II Implications of Orthonormality: III

• example from W of dimension 16 × 16 we’ll see later on • another example from same W − inner product of row 8 with itself (i.e., squared norm): − inner product of rows 8 and 12: . . W . . W . . 8,t ...... 8,t ...... W2 ...... sum = 1 . W W ...... sum = 0 . 8,t .. 8,t 12,t . W ...... W . . . . 8,t ... .. 12,t ......

− row 8 said to have ‘unit energy’ since squared norm is 1 − rows8&12said to be orthogonal since inner product is 0

II–3 II–4 The Haar DWT: I The Haar DWT: II

• like CWT, DWT tell us about variations in local averages • keep shifting by two to form rows until we come to . . . • to see this, let’s look inside W for the Haar DWT for N =2J N 1 1 T • row j = − 1: 0,...,0 , −√ , √ ≡W 2 2 2 N −1• • −√1 √1 ≡WT − 2 row j =0: 2, 2, 0,..., 0 0• N 2 zeros N−2 zeros • first N/2 rows form orthonormal set of N/2 vectors W 2 1 1 note: 0• = 2 + 2 = 1 & hence has required unit energy 0 ...... • row j =1: 0, 0, −√1 , √1 , 0,...,0 ≡WT . . 2 2 1• 1 ...... N−4 zeros 2 ...... • W W 3 ...... 0• and 1• are orthogonal N = 16 example ...... 4 . . 5 ...... W0,t ...... 6 ...... W W . . . . 0,t 1,t ...... sum = 0 7 ...... W1,t ...... 0 5 10 15 t

II–5 II–6

The Haar DWT: III The Haar DWT: IV √1 √1 • W W • to form next row, stretch − , , 0,...,0 out 1• and N • are orthogonal 2 2 2 by a factor of two and renormalize to preserve unit energy . N 1 1 1 1 T W • j = : − , − , , , 0,...,0 ≡W 1,t ...... 2 2 2 2 2 N • W W . − 2 .. 1,t 8,t ...... sum = 0 N 4 zeros W ...... W 2 1 1 1 1 8,t note: N • = 4 + 4 + 4 + 4 = 1, as required .. 2 • W W N 0• and N • are orthogonal ( 2 = 8 in example) • W W 2 2• and N • are orthogonal 2 . . W 0,t ...... W ...... W W . 2,t ... 0,t 8,t ...... sum = 0 . W W ...... W . .. 2,t 8,t sum = 0 8,t ...... W ...... 8,t ..

II–7 II–8 The Haar DWT: V The Haar DWT: VI • W • − 1 −1 1 1 form next row by shifting N • to right by 4 units to form next row, stretch 2, 2, 2, 2, 0,...,0 out 2 by a factor of two and renormalize to preserve unit energy • N −1 −1 1 1 ≡WT j = 2 +1: 0, 0, 0, 0, 2, 2, 2, 2, 0,..., 0 N • • 3N −√1 −√1 √1 √1 ≡WT +1 j = : ,..., , ,..., , 0,..., 0 3N N−8 zeros 2 4 8 8 8 8 • N−8 zeros 4 • W orthogonal to first N/2 rows and also to W 4 of these 4 of these N +1• N • W 2 · 1 2 2 note: 3N • =8 8 = 1, as required .. 4 W • 3N 3N 8,t ...... j = 4 + 1: shift row 4 to right by 8 units .. W W ...... 8,t 9,t sum = 0 • continue shifting and stretching until finally we come to . . . W9,t ...... .. • j = N − 2: −√1 ,...,−√1 , √1 ,...,√1 ≡WT N N N N N−2• • continue shifting by 4 units to form more rows, ending with . . . N of these N of these 2 2 • 3N − −1 −1 1 1 ≡WT √1 √1 T row j = 1: 0,..., 0 , , , , 3N • j = N − 1: ,..., ≡W 4 2 2 2 2 −1• N N N−1• N−4 zeros 4 N of these

II–9 II–10

The Haar DWT: VII Haar DWT Coefficients: I

• N = 16 example of Haar DWT matrix W • obtain Haar DWT coefficients W by premultiplying X by W: . .. W = WX 0 ...... 8 ...... 1 ...... 9 ...... • W ...... jth coefficient Wj is inner product of jth row j• and X: 2 ...... 10 ...... W • 3 ...... 11 ...... Wj = j , X 4 ...... 12 ...... • can interpret coefficients as difference of averages .... 5 ...... 13 ...... 6 ...... 14 ...... • to see this, let 7 ...... 15 ...... λ−1 0 5 10 15 0 5 10 15 1 X (λ) ≡ X − = ‘scale λ’ average t t t λ t l l=0 − note: Xt(1) = Xt = scale 1 ‘average’ − note: XN−1(N)=X = sample average

II–11 II–12 Haar DWT Coefficients: II Haar DWT Coefficients: III

• W consider form W0 = 0•, X takes in N = 16 example: • now consider form of WN = W8 = W8•, X: . 2 W .. 0,t ...... W8,t ...... W ∝ − . .. 0,tXt ...... sum X1(1) X0(1) .. . W X ...... sum ∝ X (2) − X (2) ...... 8,t t ... 3 1 Xt ... . Xt ...... • W similar interpretation for W1,...,WN − = W7 = 7•, X : • similar interpretation for WN ,...,W3N 2 1 +1 −1 . 2 4 W7,t ...... W ...... ∝ − . . . 7,tXt sum X15(1) X14(1) Xt ......

II–13 II–14

Haar DWT Coefficients: IV Haar DWT Coefficients: V

• final coefficient WN−1 = W15 has a different interpretation: • W3N = W12 = W8•, X takes the following form: 4 W ...... W .... 15,t 8,t ...... W .. ∝ .... W ... ∝ − .. 15,tXt ...... sum X15(16) .. 8,tXt ...... sum X7(4) X3(4) . . . . Xt ...... Xt ...... • structure of rows in W • continuing in this manner, come to W − = W14•, X: N 1 − N ∝ first 2 rows yield Wj’s changes on scale 1 − N ∝ W ...... next 4 rows yield Wj’s changes on scale 2 14,t ...... W ... .. ∝ − N .. 14,tXt ...... sum X15(8) X7(8) − next rows yield W ’s ∝ changes on scale 4 ...... 8 j Xt ... .. − ∝ N ... next to last row yields Wj change on scale 2 − last row yields Wj ∝ average on scale N

II–15 II–16 Structure of DWT Matrices Two Basic Decompositions Derivable from DWT

• N wavelet coefficients for scale τ ≡ 2j−1, j =1,...,J 2τj j • additive decomposition j−1 − τj ≡ 2 is standardized scale − reexpresses X as the sum of J + 1 new time series, each of − τj ∆t is physical scale, where ∆t is sampling interval which is associated with a particular scale τj − • each Wj localized in time: as scale ↑, localization ↓ called multiresolution analysis (MRA) − related to first ‘scary-looking’ CWT equation • rows of W for given scale τj: − circularly shifted with respect to each other • energy decomposition j − shift between adjacent rows is 2τj =2 − yields analysis of variance across J scales • similar structure for DWTs other than the Haar − called wavelet spectrum or wavelet variance − • differences of averages common theme for DWTs related to second ‘scary-looking’ CWT equation − simple differencing replaced by higher order differences − simple averages replaced by weighted averages

II–17 II–18

Partitioning of DWT Coefficient Vector W Example of Partitioning of W

• W decompositions are based on partitioning of W and • consider time series X of length N = 16 & its Haar DWT W • partition W into subvectors associated⎡ ⎤ with scale: W1 W1 W2 W3 W4 V4 ⎢ ⎥ 2 ⎢ W2 ⎥ . ⎢ . ⎥ . . ⎢ . ⎥ 0 ...... ⎢ ⎥ W . . . . W = ⎢ Wj ⎥ . ⎢ ⎥ − ⎢ . ⎥ 2 ⎣ ⎦ 2 WJ . . . V X 0 . . . . . J ...... j j−1 . • W has N/2 elements (scale τ =2 changes) − j  j 2 J N N N J 0 5 10 15 note: = + + ···+2+1=2 − 1=N − 1 j=1 2j 2 4 √ t • VJ has 1 element, which is equal to N · X (scale N average)

II–19 II–20 Partitioning of DWT Matrix W Example of Partitioning of W

• partition W commensurate with partitioning of W: • N = 16 example of Haar DWT matrix W ⎡ ⎤ W . 1 0 ...... 8 ...... ⎢ W ⎥ ...... ⎢ 2 ⎥ 1 ...... 9 ...... W ⎢ . ⎥ . .. 2 ⎢ . ⎥ 2 ...... 10 ...... W = ⎢ W ⎥ 3 ...... 11 ...... ⎢ j ⎥ W1 . ⎢ ⎥ 4 ...... 12 ...... ⎢ . ⎥ ...... W 5 ...... 13 ...... 3 ⎣ W ⎦ ...... J 6 ...... 14 ...... W ...... 4 V ...... V J 7 ...... 15 4 N j−1 0 5 10 15 0 5 10 15 • W is × N matrix (related to scale τ =2 changes) j 2j j t t √1 • two properties: (a) W = W X and (b) W WT = I • VJ is 1 × N row vector (each element is ) j j j j N N 2j

II–21 II–22

DWT Analysis and Synthesis Equations Multiresolution Analysis: I

• recall the DWT analysis equation W = WX • synthesis equation leads to additive decomposition: T • W W = IN because W is an orthonormal transform J J WT VT ≡ D S • implies that WT W = WT WX = X X = j Wj + J VJ j + J • j=1 j=1 yields DWT synthesis equation: ⎡ ⎤ • D ≡WT W is portion of synthesis due to scale τ W1 j j j j ⎢ ⎥   ⎢ W2 ⎥ • D is vector of length N and is called jth ‘detail’ ⎢ ⎥ j X = WT W = WT , WT ,...,WT , VT ⎢ . ⎥ 1 2 J J • S ≡VT V = X1, where 1 is a vector containing N ones ⎣ W ⎦ J J J J (later on we will call this the ‘smooth’ of Jth order) VJ J • additive decomposition called multiresolution analysis (MRA) WT VT = j Wj + J VJ j=1

II–23 II–24 Multiresolution Analysis: II Energy Preservation Property of DWT Coefficients

• example of MRA for time series of length N =16 • define ‘energy’ in X as its squared norm: N−1 S ...... 4  2 T 2 ...... X = X, X = X X = Xt D ...... 4 t=0 ...... D3 ...... (usually not really energy, but will use term as shorthand) ...... D ...... 2 ...... D • ...... 1 energy of X is preserved in its DWT coefficients W because .  2 T W T W 1 . . . W = W W =( X) X X 0 ...... T WT W − .... = X X 1 .. 0 5 10 15 = XT I X = XT X = X2 t N • adding values for, e.g., t =14inD1,...,D4 & S4 yields X14 • note: same argument holds for any orthonormal transform

II–25 II–26

Wavelet Spectrum (Variance Decomposition): I Wavelet Spectrum (Variance Decomposition): II  − • let X denote sample mean of X ’s: X ≡ 1 N 1 X t N t=0 t • define discrete wavelet power spectrum: • 2 letσ ˆX denote sample variance of Xt’s: 1 2 j−1 PX(τj) ≡ Wj , where τj =2 N−1   N−1 N 1 2 1 2 σˆ2 ≡ X − X = X2 − X • gives us a scale-based decomposition of the sample variance: X N t N t t=0 t=0 J 1 2 2 2 = X − X σˆX = PX(τj) N j=1 1 2 = W2 − X N • in addition, each W in W associated with a portion of X;  j,t j •  2 J  2  2 1  2 2 i.e., W 2 offers scale- & time-based decomposition ofσ ˆ2 since W = j=1 Wj + VJ and N VJ = X , j,t X J 1 σˆ2 = W 2 X N j j=1

II–27 II–28 Wavelet Spectrum (Variance Decomposition): III Summary of Qualitative Description of DWT

• wavelet spectra for time series X and Y of length N = 16, • DWT is expressed by an N × N orthonormal matrix W each with zero sample mean and same sample variance • transforms time series X into DWT coefficients W = WX 2 0. 3 . • each coefficient in W associated with a scale and location . . . . − . . . . . − j 1 X 0 . . . PX(τj) Wj is subvector of W with coefficients for scale τj =2 . . . . . − . . coefficients in Wj related to differences of averages over τj −2 0. 0 2 0. 3 . − last coefficient in W related to average over scale N ...... • Y 0 . . . . . P (τ ) orthonormality leads to basic scale-based decompositions . . . Y j . . . − . . multiresolution analysis (additive decomposition) −2 0. 0 0 5 10 15 1 2 48 − discrete wavelet power spectrum (analysis of variance)

tτj • stayed tuned for precise definition of DWT!

II–29 II–30