
Structural Break Detection in Time Series

Richard A. Davis, Thomas Lee, Gabriel Rodriguez-Yam

Colorado State University (http://www.stat.colostate.edu/~rdavis/lectures)

This research supported in part by an IBM faculty award.

Outline
- Introduction
  - Background
  - Setup for AR models
- Using Minimum Description Length (MDL)
  - General principles
  - Application to AR models with breaks
- Optimization using Genetic Algorithms
  - Basics
  - Implementation for structural break estimation
- Simulation examples
  - Piecewise autoregressions
  - Slowly varying autoregressions
- Applications
  - Multivariate
  - EEG example
- Application to nonlinear models (parameter-driven state-space models)
  - Poisson model
  - Stochastic volatility model

Introduction

Structural breaks:

Kitagawa and Akaike (1978)
- fitting locally stationary autoregressive models using AIC
- computations facilitated by the use of the Householder transformation

Davis, Huang, and Yao (1995)
- likelihood ratio test for testing a change in the parameters and/or order of an AR process

Kitagawa, Takanami, and Matsumoto (2001)
- signal extraction in seismology: estimate the arrival time of a seismic signal

Ombao, Raz, von Sachs, and Malow (2001)
- orthogonal complex-valued transforms that are localized in time: the smooth localized complex exponential (SLEX) transform
- applications to EEG time series and speech

Introduction (cont)

Locally stationary:

Dahlhaus (1997, 2000, …)
- locally stationary processes
- estimation

Adak (1998)
- piecewise stationary
- applications to seismology and biomedical signal processing

MDL and coding theory:

Lee (2001, 2002)
- estimation of discontinuous regression functions

Hansen and Yu (2001)
- model selection

Introduction (cont)

Time series: y_1, …, y_n.

Piecewise AR model:

Y_t = γ_j + φ_{j,1} Y_{t−1} + … + φ_{j,p_j} Y_{t−p_j} + σ_j ε_t,  if τ_{j−1} ≤ t < τ_j,

where τ_0 = 1 < τ_1 < … < τ_{m−1} < τ_m = n + 1, and {ε_t} is IID(0,1).

Goal: Estimate

m = number of segments
τ_j = location of the j-th break point
γ_j = level in the j-th epoch
p_j = order of the AR process in the j-th epoch
(φ_{j,1}, …, φ_{j,p_j}) = AR coefficients in the j-th epoch
σ_j = scale in the j-th epoch
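To make the setup concrete, a piecewise AR series of this form can be simulated as follows (a minimal numpy sketch; the function name and the zero start-up values are illustrative assumptions, not part of the talk):

```python
import numpy as np

def simulate_piecewise_ar(n, taus, phis, gammas, sigmas, seed=0):
    """Simulate Y_t = gamma_j + phi_{j,1} Y_{t-1} + ... + sigma_j eps_t
    on the j-th segment [tau_{j-1}, tau_j).
    taus = [1, tau_1, ..., tau_m = n + 1] (1-based, as on the slide)."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n + 1)  # y[0] is an artificial zero start-up value
    for j in range(len(taus) - 1):
        for t in range(taus[j], taus[j + 1]):
            past = [y[t - k] if t - k >= 1 else 0.0
                    for k in range(1, len(phis[j]) + 1)]
            y[t] = (gammas[j] + np.dot(phis[j], past)
                    + sigmas[j] * rng.standard_normal())
    return y[1:]

# The three-segment AR model used in the simulation examples below:
y = simulate_piecewise_ar(1024, [1, 513, 769, 1025],
                          phis=[[0.9], [1.69, -0.81], [1.32, -0.81]],
                          gammas=[0, 0, 0], sigmas=[1, 1, 1])
```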

Introduction (cont)

Motivation for using piecewise AR models: Piecewise AR is a special case of a piecewise stationary process (see Adak 1998),

Ỹ_{t,n} = Σ_{j=1}^{m} Y_t^{(j)} I_{[τ_{j−1}, τ_j)}(t/n),

where {Y_t^{(j)}}, j = 1, …, m, is a sequence of stationary processes. It is argued in Ombao et al. (2001) that if {Y_{t,n}} is a locally stationary process (in the sense of Dahlhaus), then there exists a piecewise stationary process {Ỹ_{t,n}} with

m_n → ∞ and m_n / n → 0, as n → ∞,

that approximates {Y_{t,n}} (in average square).

Roughly speaking: {Y_{t,n}} is a locally stationary process if it has a time-varying spectrum that is approximately |A(t/n, ω)|², where A(u, ω) is a continuous function in u.

Example: Monthly Deaths & Serious Injuries, UK

Data: y_t = number of monthly deaths and serious injuries in UK, Jan '75 – Dec '84 (t = 1, …, 120).

Remark: Seat belt legislation introduced in Feb '83 (t = 99).

[Figure: monthly counts plotted against year, 1976–1984.]

Example: Monthly Deaths & Serious Injuries, UK (cont)

Data: x_t = number of monthly deaths and serious injuries in UK, differenced at lag 12; Jan '75 – Dec '84 (t = 13, …, 120).

Remark: Seat belt legislation introduced in Feb '83 (t = 99).

Traditional approach:

Y_t = a + b f(t) + W_t,    f(t) = 0 if 1 ≤ t ≤ 98;  f(t) = 1 if 98 < t ≤ 120.

Differenced counts:

X_t = Y_t − Y_{t−12} = b g(t) + N_t,    g(t) = 1 if 99 ≤ t ≤ 110;  g(t) = 0 otherwise.

[Figure: differenced counts plotted against year, 1976–1984.]

Model: b = −373.4, {N_t} ~ AR(13).
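The lag-12 differencing, and the pulse regressor g(t) it leaves behind, can be sketched as (a minimal numpy illustration; the helper names are hypothetical):

```python
import numpy as np

def lag12_diff(y):
    """Seasonal difference at lag 12: x_t = y_t - y_{t-12}."""
    y = np.asarray(y, dtype=float)
    return y[12:] - y[:-12]

def g(t, low=99, high=110):
    """Pulse left by lag-12 differencing of the step intervention f(t):
    g(t) = 1 for 99 <= t <= 110, 0 otherwise."""
    t = np.asarray(t)
    return ((t >= low) & (t <= high)).astype(float)
```

Differencing the step f(t) (0 before t = 99, 1 after) at lag 12 is what turns the permanent level shift into this 12-month pulse, which is why the shift appears as b g(t) in the differenced series.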

Model Selection Using Minimum Description Length

Basics of MDL: Choose the model which maximizes the compression of the data or, equivalently, select the model that minimizes the code length of the data (i.e., amount of memory required to encode the data).

M = class of operating models for y = (y_1, …, y_n)

L_F(y) = code length of y relative to F ∈ M.

Typically this can be decomposed into two pieces (a two-part code),

L_F(y) = L(F̂ | y) + L(ê | F̂),

where
L(F̂ | y) = code length of the fitted model for F,
L(ê | F̂) = code length of the residuals based on the fitted model.

Model Selection Using Minimum Description Length (cont)

Applied to the segmented AR model

Y_t = γ_j + φ_{j,1} Y_{t−1} + … + φ_{j,p_j} Y_{t−p_j} + σ_j ε_t,  if τ_{j−1} ≤ t < τ_j.

First term L(F̂ | y): Let n_j = τ_j − τ_{j−1} and ψ_j = (γ_j, φ_{j,1}, …, φ_{j,p_j}, σ_j) denote the length of the j-th segment and the parameter vector of the j-th AR process, respectively. Then

L(F̂ | y) = L(m) + L(τ_1, …, τ_m) + L(p_1, …, p_m) + L(ψ̂_1 | y) + … + L(ψ̂_m | y)
         = L(m) + L(n_1, …, n_m) + L(p_1, …, p_m) + L(ψ̂_1 | y) + … + L(ψ̂_m | y).

Encoding:

integer I: log₂ I bits (if I unbounded); log₂ I_U bits (if I bounded by I_U).

MLE θ̂: ½ log₂ N bits, where N = number of observations used to compute θ̂ (Rissanen 1989).

Model Selection Using Minimum Description Length (cont)

So,

L(F̂ | y) = log₂ m + m log₂ n + Σ_{j=1}^{m} log₂ p_j + Σ_{j=1}^{m} ((p_j + 2)/2) log₂ n_j.

Second term L(ê | F̂): Using Shannon's classical results on information theory, Rissanen demonstrates that the code length of ê can be approximated by the negative of the log-likelihood of the fitted model, i.e., by

L(ê | F̂) ≈ Σ_{j=1}^{m} (n_j/2) (log₂(2π σ̂_j²) + 1).

For fixed values of m, (τ_1, p_1), …, (τ_m, p_m), we define the MDL as

MDL(m, (τ_1, p_1), …, (τ_m, p_m))
  = log₂ m + m log₂ n + Σ_{j=1}^{m} log₂ p_j + Σ_{j=1}^{m} ((p_j + 2)/2) log₂ n_j
    + Σ_{j=1}^{m} (n_j/2) log₂(2π σ̂_j²) + n/2.

The strategy is to find the best segmentation, i.e., the one that minimizes MDL(m, (τ_1, p_1), …, (τ_m, p_m)).
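As a concrete sketch, the MDL of one candidate segmentation can be computed along these lines (a minimal numpy implementation of the formula above, using Yule–Walker estimates of the AR parameters; the function names and the per-segment demeaning are illustrative assumptions):

```python
import numpy as np

def yule_walker(x, p):
    """Yule-Walker estimates (phi, sigma^2) for an AR(p) fit to x."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    r = np.array([x[: n - k] @ x[k:] / n for k in range(p + 1)])
    if p == 0:
        return np.array([]), r[0]
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, r[1 : p + 1])
    return phi, r[0] - phi @ r[1 : p + 1]

def mdl(y, taus, orders):
    """MDL of a segmentation: taus = [1, tau_1, ..., n+1] (1-based),
    orders[j] = AR order of segment j; mirrors the slide's formula."""
    n, m = len(y), len(orders)
    val = np.log2(m) + m * np.log2(n)
    for j in range(m):
        seg = np.asarray(y[taus[j] - 1 : taus[j + 1] - 1], dtype=float)
        nj, pj = len(seg), orders[j]
        _, s2 = yule_walker(seg, pj)
        val += np.log2(pj) if pj > 0 else 0.0          # code length of p_j
        val += (pj + 2) / 2 * np.log2(nj)              # code length of the MLEs
        val += nj / 2 * (np.log2(2 * np.pi * s2) + 1)  # residual code length
    return val
```

On pure white noise, adding a spurious break only increases the penalty terms, so the one-segment fit wins.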

To speed things up, we use Yule–Walker estimates of the AR parameters.

Optimization Using Genetic Algorithms

Basics of GA: Class of optimization algorithms that mimic natural evolution.

• Start with an initial set of chromosomes, or population, of possible solutions to the optimization problem.

• Parent chromosomes are randomly selected (proportional to the rank of their objective function values), and produce offspring using crossover or mutation operations.

• After a sufficient number of offspring are produced to form a second generation, the process then restarts to produce a third generation.

• Based on Darwin’s theory of natural selection, the process should produce future generations that give a smaller (or larger) objective function.
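These steps can be sketched as a bare-bones GA (illustrative only: rank-weighted parent selection, gene-wise crossover, bit-flip mutation, and carrying over the current best stand in for the talk's operators; the function name and all tuning constants are assumptions):

```python
import random

def genetic_minimize(objective, init_pop, n_generations=60, p_cross=0.9, seed=1):
    """Minimal GA: rank-based parent selection, crossover or mutation,
    fixed population size, best chromosome carried over (elitism)."""
    rng = random.Random(seed)
    pop = [list(c) for c in init_pop]
    for _ in range(n_generations):
        pop.sort(key=objective)                 # smallest objective first
        ranks = list(range(len(pop), 0, -1))    # best gets the largest weight
        new = [pop[0]]                          # keep the current best
        while len(new) < len(pop):
            if rng.random() < p_cross:          # crossover
                a, b = rng.choices(pop, weights=ranks, k=2)
                child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            else:                               # mutation: flip genes w.p. 0.1
                a = rng.choices(pop, weights=ranks, k=1)[0]
                child = [1 - x if rng.random() < 0.1 else x for x in a]
            new.append(child)
        pop = new
    return min(pop, key=objective)

# Toy run: minimize the number of 1-bits in a length-20 binary chromosome
rng0 = random.Random(0)
best = genetic_minimize(sum, [[rng0.randint(0, 1) for _ in range(20)]
                              for _ in range(30)])
```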

Application to Structural Breaks (cont)

Genetic Algorithm: Chromosome consists of n genes, each taking the value of −1 (no break) or p (order of AR process). Use natural selection to find a near optimal solution.

Map the break points to a chromosome c via

(m, (τ_1, p_1), …, (τ_m, p_m)) ←→ c = (δ_1, …, δ_n),

where

δ_t = −1, if no break point at t;
δ_t = p_j, if break point at time t = τ_{j−1} and AR order is p_j.

For example,

c = (2, −1, −1, −1, −1, 0, −1, −1, −1, −1, 0, −1, −1, −1, 3, −1, −1, −1, −1, −1)
t:   1              6              11          15

would correspond to a process as follows:

AR(2), t = 1:5;  AR(0), t = 6:10;  AR(0), t = 11:14;  AR(3), t = 15:20.

Implementation of Genetic Algorithm
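Decoding such a chromosome into (break point, AR order) pairs is straightforward (a sketch; the function name is illustrative):

```python
def decode(chrom):
    """chrom[t-1] is -1 (no break at time t) or p_j (break at time t
    with AR order p_j); position 1 always starts a segment.
    Returns the list of (tau_{j-1}, p_j) pairs."""
    return [(t + 1, d) for t, d in enumerate(chrom) if d >= 0]

c = [2, -1, -1, -1, -1, 0, -1, -1, -1, -1,
     0, -1, -1, -1, 3, -1, -1, -1, -1, -1]
print(decode(c))  # [(1, 2), (6, 0), (11, 0), (15, 3)]
```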

Generation 0: Start with L (= 200) randomly generated chromosomes c_1, …, c_L with associated MDL values MDL(c_1), …, MDL(c_L).

Generation 1: A new child in the next generation is formed from the

chromosomes c1, . . . , cL of the previous generation as follows:

- With probability π_c, crossover occurs:
  - two parent chromosomes c_i and c_j are selected at random with probabilities proportional to the ranks of MDL(c_i);
  - the k-th gene of the child is δ_k = δ_{i,k} w.p. ½ and δ_{j,k} w.p. ½.

- With probability 1 − π_c, mutation occurs:
  - a parent chromosome c_i is selected;
  - the k-th gene of the child is δ_k = δ_{i,k} w.p. π_1; −1 w.p. π_2; and p w.p. 1 − π_1 − π_2.

Implementation of Genetic Algorithm (cont)
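One offspring under these crossover/mutation rules can be generated as follows (a sketch; the values of π_c, π_1, π_2 and the maximum AR order are illustrative, and the random order in the mutation step is drawn uniformly for simplicity):

```python
import random

def make_child(parents, mdl_values, pi_c=0.9, pi_1=0.4, pi_2=0.3,
               max_p=10, rng=random):
    """Produce one child: crossover w.p. pi_c (gene-wise coin flip between
    two rank-selected parents), otherwise mutation (keep gene w.p. pi_1,
    set -1 w.p. pi_2, else draw a random AR order p)."""
    # Rank-based weights: the smallest MDL gets the largest weight.
    order = sorted(range(len(parents)), key=lambda i: mdl_values[i],
                   reverse=True)
    weights = [0] * len(parents)
    for rank, i in enumerate(order, start=1):
        weights[i] = rank
    if rng.random() < pi_c:                                # crossover
        a, b = rng.choices(parents, weights=weights, k=2)
        return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
    a = rng.choices(parents, weights=weights, k=1)[0]      # mutation
    child = []
    for gene in a:
        u = rng.random()
        if u < pi_1:
            child.append(gene)
        elif u < pi_1 + pi_2:
            child.append(-1)
        else:
            child.append(rng.randrange(max_p + 1))
    return child
```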

Execution of GA: Run the GA until convergence or until a maximum number of generations has been reached.

Various strategies:

- include the top ten chromosomes from the last generation in the next generation.

- use multiple islands, in which populations run independently, and then allow migration after a fixed number of generations. This implementation is amenable to parallel computing.
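The migration step can be sketched as follows (illustrative: islands are lists of chromosomes, `objective` stands in for the MDL, and the cyclic worst-for-best exchange mirrors the scheme described in the next section):

```python
def migrate(islands, objective, k=2):
    """Replace the k worst chromosomes in island j with copies of the
    k best chromosomes from island j+1 (cyclically)."""
    # Snapshot the best chromosomes before any replacement happens.
    bests = [sorted(isl, key=objective)[:k] for isl in islands]
    for j, isl in enumerate(islands):
        isl.sort(key=objective)                   # best first
        isl[-k:] = [list(c) for c in bests[(j + 1) % len(islands)]]
    return islands

# Tiny check with two islands and sum() as a stand-in objective:
islands = [[[5, 5], [0, 0], [4, 4]], [[1, 1], [9, 9], [2, 2]]]
migrate(islands, sum, k=1)
```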

Simulation Examples (based on Ombao et al. (2001) test cases)

1. Piecewise stationary: Consider a time series following the model,

Y_t = .9 Y_{t−1} + ε_t,                  if 1 ≤ t < 513,
Y_t = 1.69 Y_{t−1} − .81 Y_{t−2} + ε_t,  if 513 ≤ t < 769,
Y_t = 1.32 Y_{t−1} − .81 Y_{t−2} + ε_t,  if 769 ≤ t ≤ 1024,

where {ε_t} ~ IID N(0,1).

[Figure: one realization of the series, t = 1, …, 1024.]

1. Piecewise stationary (cont)

Implementation: Start with N_I = 50 islands, each with population size L = 200. After every M_i = 5 generations, allow migration: replace the worst 2 chromosomes in island j with the best 2 chromosomes from island j + 1 (cyclically).

Stopping rule: Stop when the minimum MDL does not change for 10 consecutive migrations or after 100 migrations.

Span configuration for model selection: Max AR order K = 10; m_p = minimum segment length allowed for a piece of order p; π_p = probability that a mutated gene takes order p.

p    0    1    2    3    4    5    6–10
m_p  25   25   30   35   40   45   50
π_p  .08  .1   .1   .09  .09  .09  .09

1. Piecewise stationary (cont)

GA results: 3 pieces with breaks at τ_1 = 513 and τ_2 = 769. Total run time 16.31 secs.

Fitted model:
segment     φ_1    φ_2     σ²
1–512       .857   —       .9945
513–768     1.68   −0.801  1.1134
769–1024    1.36   −0.801  1.1300

[Figure: true vs. fitted time-varying log spectra.]

1. Piecewise stationary (cont)

Simulation: 200 replicates of time series of length 1024 were generated (SLEX results from Ombao et al.). Fits are compared via the average squared error of the estimated log spectrum,

ASE = n⁻¹ (1/33) Σ_{t=1}^{n} Σ_{j=0}^{32} {log f̂(t/n, ω_j) − log f(t/n, ω_j)}²,  ω_j = 2πj/64.

[Table: percentage of replicates by number of segments selected, estimated change-point locations (mean, std), and ASE, for Auto-SLEX and the GA; the GA most often selects 3 segments, with estimated breaks near 1/2 and 3/4.]

1. Piecewise stationary (cont)
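The ASE criterion can be computed from the true and estimated log spectra evaluated on a common time–frequency grid (a sketch; the array shapes are an assumption):

```python
import numpy as np

def ase(logf_hat, logf_true):
    """ASE = n^{-1} (1/33) * sum over t and the 33 frequencies
    omega_j = 2*pi*j/64, j = 0, ..., 32, of the squared log-spectrum
    error.  Both inputs have shape (n, 33)."""
    d = np.asarray(logf_hat) - np.asarray(logf_true)
    n = d.shape[0]
    return float(d.ravel() @ d.ravel()) / (n * 33)
```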

Simulation (cont):

True model:

Y_t = .9 Y_{t−1} + ε_t,                  if 1 ≤ t < 513,
Y_t = 1.69 Y_{t−1} − .81 Y_{t−2} + ε_t,  if 513 ≤ t < 769,
Y_t = 1.32 Y_{t−1} − .81 Y_{t−2} + ε_t,  if 769 ≤ t ≤ 1024.

AR orders selected (percent):

Order  0  1     2     3     4    5    ≥6
p_1    0  99.4  0.6   0     0    0    0
p_2    0  0     86.0  11.6  1.8  0.6  0
p_3    0  0     89.0  10.4  0.6  0    0

Simulation Examples (cont)

2. Piecewise stationary:

Y_t = .9 Y_{t−1} + ε_t,    if 1 ≤ t < 198,
Y_t = −.9 Y_{t−1} + ε_t,   if 198 ≤ t ≤ 1024,

where {ε_t} ~ IID N(0,1).

[Figure: one realization of the series, t = 1, …, 1024.]

2. Piecewise stationary (cont)

GA results: 2 pieces with break at τ_1 = 196. Total run time 11.96 secs.

Fitted model:
segment    φ_1    σ²
1–195      .872   1.081
196–1024   −.883  1.078

[Figure: true vs. fitted time-varying log spectra.]

2. Piecewise stationary (cont)

Simulation: 200 replicates of time series of length 1024 were generated. AR(1) models with break at 196/1024 =.191.

[Table: number of segments selected (%) and ASE for Auto-SLEX, and number of segments, change-point mean, std, and ASE for the GA, over the 200 replicates; the GA selects 2 segments in 97% of replicates, with mean estimated break location .192 (std .002).]

Simulation Examples (cont)

.211 7 6.0 ≥ 4 0 (.047) .290 8 12.0 (.096) .255 ≥ 9 1.0 (.100) 23 Simulation Examples (cont)

3. Piecewise stationary:

Y_t = .9 Y_{t−1} + ε_t,    if 1 ≤ t < 51,
Y_t = .25 Y_{t−1} + ε_t,   if 51 ≤ t ≤ 500,

where {ε_t} ~ IID N(0,1).

GA results: 2 pieces with break at τ_1 = 47.

[Figure: one realization of the series, t = 1, …, 500.]

3. Piecewise stationary (cont)

Simulation results: Change occurred at time τ_1 = 51; 51/500 ≈ .1.

# of segments  %     change points: mean  std
1              9.0   —                    —
2              89.0  .096                 .017
3              2.0   .048, .092           .020, .011
≥4             0     —                    —

Simulation Examples (cont)

4. Slowly varying AR(2) model:

Y_t = a_t Y_{t−1} − .81 Y_{t−2} + ε_t,  1 ≤ t ≤ 1024,

where a_t = .8[1 − 0.5 cos(πt/1024)] and {ε_t} ~ IID N(0,1).

[Figure: one realization of the series and the coefficient function a_t.]

4. Slowly varying AR(2) (cont)

GA results: 3 pieces with breaks at τ_1 = 293 and τ_2 = 615. Total run time 27.45 secs.

Fitted model:
segment    φ_1    φ_2     σ²
1–292      .365   −0.753  1.149
293–614    .821   −0.790  1.176
615–1024   1.084  −0.760  0.960

[Figure: true vs. fitted time-varying log spectra.]

4. Slowly varying AR(2) (cont)

Simulation: 200 replicates of time series of length 1024 were generated.

[Table: number of segments selected (%) and ASE for Auto-SLEX and the GA over the 200 replicates; the GA selects 2 segments (mean break location .502) in 36% of replicates and 3 segments in 63%.]

4. Slowly varying AR(2) (cont)

Simulation (cont):

True model: Y_t = a_t Y_{t−1} − .81 Y_{t−2} + ε_t,  1 ≤ t ≤ 1024.

AR orders selected (percent), 2-segment realizations:

Order  0  1  2     3    4    ≥5
p_1    0  0  97.2  1.4  1.4  0
p_2    0  0  97.2  2.8  0    0

AR orders selected (percent), 3-segment realizations:

Order  0  1  2     3    4    ≥5
p_1    0  0  100   0    0    0
p_2    0  0  98.4  1.6  0    0
p_3    0  0  97.6  1.6  0.8  0

4. Slowly varying AR(2) (cont)

In the figure below, we average the spectrogram over the GA fitted models generated from each of the 200 simulated realizations.

[Figure: true time-varying spectrum vs. the spectrogram averaged over the 200 GA fits.]


Example: Monthly Deaths & Serious Injuries, UK

Data: Y_t = number of monthly deaths and serious injuries in UK, Jan '75 – Dec '84 (t = 1, …, 120).

Remark: Seat belt legislation introduced in Feb '83 (t = 99).

[Figure: differenced counts plotted against year, 1976–1984.]

Results from GA: 3 pieces; time = 4.4 secs.
Piece 1 (t = 1, …, 98): IID;  Piece 2 (t = 99, …, 108): IID;  Piece 3 (t = 109, …, 120): AR(1).

Application to Multivariate Time Series

Multivariate time series (d-dimensional): y_1, …, y_n.

Piecewise AR model:

Y_t = γ_j + Φ_{j,1} Y_{t−1} + … + Φ_{j,p_j} Y_{t−p_j} + Σ_j^{1/2} Z_t,  if τ_{j−1} ≤ t < τ_j,

where τ_0 = 1 < τ_1 < … < τ_{m−1} < τ_m = n + 1, and {Z_t} is IID(0, I_d). In this case,

MDL(m, (τ_1, p_1), …, (τ_m, p_m))
  = log m + m log n + Σ_{j=1}^{m} log p_j + Σ_{j=1}^{m} ((p_j d² + d + d(d+1)/2)/2) log n_j
    + Σ_{j=1}^{m} Σ_{t=τ_{j−1}}^{τ_j−1} ½ (log |V̂_t| + (Y_t − Ŷ_t)ᵀ V̂_t⁻¹ (Y_t − Ŷ_t)),

where Ŷ_t = E(Y_t | Y_{t−1}, …, Y_1), V̂_t is the covariance matrix of the prediction error Y_t − Ŷ_t, and the AR parameters are estimated by the multivariate Yule–Walker equations based on Whittle's generalization of the Durbin–Levinson algorithm.

Example: Bivariate Time Series

- {Y_t1}: same as the series in Example 1 (3 segments)

- {Y_t2}: same as the series in Example 2 (2 segments)

GA results:

TS 1: 3 pieces with breaks at τ1=513 and τ2=769. Total run time 16.31 secs

TS 2: 2 pieces with break at τ1=196. Total run time 11.96 secs

Bivariate: 4 pieces with breaks at τ_1 = 197, τ_2 = 519, τ_3 = 769: AR(1), AR(1), AR(2), AR(2). Total run time 1126 secs.

[Figure: the two component series, t = 1, …, 1024.]

Example: EEG Time Series

Data: Bivariate EEG time series at channels T3 (left temporal) and P3 (left parietal). Female subject was diagnosed with left temporal lobe epilepsy. Data collected by Dr. Beth Malow and analyzed in Ombao et al. (2001) (n = 32,768; sampling rate 100 Hz). Seizure started at about 185 seconds.

GA univariate results: 14 breakpoints for T3; 11 breakpoints for P3.
GA bivariate results: 11 pieces, with AR orders 17, 2, 6, 15, 2, 3, 5, 9, 5, 4, 1.

[Figure: T3 and P3 channel traces, time in seconds.]

Example: EEG Time Series (cont)

Remarks:
- the general conclusions of this analysis are similar to those reached in Ombao et al.;
- prior to seizure, power concentrated at lower frequencies and then spread to high frequencies;
- power returned to the lower frequencies at conclusion of seizure.

[Figure: estimated time-varying spectra for the T3 and P3 channels, frequency (Hz) vs. time in seconds.]

Example: EEG Time Series (cont)

Remarks (cont):
- T3 and P3 strongly coherent at 9–12 Hz prior to seizure;
- strong coherence at low frequencies just after onset of seizure;
- strong coherence shifted to high frequencies during the seizure.

[Figure: estimated T3/P3 coherency, frequency (Hz) vs. time in seconds.]

Application to Structural Breaks (Davis, Lee, Rodriguez-Yam)

State Space Model Setup: Observation equation:

p(yt | αt) = exp{αt yt − b(αt) + c(yt)}.

State equation: {αt} follows the piecewise AR(1) model given by

α_t = γ_k + φ_k α_{t−1} + σ_k ε_t,  if τ_{k−1} ≤ t < τ_k,

where 1 = τ_0 < τ_1 < … < τ_m < n, and {ε_t} ~ IID N(0,1).

Parameters:
m = number of break points
τ_k = location of break points
γ_k = level in the k-th epoch
φ_k = AR coefficient in the k-th epoch
σ_k = scale in the k-th epoch

Application to Structural Breaks (cont)
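A parameter-driven series of this kind, with Poisson observations as a concrete case, can be simulated as follows (a sketch; the default parameter values mirror the Poisson example later in the talk, and the function name and zero initial state are illustrative assumptions):

```python
import numpy as np

def simulate_poisson_ssm(n, tau, beta=0.7, phis=(0.5, -0.5),
                         sigma2=0.3, seed=0):
    """Y_t | alpha_t ~ Poisson(exp(beta + alpha_t)), with the state
    alpha_t = phi_k alpha_{t-1} + sigma eps_t and phi switching
    from phis[0] to phis[1] at time tau."""
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(sigma2)
    alpha = np.zeros(n)
    for t in range(1, n):
        phi = phis[0] if t < tau else phis[1]
        alpha[t] = phi * alpha[t - 1] + sigma * rng.standard_normal()
    return rng.poisson(np.exp(beta + alpha))

counts = simulate_poisson_ssm(500, 250)
```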

Estimation: For (m, τ_1, …, τ_m) fixed, calculate the approximate likelihood evaluated at the "MLE", i.e.,

L_a(ψ̂; y_n) = |G_n|^{1/2} / |K + G_n|^{1/2} · exp{ y_nᵀ α* − 1ᵀ{b(α*) − c(y_n)} − (α* − µ)ᵀ G_n (α* − µ)/2 },

where ψ̂ = (γ̂_1, …, γ̂_m, φ̂_1, …, φ̂_m, σ̂_1², …, σ̂_m²) is the MLE.

Goal: Optimize an objective function over (m, τ1, . . . , τm). • use minimum description length (MDL) as an objective function

• use genetic algorithm for optimization

Application to Structural Breaks (cont)

Minimum Description Length (MDL): Choose the model which maximizes the compression of the data or, equivalently, select the model that minimizes the code length of the data (i.e., amount of memory required to store the data).

Code Length(“data”) = CL(“fitted model”) + CL(“data | fitted model”)

~ CL(“parameters”) + CL(“residuals”)

MDL(m, τ_1, …, τ_m)
  = log(m) + m log(n) + 1.5 Σ_{j=1}^{m} log(τ_j − τ_{j−1}) − Σ_{j=1}^{m} log( L_a(ψ̂_j; y_{τ_{j−1}:τ_j}) ),

where the first three terms are CL("parameters") and the last term is CL("residuals").

Generalization: AR(p) segments can have unknown order:

MDL(m, (τ_1, p_1), …, (τ_m, p_m))
  = log(m) + m log(n) + 0.5 Σ_{j=1}^{m} (p_j + 2) log(τ_j − τ_{j−1}) − Σ_{j=1}^{m} log( L_a(ψ̂_j; y_{τ_{j−1}:τ_j}) ).

Example

Model: Y_t | α_t ∼ Pois(exp{β + α_t}),  α_t = φ α_{t−1} + ε_t,  {ε_t} ~ IID N(0, σ²).

[Figure: simulated series and MDL as a function of breaking point (blue = AL, red = IS).]

True model:
- Y_t | α_t ∼ Pois(exp{.7 + α_t}), α_t = .5 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .3), t < 250;
- Y_t | α_t ∼ Pois(exp{.7 + α_t}), α_t = −.5 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .3), t > 250.
- GA estimate: 251; run time 267 secs.

SV Process Example

Model: Y_t | α_t ∼ N(0, exp{α_t}),  α_t = γ + φ α_{t−1} + ε_t,  {ε_t} ~ IID N(0, σ²).

[Figure: simulated series and MDL as a function of breaking point.]

True model:
- Y_t | α_t ∼ N(0, exp{α_t}), α_t = −.05 + .975 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .05), t ≤ 750;
- Y_t | α_t ∼ N(0, exp{α_t}), α_t = −.25 + .900 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .25), t > 750.
- GA estimate: 754; run time 1053 secs.

SV Process Example

Model: Y_t | α_t ∼ N(0, exp{α_t}),  α_t = γ + φ α_{t−1} + ε_t,  {ε_t} ~ IID N(0, σ²).

[Figure: simulated series and MDL as a function of breaking point.]

True model:
- Y_t | α_t ∼ N(0, exp{α_t}), α_t = −.175 + .977 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .1810), t ≤ 250;
- Y_t | α_t ∼ N(0, exp{α_t}), α_t = −.010 + .996 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .0089), t > 250.
- GA estimate: 251; run time 269 secs.

SV Process Example (cont)

True model:

- Y_t | α_t ∼ N(0, exp{α_t}), α_t = −.175 + .977 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .1810), t ≤ 250;
- Y_t | α_t ∼ N(0, exp{α_t}), α_t = −.010 + .996 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .0089), t > 250.

Fitted model based on no structural break:

- Y_t | α_t ∼ N(0, exp{α_t}), α_t = −.0645 + .9889 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .0935).

[Figure: original series and a series simulated from the no-break fitted model.]

SV Process Example (cont)

Fitted model based on no structural break:

- Y_t | α_t ∼ N(0, exp{α_t}), α_t = −.0645 + .9889 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .0935).

[Figure: series simulated from the no-break fitted model and its MDL as a function of breaking point.]

Summary Remarks

1. MDL appears to be a good criterion for detecting structural breaks.
2. Optimization using a genetic algorithm is well suited to finding a near-optimal value of MDL.
3. The procedure extends easily to multivariate problems.
4. While estimating structural breaks for nonlinear time series models is more challenging, this paradigm of using MDL together with a GA holds promise for break detection in parameter-driven models and other nonlinear models.
