Lecture III: Review of Classic Quadratic Variation Results and Relevance to Statistical Inference in Finance

Christopher P. Calderon Rice University / Numerica Corporation Research Scientist

Chris Calderon, PASI, Lecture 3

Outline
I Refresher on Some Unique Properties of Brownian Motion
II Stochastic Integration and Quadratic Variation of SDEs
III Demonstration of How Results Help Understand "Realized Variation" Discretization Error and Idea Behind "Two Scale Realized Volatility Estimator"

Part I

Refresher on Some Unique Properties of Brownian Motion

Outline: For Items I & II, Draw Heavily from:
Protter, P. (2004) Stochastic Integration and Differential Equations, Springer-Verlag, Berlin.
Steele, J.M. (2001) Stochastic Calculus and Financial Applications, Springer, New York.

For Item III Highlight Results from:

Barndorff-Nielsen, O. & Shephard, N. (2003) Bernoulli 9 243-265.
Zhang, L., Mykland, P. & Aït-Sahalia, Y. (2005) JASA 100 1394-1411.

Assuming Some Basic Familiarity with Brownian Motion (B.M.)

• Stationary, Independent Increments
• Increments are Gaussian: B_t − B_s ∼ N(0, t − s)
• Paths are Continuous but "Rough" / "Jittery" (Not Classically Differentiable)
• Paths are of Infinite Variation, so "Funny" Integrals Are Used in Stochastic Integration, e.g.
∫₀ᵗ B_s dB_s = ½(B_t² − t)

The Material in Parts I and II are “Classic” Fundamental & Established Results but Set the Stage to Understand Basics in Part III.

The References Listed at the Beginning Provide a Detailed Mathematical Account of the Material I Summarize Briefly Here.

A Classic Result Worth Reflecting On
lim_{N→∞} Σ_{i=1}^N (B_{t_i} − B_{t_{i−1}})² = T,  where t_i = iT/N
Implies Something Not Intuitive to Many Unfamiliar with Brownian Motion …
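The limit on this slide is easy to check numerically. The following is a minimal simulation sketch (not part of the lecture); all parameter choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 1.0, 100_000

# Increments B_{t_i} - B_{t_{i-1}} over the uniform mesh t_i = i*T/N
dB = rng.normal(0.0, np.sqrt(T / N), size=N)

# Sum of squared increments: concentrates on T as N grows
qv = np.sum(dB**2)
print(qv)  # close to T = 1
```

The fluctuation of `qv` around T is of order sqrt(2T²/N), which is why refining the mesh makes the sum look deterministic.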

Discretely Sampled Brownian Motion Paths
[Figure: discretely sampled Brownian paths Z_t vs t]
Fix Terminal Time "T" and Refine Mesh by Increasing "N". (Shift IC to Zero if "Standard Brownian Motion" Desired.)

Discretely Sampled Brownian Motion Paths
Fix Terminal Time "T" and Refine Mesh by Increasing "N": δ ≡ T/N, t_i ≡ iT/N.
Suppose You are Interested in the Limit of:
Σ_{i=1}^{k(t)} (B_{t_i} − B_{t_{i−1}})²,  k(t) = max{i : t_{i+1} ≤ t}

Different Paths, Same Limit
lim_{N→∞} Σ_{i=1}^{k(t)} (B_{t_i} − B_{t_{i−1}})² = t
[Proof Sketched on Board]
[Figure: two paths B_t(ω₁) and B_t(ω₂) sharing the same limit]
Compare To Central Limit Theorem for "Nice" iid Random Variables.

Different Paths, Same Limit
lim_{N→∞} Σ_{i=1}^N (B_{t_i} − B_{t_{i−1}})² ≡ lim_{N→∞} Σ_{i=1}^N Y_i = T
Compare/Contrast to a Properly Centered and Scaled Sum of "Nice" iid Random Variables:
S_N ≡ Σ_{i=1}^N (θ(ω_i) − μ) / √N,  S_N →_L N(0, σ²)

Continuity in Time
Above is "Degenerate", but Note the Time Scaling Used in Our Previous Example Makes the Infinite Sum Along a Path Seem Similar to Computing the Variance of Independent RV's.

B.M. Paths Do Not Have Finite Variation
Let 0 = t₀ < t₁ < … < t_N = t be a partition π_N of [0, t], and define
V_{[0,t]}(ω) = sup_π Σ_{t_i ∈ π} |B_{t_{i+1}} − B_{t_i}|
Suppose That: P(V_{[0,t]} < ∞) > 0. Then:
t = lim_{N→∞} Σ_{i=1}^N (B_{t_i} − B_{t_{i−1}})²
≤ lim_{N→∞} sup_{t_i ∈ π_N} |B_{t_i} − B_{t_{i−1}}| · Σ_{i=1}^N |B_{t_i} − B_{t_{i−1}}|
≤ (lim_{N→∞} sup_{t_i ∈ π_N} |B_{t_i} − B_{t_{i−1}}|) · V_{[0,t]}
The sup tends to zero due to uniform continuity of B.M. paths, yet t > 0, so the probability of finite variation must be zero: V_{[0,t]} = ∞ a.s.
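Infinite first-order variation can also be seen numerically: the sum of absolute increments over a uniform mesh has expectation sqrt(2NT/π), so it grows without bound as the mesh is refined. A small sketch (illustrative parameters, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(1)

def discrete_variation(N, T=1.0):
    """Sum of |B_{t_i} - B_{t_{i-1}}| over a uniform mesh with N steps."""
    dB = rng.normal(0.0, np.sqrt(T / N), size=N)
    return np.abs(dB).sum()

# E[sum |dB|] = N * sqrt(2T/(pi*N)) = sqrt(2*N*T/pi): diverges with N
tv_coarse = discrete_variation(10_000)
tv_fine = discrete_variation(40_000)   # 4x the mesh points -> ~2x the sum
```

Contrast this with the squared increments of the previous slide, whose sum stays near T no matter how fine the mesh.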

Part II

Stochastic Integration and Quadratic Variation of SDEs

Stochastic Integration
P(V_{[0,t]} < ∞) = 0
X(ω, t) = ∫₀ᵗ b(ω, s) ds + ∫₀ᵗ σ(ω, s) dB_s
"Finite Variation Term" + "Infinite Variation Term"

Stochastic Integration
P(V_{[0,t]} < ∞) = 0
X(ω, t) = ∫₀ᵗ b(ω, s) ds + ∫₀ᵗ σ(ω, s) dB_s
The "Infinite Variation Term" Complicates Interpreting the Limit of Riemann-Type Sums of σ Along the Path Against Increments |B_{t_{i+1}} − B_{t_i}| as a Lebesgue Integral.

Stochastic Differential Equations (SDEs)
X(ω, t) = ∫₀ᵗ b(ω, s) ds + ∫₀ᵗ σ(ω, s) dB_s
Written in Abbreviated Form:
dX(ω, t) = b(ω, t) dt + σ(ω, t) dB_t

Stochastic Differential Equations Driven by B.M.
dX(ω, t) = b(ω, t) dt + σ(ω, t) dB_t

Adapted and Measurable Processes

dX_t = b_t dt + σ_t dB_t
I will use this shorthand for the above (which is itself shorthand for stochastic integrals…)

Stochastic Differential Equations Driven by B.M.

dX_t = b_t dt + σ_t dB_t (General Itô Process)

dX_t = b(X_t, t) dt + σ(X_t, t) dB_t

dX_t = 0·dt + 1·dB_t (Repackaging Brownian Motion)

Quadratic Variation for General SDEs (Jump Processes Readily Handled)
RV_{π_N}(t) ≡ Σ_{i=1}^N (X_{t_i} − X_{t_{i−1}})²
"Realized Variation"
Quadratic Variation: Defined if the limit, taken over any partition with the mesh going to zero, exists:
V_t := lim_{N→∞} RV_{π_N}(t)
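For a diffusion simulated on a fine mesh, the realized variation over the full grid approximates ∫₀ᵗ σ_s² ds. A minimal Euler-scheme sketch (the drift and diffusion choices below are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 1.0, 50_000
dt = T / N
sigma = 0.3                       # constant diffusion, so [X, X]_T = sigma^2 * T

X = np.empty(N + 1)
X[0] = 1.0
noise = rng.normal(0.0, np.sqrt(dt), size=N)
for i in range(N):
    drift = 0.5 * (1.0 - X[i])    # illustrative mean-reverting drift
    X[i + 1] = X[i] + drift * dt + sigma * noise[i]

rv = np.sum(np.diff(X) ** 2)      # realized variation over the full mesh
# rv is close to sigma**2 * T = 0.09; the drift contributes only O(dt)
```

Note the drift term's contribution to `rv` vanishes with the mesh, consistent with the later slide that QV comes entirely from the stochastic integral.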

Stochastic Integration
Y(ω, t) = ∫₀ᵗ b′(ω, s) ds + ∫₀ᵗ σ′(ω, s) dX_s
"Finite Variation Term" + "Infinite Variation Term"
Quadratic Variation of "X" Can Be Used to Define Other SDEs (X is a "Good Stochastic Integrator")

Quadratic Variation Comes Entirely From the Stochastic Integral
X(ω, t) = ∫₀ᵗ b(ω, s) ds + ∫₀ᵗ σ(ω, s) dB_s
i.e. the drift (the term in front of "ds") makes no contribution to the quadratic variation [quick sketch of proof on board]

Stochastic Integration
P(V_{[0,t]} < ∞) = 0
X(ω, t) = ∫₀ᵗ b(ω, s) ds + ∫₀ᵗ σ(ω, s) dB_s
The "Infinite Variation Term" Can Now Be Dealt With If One Uses Q.V.

Quadratic Variation and "Volatility"
X_t = ∫₀ᵗ b_s ds + ∫₀ᵗ σ_s dB_s
Typical Notation for QV:
[X, X]_t ≡ V_t = ∫₀ᵗ σ_s² ds

Quadratic Variation and "Volatility"
For Diffusions:
[X, X]_t ≡ V_t = ∫₀ᵗ σ_s² ds
More Generally (Includes Jump Processes):
[X, X]_t := X_t² − 2∫₀ᵗ X_{s−} dX_s

Part III

Demonstration of How Results Help Understand "Realized Variation" Discretization Error and Idea Behind "Two Scale Realized Volatility Estimator"

Most People and Institutions Don't Sample Functions (Discrete Observations)
RV_{π_N}(t) ≡ Σ_{i=1}^N (X_{t_i} − X_{t_{i−1}})²
Assuming the Process is a Genuine Diffusion, How Far is the Finite-N Approximation From the Limit [X, X]_t?

Discretization Errors and Their Limit Distributions
N^{1/2}(RV_{π_N}(t) − [X, X]_t) | σ →_L N(0, C ∫₀ᵗ σ_s⁴ ds)
The Paper Below Derived Explicit Expressions for the Variance for Fairly General SDEs Driven by B.M.
Barndorff-Nielsen, O. & Shephard, N. (2003) Bernoulli 9 243-265.
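The N^{1/2} rate is visible in simulation: quadrupling the sampling frequency roughly halves the RMS error of the realized variation. A small Monte Carlo sketch (parameters illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)

def rv_rmse(N, paths=2000, T=1.0, sigma=0.3):
    """RMS error of realized variation vs [X,X]_T over many N-step paths."""
    dX = sigma * rng.normal(0.0, np.sqrt(T / N), size=(paths, N))
    rv = np.sum(dX**2, axis=1)
    return np.sqrt(np.mean((rv - sigma**2 * T) ** 2))

e_coarse = rv_rmse(500)
e_fine = rv_rmse(2000)   # 4x the sampling -> error ratio near 2
```

Here σ is constant, so the limit variance reduces to 2σ⁴T²/N; the hard part of the cited paper is handling stochastic spot volatility.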

Discretization Errors and Their Limit Distributions
N^{1/2}(RV_{π_N}(t) − [X, X]_t) | σ →_L N(0, C ∫₀ᵗ σ_s⁴ ds)
Their paper covers some nice classic moment-generating-function results for a wide class of spot volatility processes (and proves things beyond QV).
Barndorff-Nielsen, O. & Shephard, N. (2003) Bernoulli 9 243-265.

Discretization Errors and Their Limit Distributions
One Condition That They Assume (δ ≡ t/N):
lim_{N→∞} δ^{1/2} Σ_{i=1}^N |σ²_{t_i+δ} − σ²_{t_i}| = 0
Good for Many Stochastic Volatility Models … BUT Some Exceptions are Easy to Find (See Board)
Barndorff-Nielsen, O. & Shephard, N. (2003) Bernoulli 9 243-265.

What Happens When One Has to Deal with "Imperfect" Data?

Suppose the "Price" Process Is Not Directly Observed But Instead One Has:

y_{t_i} = X_{t_i} + ε_i

Zhang, L., Mykland, P. & Aït-Sahalia, Y. (2005) JASA 100 1394-1411.

What Happens When One Has to Deal with "Imperfect" Data?

Due to the Fact that Real-World Data Does Not Adhere to the Strict Mathematical Definition of a Diffusion:

y_{t_i} = X_{t_i} + ε_i

Zhang, L., Mykland, P. & Aït-Sahalia, Y. (2005) JASA 100 1394-1411.

What Happens When One Has to Deal with "Imperfect" Data?

y_{t_i} = X_{t_i} + ε_i

If the "observation noise" sequence is i.i.d. (and independent of X), it is very easy to see the problem this causes for estimating quadratic variation (see demonstration on board).

Zhang, L., Mykland, P. & Aït-Sahalia, Y. (2005) JASA 100 1394-1411.

What Happens When One Has to Deal with "Imperfect" Data?

y_{t_i} = X_{t_i} + ε_i

If the "observation noise" sequence is i.i.d. (and independent of X), it is very easy to see the problem this causes for estimating quadratic variation (see demonstration on board). Surprisingly, These Stringent Assumptions Model Many Real-World Data Sets (e.g. See Results from Lecture 1).

Zhang, L., Mykland, P. & Aït-Sahalia, Y. (2005) JASA 100 1394-1411.

Simple Case "Bias" Estimate

Compute the Realized Variation of Y and Divide by 2N.

Then Obtain a (Biased) Estimate by Subsampling.

Finally, Aggregate the Subsample Estimates and Remove the Bias.

Zhang, L., Mykland, P. & Aït-Sahalia, Y. (2005) JASA 100 1394-1411.
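The first step above can be sketched directly: with iid noise, the realized variation of the observations is dominated by the 2N·Var(ε) term, so dividing by 2N recovers the noise variance. A minimal simulation (all parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
T, N = 1.0, 20_000
sigma, noise_sd = 0.3, 0.05

B = np.cumsum(rng.normal(0.0, np.sqrt(T / N), size=N))
X = sigma * B                                  # latent "price" process
Y = X + rng.normal(0.0, noise_sd, size=N)      # observed with iid noise

# E[RV(Y)] ~ 2*N*Var(eps) + [X, X]_T, so RV(Y)/(2N) estimates Var(eps)
var_eps_hat = np.sum(np.diff(Y) ** 2) / (2 * N)
```

The small upward bias from the latent [X, X]_T term is of order 1/N here, which is why this crude estimate is usable at high frequency.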

Simple Case "Bias Corrected" Estimate

[X, X]-hat_T = [Y, Y]_T^{(avg)} − (N_s/N)·[Y, Y]_T^{(all)}

Result Based on "Subsampling"

Zhang, L., Mykland, P. & Aït-Sahalia, Y. (2005) JASA 100 1394-1411.
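The bias-corrected estimator on this slide can be sketched as follows: average realized variations over K sparse subgrids, then subtract the noise bias estimated from the full grid. This is a simplified illustration of the two-scale idea (choice of K and all parameters are illustrative, not the paper's optimal tuning):

```python
import numpy as np

rng = np.random.default_rng(5)
T, N, K = 1.0, 20_000, 200       # K = number of subsample grids (illustrative)
sigma, noise_sd = 0.3, 0.05

X = sigma * np.cumsum(rng.normal(0.0, np.sqrt(T / N), size=N))
Y = X + rng.normal(0.0, noise_sd, size=N)

rv_all = np.sum(np.diff(Y) ** 2)                          # badly noise-biased
rv_avg = np.mean([np.sum(np.diff(Y[k::K]) ** 2) for k in range(K)])
n_bar = (N - K + 1) / K                                   # avg points per subgrid

tsrv = rv_avg - (n_bar / N) * rv_all                      # bias-corrected estimate
# rv_all is of order 2*N*Var(eps) ~ 100, while tsrv recovers sigma^2*T = 0.09
```

The paper shows how to pick K of order N^{2/3} to balance the remaining discretization and noise errors, giving the N^{1/6} rate on the next slide.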

Chasing A (Time Series) Dream: Obtaining a "Consistent" Estimate with Asymptotic Error Bounds

N^{1/6}([X, X]-hat_T − [X, X]_T) | σ →_L N(0, C₁(VAR(ε)) + C₂ ∫₀ᵀ σ_s⁴ ds)

Result Above Valid for Simple Observation Noise, but Results (by the Authors Below) Have Been Extended to Situations More General Than the iid Case.

Zhang, L., Mykland, P. & Aït-Sahalia, Y. (2005) JASA 100 1394-1411.

DNA Melting
[Figure: DNA under applied force]
Tensions are Important in Fundamental Life Processes Such As DNA Repair and DNA Transcription. Single-Molecule Experiments Are Not Just a Neat Toy: They Have Provided Insights Bulk Methods Cannot.

Time Dependent Diffusions
Stochastic Differential Equation (SDE): dz_t = b(z_t, t; Θ)dt + σ(z_t; Θ)dW_t
"Measurement" Noise: y_{t_i} = z_{t_i} + ε_{t_i}
For Frequently Sampled Single-Molecule Experimental Data, "Physically Uninteresting Measurement Noise" Can Be a Large Component of the Signal.
Find/Approximate the "Transition Density" / Conditional Probability Density: p(z_t | y_s; Θ)

A Word from the Wise (Lucien Le Cam)
• Basic Principle 0: Don't trust any principle.
• Principle 1: Have clear in your mind what it is you want to estimate …
• Principle 4: If satisfied that everything is in order, try first a crude but reliable procedure to locate the general area your parameters lie in.
• Principle 5: Having localized yourself by (4), refine the estimate using some of your theoretical assumptions, being careful all the while not to undo what you did in (4). …

Le Cam, L. (1990) International Statistics Review 58 153-171.

Example of Relevance of Going Beyond iid Setting

Watching a Process at Too Fine a Resolution Allows "Messy" Details to Be Statistically Detectable.

Calderon, J. Chem. Phys. (2007).

Example of Relevance of Going Beyond iid Setting

r_i ≡ X_n − E^{P^θ}[X_n | X_{n−1}]
[Figure: residual X − E[X|X] over the interval t to t+dt]

Calderon, J. Chem. Phys. (2007).

Lecture IV: A Selected Review of Some Recent Goodness-of-Fit Tests Applicable to Lévy Processes

Christopher P. Calderon Rice University / Numerica Corporation Research Scientist

Outline
I Compare iid Goodness-of-Fit Problem to Time Series Case
II Review a Classic Result Often Attributed to Rosenblatt (The Probability Integral Transform)
III Highlight Hong and Li's "Omnibus" Test
IV Demonstrate Utility and Discuss Some Other Recent Approaches and Open Problems

Outline

Primary References (Items II & III):

Diebold, F., Hahn, J. & Tay, A. (1999) Review of Economics and Statistics 81 661-663.
Hong, Y. & Li, H. (2005) Review of Financial Studies 18 37-84.

Goodness-of-Fit Issues
[Figure: true and parametric density p(x) vs x; the two coincide]

Goodness-of-Fit in Random Variable Case (No Time Evolution)
[Figure: density p(x) and corresponding CDF P(x)]

Goodness-of-Fit Issues
[Figure: two densities p(x); one centered, one shifted over a wider range of x]

Goodness-of-Fit in Random Variable Case (No Time Evolution)
[Figure: the two densities p(x) and their CDFs P(x)]

Goodness-of-Fit in Time Series
[Figure: CDFs P₁ vs F₁ and P₂ vs F₂ (known)]
Note: Truth and approximate distribution change with the time index. A time series only gets "one" sample from each evolving distribution.

Time Series: Residual and Extensions

r_i ≡ X_i − E^{θ̂}[X_i | X_{i−1}]

Problem with Using Low-Order Correlation Shown in Lecture 2.

Can We Make Use of All Moments?

Time Series: Residual and Extensions

r_i ≡ X_i − E^{θ̂}[X_i | X_{i−1}]

Problem with Using Low-Order Correlation Shown in Lecture 2.

Can We Make Use of All Moments?

YES. With the Probability Integral Transform (PIT), Also Called Generalized Residuals:
Z_i = ∫_{−∞}^{X_i} p(X | X_{i−1}; θ₀) dX

Test the Model Against Observed Data:

{X₁, X₂, …, X_N} → θ̂
Time Series (Noisy, Correlated, and Possibly Nonstationary)
Z_i = G(X_i; X_{i−1}, θ̂): Probability Integral Transform / Rosenblatt Transform / "Generalized Residual"
If the Assumed Model is Correct, the Residuals are i.i.d. with Known Distribution: Z_i ∼ U[0, 1]
Formulate a Test Statistic: Q = H({Z₁, Z₂, …, Z_T})

Simple Illustration of PIT [Specialized to Markov Dynamics]
Start with time series random variables characterized by: X_n ∼ P(X_n | X_{n−1})
Introduce a (strictly increasing) transformation: Z_n = h(X_n; θ)
The transformation introduces a "new" R.V. => new distribution: Z_n ∼ F(Z_n | Z_{n−1}; θ)
"Truth" density in "X" (NOTE: no parameter): p(X_n | X_{n−1}) ≡ dP(X_n | X_{n−1})/dX_n
PIT transformation: Z_n ≡ h(X_n) := ∫_{−∞}^{X_n} f(X′ | X_{n−1}; θ) dX′
Want an expression for the "new" density in terms of "Z" (h is a monotone function of X):
dF(Z_n | Z_{n−1}; θ)/dZ_n = [dP(X_n | X_{n−1})/dX_n]·[dX_n/dZ_n]
= p(X_n | X_{n−1})·[1/f(X_n | X_{n−1}; θ)] =? 1
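The "=? 1" on this slide holds exactly when the assumed transition density matches the truth, in which case the Z_n are iid U[0,1]. A minimal sketch using a Gaussian AR(1) as the Markov model (choices illustrative, not from the lecture):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(6)
a, s, n = 0.8, 0.5, 5000         # AR(1) stand-in for a Markov model

X = np.empty(n)
X[0] = 0.0
for i in range(1, n):
    X[i] = a * X[i - 1] + s * rng.normal()

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Gaussian transition density => the PIT is the conditional normal CDF
Z = np.array([norm_cdf((X[i] - a * X[i - 1]) / s) for i in range(1, n)])
# Under the correct model the Z_i are iid U[0,1]
```

Evaluating the same transform with a misspecified `a` or `s` pushes the Z histogram away from uniform, which is exactly what the tests below exploit.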

Chris Calderon, PASI, Lecture 3 1 0 2 Qˆ(j) h(N j)Mˆ1(j) hA /V ≡ − − 0 ³ Location (Depends on ´ Bandwidth) Scale 1 1 2 Mˆ1(j) gˆj(z1,z2) 1 dz1dz2. ≡ − Z0 Z0 ³ ´ 1 N gˆj(z1,z2) Kh(z1, Zˆτ )Kh(z2, Zˆτ j). ≡ N j − τ=j+1 − X

Chris Calderon, PASI, Lecture 3 1 0 2 Qˆ(j) h(N j)Mˆ1(j) hA /V ≡ − − 0 ³ ´

1 1 2 Mˆ1(j) gˆj(z1,z2) 1 dz1dz2. ≡ − Z0 Z0 ³ ´ 1 N gˆj(z1,z2) Kh(z1, Zˆτ )Kh(z2, Zˆτ j). ≡ N j − τ=j+1 − X

Stationary Time Series [No Time Dependent Forcing]
[Figure: trajectory y vs Time [ns]]

Stationary Time Series?
[Figure: the same trajectory y vs Time [ns]]

Stationary Time Series?
[Figure: the same trajectory y vs Time [ns]]

Dynamic Rules "Changing" Over Time (Simple 1D SDE Parameters "Evolving")
[Figure: the same trajectory y vs Time [ns]]

"Subtle" Coupling
Calderon, J. Phys. Chem. B (2010).

Estimate "Simple" SDE in Window. Then Apply Hypothesis Tests To Quantify Time of Validity.
dΦ_t = (A + BΦ_t)dt + (C + DΦ_t)dW_t
[Figure: trajectory y vs Time [ns]]

Estimate Model in Window "0"; Keep Fixed and Apply to Other Windows

Window 0, Window 2, Window 6, Window 8
[Figure: trajectory y vs Time [ns] with windows marked]

Each Test Statistic Uses ONE Short Path
Calderon, J. Phys. Chem. B (2010).

Stationary Time Series? Subtle Noise Magnitude Changes due to Unresolved Degrees of Freedom (Check Validity of a "Born-Oppenheimer" Type Proxy)

dΦ_t = (A + BΦ_t)dt + (C + DΦ_t)dW_t

Stationary Time Series? Subtle Noise Magnitude Changes due to Unresolved Degrees of Freedom. Evolving "X" Influences Higher Order Moments of "Y".

Estimate "Simple" SDE in Local Time Window. Then Apply Hypothesis Tests To Quantify Time of Validity.

Window 0, Window 2, Window 6, Window 8

"Subtle" Noise Magnitude Changes
Calderon, J. Phys. Chem. B (2010).

Frequentist vs. Bayesian Approaches

• Concept of “Residual” Unnatural in Bayesian Setting. All Information Condensed into Posterior (Time Residual Can Help in Non-ergodic Settings) • Uncertainty Information Available in Both Under Ideal Models with Heavy Computation (Bayesian Methods Need “Prior” Specification) • Test Against Many Alternatives in a Frequentist Setting (Instead of a Handful of Candidate Models as is Done in Bayesian Methods)

Several Issues

Evaluate Z at the Estimated Parameter (Their Limit Distribution Requires a Root-N Consistent Estimator to Get [Asymptotic] Critical Values).
Asymptopia is a Strange Place [Multiplying by Infinity Can Be Dangerous When Numerical Proxies are Involved].
In the Nonstationary Case, the Initial Condition Distribution May Be Important but is Often Unknown.

Testing Surrogate Models
Hong & Li, Rev. Financial Studies, 18 (2005). Chen, S., Leung, D. & Qin, J., Annals of Statistics, 36 (2008). Aït-Sahalia, Fan & Peng, JASA (in press). Calderon & Arora, J. Chemical Theory & Computation 5 (2009).
[Figure: empirical CDFs F(Q) of the test statistic Q for OD and OU models at Δt = 0.15, 0.30, 0.45 ps, plotted against the Null]
One Time Series Reduces to One Test Statistic. The Empirical CDF Summarizes Results from 100 Time Series.

A Sketch of Local to Global

or

Fitting Nonlinear SDEs Without A Priori Knowledge of a Global Parametric Model

Local Time Dependent Diffusion Models
Calderon, J. Chem. Phys. 126 (2007).

dz_t = b(z_t, t; Θ)dt + σ(z_t; Θ)dW_t

Approximate Locally by Low Order Polynomials, e.g.
b(z) ≈ A + Bz + f(t)
σ(z) ≈ C + Dz
Can Use Physics-Based Proxies, e.g. Overdamped Langevin Dynamics:
σ(z) ≈ C + Dz
b(z) ≈ (C + Dz)²/(2k_B T)·(A + Bz + f(t))

Maximum Likelihood

For a given model & discrete observations, find the maximum likelihood estimate:

θ̂ ≡ argmax_θ p(z₀, …, z_T; θ)
Special case of Markovian dynamics:
θ̂ ≡ argmax_θ p(z₀; θ) p(z₁ | z₀; θ) … p(z_T | z_{T−1}; θ)
"Transition Density" (aka Conditional Probability Density)
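For the locally linear model dz = (A + Bz)dt + C dW, the transition density is exactly Gaussian, so the Markovian likelihood above can be maximized in closed form (it reduces to AR(1) least squares). A minimal sketch with illustrative true parameters (not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(7)
A, B, C = 1.0, -2.0, 0.5         # true dz = (A + B z) dt + C dW (illustrative)
dt, n = 0.01, 200_000

# Exact Gaussian transition density of this linear SDE gives an AR(1):
phi = np.exp(B * dt)
alpha = -(A / B) * (1.0 - phi)
v = C**2 * (1.0 - np.exp(2.0 * B * dt)) / (-2.0 * B)

z = np.empty(n)
z[0] = 0.5
shocks = rng.normal(0.0, np.sqrt(v), size=n - 1)
for i in range(n - 1):
    z[i + 1] = alpha + phi * z[i] + shocks[i]

# Maximizing the product of Gaussian transition densities = AR(1) regression
z0, z1 = z[:-1], z[1:]
phi_hat = np.cov(z0, z1)[0, 1] / np.var(z0)
alpha_hat = z1.mean() - phi_hat * z0.mean()
v_hat = np.mean((z1 - alpha_hat - phi_hat * z0) ** 2)

# Map the discrete-time MLE back to the SDE parameters
B_hat = np.log(phi_hat) / dt
A_hat = -B_hat * alpha_hat / (1.0 - phi_hat)
C_hat = np.sqrt(v_hat * (-2.0 * B_hat) / (1.0 - np.exp(2.0 * B_hat * dt)))
```

For nonlinear b and σ no such closed form exists in general, which is what motivates the local polynomial approximations of the previous slide.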

After Estimating Acceptable (Not Rejected) Local Models, What's Next?

• Quality of Data: “Variance” (MC Simulation in Local Windows)

• Then Global Fit (Nonlinear Regression)

Note: Noise Magnitude Depends on State, And the State is Evolving in a Non-Stationary Fashion
[Figure: Z (State Space) vs Time; state-dependent noise would "fool" wavelet-type methods]
dZ_t = k(v_pull·t − Z_t)dt + σ(Z_t)dW_t

Note: Noise Magnitude Depends on State, And the State is Evolving in a Non-Stationary Fashion
[Figure: left panel, Z (State Space) vs Time; right panel, σ(Z) vs Z (State Space)]

Note: Noise Magnitude Depends on State, And the State is Evolving in a Non-Stationary Fashion
[Figure: the same panels with a Local Time Window marked]

Corresponding State Window
[Figure: the Local Time Window on the trajectory and the corresponding window on σ(Z)]

Global Nonlinear Function of Interest (the "truth")
[Figure: Z (State Space) vs Time and σ(Z) vs Z]

Point Estimate Comes From Local Parametric Model (Parametric Likelihood Inference Tools Available)
[Figure: Z (State Space) vs Time and σ(Z) vs Z]

Point Estimate Comes From Local Parametric Model (Parametric Likelihood Inference Tools Available)
σ(z) ≈ C + D(z − z₀)
[Figure: a local window on the trajectory and the corresponding local estimate Ĉ on σ(Z)]

Maximum Likelihood

For a given model & discrete observations, find the maximum likelihood estimate:

θ̂ ≡ argmax_θ p(z₀, …, z_T; θ)
Special case of Markovian dynamics:
θ̂ ≡ argmax_θ p(z₀; θ) p(z₁ | z₀; θ) … p(z_T | z_{T−1}; θ)
"Transition Density" (aka Conditional Probability Density)

Noisy Point Estimates (Finite Discrete Time Series Sample Uncertainty)
σ(z) ≈ C + D(z − z₀)
[Figure: trajectory Z vs Time; local estimate Ĉ on σ(Z); local estimate D̂ on ∂σ(Z)/∂Z, the Spatial Derivative of the Function of Interest]

Idea Similar in Spirit to J. Fan, Y. Fan, J. Jiang, JASA 102 (2007), Except Uses Fully Parametric MLE (No Local Kernels) in Discrete Local State Windows
From Path to Point
[Figure: a local window on the trajectory maps to a point estimate Ĉ on σ(Z)]

HOWEVER: Not Willing To Assume Stationarity in Windows (Completely Nonparametric and Fourier Analysis Problematic). "Subject Specific Function Variability" of Interest.
[Figure: Z (State Space) vs Time and σ(Z) vs Z]
Validity of Local Models Can Be Tested Using Generalized Residuals, Available Using the Transition Density with the Local Parametric Approach.

Function of Interest
[Figure: σ(Z) with noisy point estimates (obtained using local maximum likelihood)]
Transform "X vs t" Data to "X vs Y = F(X)". Estimate the Nonlinear Function via Regression.

Spatial Derivative of Function of Interest
[Figure: ∂σ(Z)/∂Z with noisy point estimates (obtained using local maximum likelihood)]
Transform "X vs t" Data to "X vs Y′ = G(X)". Estimate the Nonlinear Function via Regression.

{z_m, Ĉ, D̂}_{m=1}^M
(1) Use the Noisy Diffusive Path (2) to Infer (3) Regression Data
[Figure: schematic of steps (1)-(3)]

Same Ideas Apply to Drift Function:

{z_m, Â, B̂}_{m=1}^M

Can Also Entertain Simultaneous Regression:
{z_m, Â, B̂, Ĉ, D̂}_{m=1}^M

Spline Function Estimate
[Figure: spline estimate of the Function of Interest σ(Z) through the noisy points]
{z_m, σ(z_m), dσ(z)/dz|_{z_m}}_{m=1}^M
Variability Between Different Functions Gives Information About Unresolved Degrees of Freedom. A Reliable Means for Patching Windowed Estimates Together is Desirable.

Open Mathematical Questions for Local SDE Inference

How Can One Efficiently Choose "Optimal" Local Window Sizes?

Hypothesis Tests Checking the Markovian Assumption with Non-Stationary Data? (The "Omnibus" Goodness-of-Fit Tests of Hong and Li [2005], Based on the Probability Integral Transform, Are Currently Employed.)

Gramicidin A: Commonly Studied Benchmark. 36,727 Atom Molecular Dynamics Simulation.
[Figure: potassium ion, water molecules, and lipid molecules explicitly modeled around the Gramicidin A protein]
Compute Work: Integral of "Force times Distance"

PMF and Non-equilibrium Work
[Figure: work W [kT] vs τ [ns]; molecular dynamics trajectory "i" inside SDE "Tube i"]
Nonergodic Sampling: Pulling "Too Fast". Variability Induced by the Initial Condition (variability between curves) and by "Thermal" Noise (solvent bombardment, vibrational motion, etc., quantified by tube width).

PMF and Non-equilibrium Work
[Figure: work W [kT] vs τ [ns]]
The Distribution at the Final Time is non-Gaussian and Can Be Viewed as a MIXTURE of Distributions.

Importance of Tube Variability
Mixture of Distributions: Conformational Noise and Thermal Noise.

Other Kinetic Issues
Diffusion Coefficient / Effective Friction is State Dependent.
NOTE: Noise Intensity Differs in the Two Sets of Trajectories.

PMF and Non-equilibrium Work
[Figure: W [kT] vs λ [Å] for molecular dynamics trajectories "i" and "j" with SDE Tubes i and j; each tube starts with the same q's but different p's]

Pathwise View
Use one path, discretely sampled at many time points, to obtain a collection of θ̂'s (one for each path): θ̂₁, θ̂₂, θ̂₃, θ̂₄, θ̂₅, …
Ensemble Averaging Problematic Due to Nonergodic / Nonstationary Sampling.
A Collection of Models is Another Way of Accounting for "Memory".
[Figure: sampled path Z_t vs t; vertical lines represent observation times]

Potential of Mean Force Surface
Hummer & Szabo, PNAS 98 (2001). Minh & Adib, PRL 100 (2008).

Several issues arise when one attempts to use this surface to approximate dynamics in complex systems.

PMF and Non-equilibrium Work
[Figure: work W [kT] vs τ [ns]]
The Distribution at the Final Time is non-Gaussian and Can Be Viewed as a MIXTURE of Distributions.

PMF of a "Good" Coordinate
Calderon, Janosi & Kosztin, J. Chem. Phys. 130 (2009).
[Figure: PMF U [kT] vs z [Å]. The Allen et al. (Ref. 27) and Bastug et al. (Ref. 28) PMFs were computed using umbrella sampling with all-atom MD; the FR PMFs (Ref. 16) and confidence band were computed (w/ SPA data) using stretching & relaxing nonequilibrium paths.]

Penalized Splines (See Ruppert, Wand & Carroll's Text: Semiparametric Regression, Cambridge University Press, 2003.)
y = {f(x₁), …, f(x_m), ∂f(x₁), …, ∂f(x_m)} + ε
(Observed or Inferred Data; ε is Observation Error)
y_i = η₀ + η₁x_i + … + η_p x_iᵖ + Σ_{j=1}^K ζ_j B_j(x_i)
Spline Basis with K < m Knots

Penalized Splines Using Derivative Information

C^PuDI :=
[ 1  x₁  …  x₁ᵖ   (x₁−κ₁)₊ᵖ  …  (x₁−κ_K)₊ᵖ ;
  …
  1  x_m  …  x_mᵖ  (x_m−κ₁)₊ᵖ  …  (x_m−κ_K)₊ᵖ ;
  0  1  …  p·x₁^{p−1}   p(x₁−κ₁)₊^{p−1}  …  p(x₁−κ_K)₊^{p−1} ;
  …
  0  1  …  p·x_m^{p−1}  p(x_m−κ₁)₊^{p−1}  …  p(x_m−κ_K)₊^{p−1} ]

Poorly Conditioned Design Matrix

Pick Smoothness: Solve Many Least Squares Problems
‖y − Cβ‖₂² + α‖Dβ‖₂²
P-Spline Problem (Flexible Penalty). Choose α with a Cost Function (GCV). This Object Has a Nice "Mixed Model" Interpretation as the Ratio of Observation Variance to Random-Effects Variance.
α Selection Requires Traces and Residuals for Each Candidate.
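The workflow above (penalized least squares plus a GCV sweep over α) can be sketched compactly. This is a plain normal-equations illustration, not the numerically careful QR-based PSQR approach of the following slides; the basis, knot layout, and test function are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(8)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, x.size)

# Truncated power (TPF) basis of degree 1: [1, x, (x - kappa_j)_+]
knots = np.linspace(0.05, 0.95, 15)
C = np.column_stack([np.ones_like(x), x, np.clip(x[:, None] - knots, 0.0, None)])

# Penalize only the truncated-power coefficients: D = diag(0, 0, 1, ..., 1)
d = np.r_[0.0, 0.0, np.ones(knots.size)]

best = None
for alpha in 10.0 ** np.arange(-8, 2):
    M = C.T @ C + alpha * np.diag(d)
    beta = np.linalg.solve(M, C.T @ y)
    edf = np.trace(np.linalg.solve(M, C.T @ C))     # trace of the hat matrix
    resid = y - C @ beta
    gcv = (resid @ resid / x.size) / (1.0 - edf / x.size) ** 2
    if best is None or gcv < best[0]:
        best = (gcv, alpha, beta)

gcv, alpha, beta = best
fit = C @ beta
```

The trace/residual pair computed for every candidate α is exactly the per-candidate cost the slide refers to, and is what PSQR is designed to obtain cheaply and stably.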

PSQR: Penalized Splines using QR (Calderon, Martinez, Carroll, Sorensen)
Allows Fast and Numerically Stable P-Spline Results. Exactly Rank Deficient "C" Can Be Treated.
Can Process Many Batches of Curves (Facilitates Solving Many GCV Type Optimization Problems). Data Mining without User Intervention (e.g. Closely Spaced and/or Overlapping Knots Are Not Problematic).
Let Regularization Be Handled By the Built-In Smoothness Penalty (Do Not Introduce Extra Numerical Regularization Steps).

Demmler-Reinsch Basis Approach (Popular Method For Efficiently Computing Smoothing Splines)

Forms Eigenbasis Using the Cholesky Factor of CᵀC

Traces and Residuals for Each Candidate α Can Easily be Obtained in Vectorized Fashion

Squaring a Matrix is Not a Good Idea (Subtle and Dramatic Consequences)

Avoid ad hoc Regularization

Analogous Existing Methods Do Not Use Good Numerics. E.g., if CᵀC Cannot be Cholesky Factored, the ad hoc Solution is to Factor CᵀC + ηD Instead, Solving a Penalized Regression Problem Other Than the One Originally Posed.
Others Use SVD Truncation in Combination with Regularization (Again, Not the Problem Posed).

Basic Sketch of Our Approach

(1) Factor C via QR (Done Only Once).
(2) Then Do an SVD on the Penalized Columns.
(3) For a Given α, Find a (Lower Dimensional) QR and Exploit Banded Matrix Structure. Solve the Original Penalized Least Squares Problem with a CHEAP QR of the Banded "R"-Like Matrix (S; √αI).
(4) Repeat (3) and Choose α.

QR and Least Squares

Efficient QR Treatment: Minimize ‖(C; √αD)β − (y; 0)‖₂².
The Minimizing Vector is Equivalent to the Solution of the Normal Equations (CᵀC + αDᵀD)β = Cᵀy, i.e. AᵀAβ = Aᵀ(yᵀ, 0)ᵀ with A ≡ (C; √αD).
Exploit the P-Spline Problem Structure. Some in Statistics Use QR, but Combine Penalized Regression with SVD Truncation.
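The equivalence between the stacked least squares problem and the normal equations can be checked in a few lines. This is only a sketch of the equivalence (dimensions, α, and D are illustrative), not the banded-QR machinery of PSQR itself:

```python
import numpy as np

rng = np.random.default_rng(9)
m, p, alpha = 60, 8, 0.3
C = rng.normal(size=(m, p))
D = np.eye(p)
y = rng.normal(size=m)

# Augmented least-squares form: minimize ||(C; sqrt(alpha) D) beta - (y; 0)||
A = np.vstack([C, np.sqrt(alpha) * D])
rhs = np.concatenate([y, np.zeros(p)])
beta_qr = np.linalg.lstsq(A, rhs, rcond=None)[0]   # orthogonal-factor route

# Normal-equations form: (C'C + alpha D'D) beta = C'y (squares the conditioning)
beta_ne = np.linalg.solve(C.T @ C + alpha * D.T @ D, C.T @ y)
```

Both routes give the same β in exact arithmetic; the slides' point is that forming CᵀC squares the condition number, while the QR route on the augmented matrix does not.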

1. Obtain the QR decomposition of C = QR.

RF R 2. Partition result above as: QR =(QF ,QP ) 11 12 . 0 RP µ 22¶ P T 3. Obtain the SVD of R22 = USV .

4. Form the following: I 0 Q˜ =(QF ,QP U), V˜ = , 0 V T µ ¶ RF R V R˜ R˜ R˜ = 11 12 11 12 , 0 S ≡ 0 R˜22 µ ¶ µ ¶ bF Q˜T y b = bP . 0 ≡ ⎡µ ¶⎤ µ ¶ 0 ⎣ ⎦ Chris Calderon, PASI, Lecture 3 PSQR (Solution Steps)

S 5. For each given α (and/or DP )form:D = √αDP V and W = . α α D µ α¶ 6. Obtain the QR decomposition Wα = Q0R0. P e f 1 T b e 7. Form c =(R0)− (Q0) . 0 f µ ¶ R˜ 1(bF R˜ c) 8. Solve βˆa = 11− 12 . α Vc− µ ¶

PSQR (Efficiency)

(SVᵀ; D_P) ≡ (S; D_P V)·Vᵀ

So if D = diag(0, …, 0, 1, …, 1) and D_P = diag(1, …, 1), then the problem reduces to finding x minimizing
‖(S; √αV)x − (b_P; 0)‖₂² = ‖[I, 0; 0, V]·(S; √αI)x − (b_P; 0)‖₂²
(S; √αI) is Close to an "R"; [I, 0; 0, V] is an Orthogonal Matrix.

PSQR and Givens Rotations
With D_P = diag(1, …, 1):
‖(S; √αV)x − (b_P; 0)‖₂² = ‖[I, 0; 0, V]·(S; √αI)x − (b_P; 0)‖₂²
Givens Rotations Finalize the QR. Applying the Rotations to the RHS Yields
(√Λ)⁻¹·S·b_P,  where R = √Λ.

Subtle Errors

PuDI Design Matrix (One Curve at a Time)
[Figure: sparsity pattern of the TPF basis; QR needed in Step 1 (before the penalty is added)]

TPF: Free and Penalized Blocks
C^PuDI as Above; the Penalized Truncated-Power Terms Occupy the Last K Columns.

PuDI Design Matrix (Batches of Curves)
[Figure: sparsity pattern of the TPF basis; QR needed in Step 1 (before the penalty is added)]

Screened Spectrum

Simple Basis Readily Allows For Useful Extensions

(e.g. Generalized Least Squares)

Function of Interest (the "truth") and Noisy Point Estimates (Finite Discrete Time Series Sample Uncertainty)
[Figure: trajectory Z vs Time; σ(Z) with errors ε_i; ∂σ(Z)/∂Z (the Spatial Derivative of the Function of Interest) with errors ε_i]

Point Estimates' Noise Distribution Depends on the Window and the Function Estimated (Quantifying and Balancing These Can Be Important), Especially for Resolving "Spiky" Features.
[Figure: the same panels]