<<

arXiv:2007.05737v2 [math.ST] 19 Aug 2021

Empirical process theory for locally stationary processes

Nathawut Phandoidaen, Stefan Richter

Institut für angewandte Mathematik, Im Neuenheimer Feld 205, Universität Heidelberg
[email protected], stefan.richter@iwr.uni-heidelberg.de

To appear in Bernoulli

August 20, 2021

Abstract

We provide a framework for empirical process theory of locally stationary processes using the functional dependence measure. Our results extend known results for stationary Markov chains and mixing sequences by another common possibility to measure dependence and allow for additional time dependence. Our main result is a functional central limit theorem for locally stationary processes. Moreover, maximal inequalities for expectations of sums are developed. We show the applicability of our theory in some examples, for instance we provide uniform convergence rates for nonparametric regression with locally stationary noise.

1 Introduction

Empirical process theory is a powerful tool to prove uniform convergence rates and weak convergence of composite functionals. The theory for independent variables is well-studied (cf. [43] for an overview) based on the original ideas of [17] and [18]. For random variables with dependence structure, various approaches have been discussed. There exists a well-developed empirical process theory for Markov chains based on regenerative schemes and large deviation results for Harris-recurrent Markov chains (cf. [34], [39], [19], [16], [23] among others) or geometric β-recurrence (cf. [1] among others).

To measure the dependence of random variables, several different concepts were introduced, the most common (with increasing strength) being given by mixing coefficients. Here, α-, β- or φ-mixing (for an overview about mixing coefficients, cf. [38], [27]) have to be imposed. To quantify the speed of convergence in maximal inequalities, high-level assumptions on the moments (which measure the size of covariances of Lipschitz functions of the random variables) have to be imposed. The theory covers a rich class of processes like Markov chains, but cannot discuss, for instance, linear processes. In the paradigm of weak dependence (cf. [12] and further discussion papers), an empirical process theory for stationary means was derived under Bernstein-type inequalities. Focusing more on the analysis of the empirical distribution function (EDF), much more techniques were discussed: For instance, [30] derived convergence rates and weak convergence for the EDF, and [26] obtained strong approximations and uniform central limit theorems for the EDF via S-mixing (for stationary mixing), which imposes the existence of m-dependent approximations of the original observations. Another idea is to measure dependence using bounds for covariances of functions of the random variables: an abstract concept was introduced by [3], [50] and successively refined by [28], [41], [13], [15], [21] and [37]; [20, Theorem 4] provide uniform convergence rates for classes of Hölder-continuous functions. Large deviation results and uniform central limit theorems for general classes of functions (not only the EDF) were derived by using coupling techniques, cf. [11], [9] and [44] (the last two developed for EDFs only) and [40] for β-mixing, and [10], [6] for β- and φ-mixing. See also [2], [10] and [40] for comprehensive overviews.

In [10] it is argued that β-mixing is the weakest mixing assumption that allows for a “complete” empirical process theory which incorporates maximal inequalities and uniform central limit theorems. There exist explicit upper bounds for β-mixing coefficients for Markov chains (cf. [25]) and for so-called V-geometric mixing coefficients (cf. [32]). For several stationary models like linear processes (cf. [35] for α-mixing), ARMA (cf. [33]), nonlinear AR (cf. [26]) and GARCH processes (cf. [22]) there also exist upper bounds on mixing coefficients. A common assumption in these results is that the observed process or, more often, the innovations of the corresponding process, have a continuous distribution. This is a crucial assumption needed to handle the relatively complicated mixing coefficients, which are defined via a supremum over two different sigma-algebras. A relaxation of β-mixing coefficients was investigated in [11, Theorem 1] and is specifically designed for the analysis of the EDF. In contrast to coefficients based on sigma-algebras, these smaller coefficients are defined via conditional expectations of certain classes of functions and are easier to upper bound for a wide range of time series models.

During the last years, another measure of dependence, the so-called functional dependence measure, became popular (cf. [46]). It uses a Bernoulli shift representation (see (1.1) below) and a decomposition into martingales and m-dependent sequences. It has been shown in various applications that the functional dependence measure, combined with the rich theory of martingales, allows for sharp large deviation inequalities (cf. [49] or [51]). In [47] and [31], uniform central limit theorems for the EDF were derived for stationary and piecewise locally stationary processes.

Up to now, no general empirical process theory (allowing for general classes of functions) using the functional dependence measure is available. In this paper we fill this gap and prove maximal inequalities and functional central limit theorems under functional dependence. Furthermore, we draw connections and compare our results to the already existing empirical process concepts for dependent data mentioned above. While the empirical process theory for Markov chains and mixing cited above was developed for stationary processes, we work in the framework of locally stationary processes and therefore automatically provide the first general empirical process theory in this setting ([7] investigated spectral empirical processes for linear processes, [31] proved a functional central limit theorem for a localized empirical distribution function). Locally stationary processes allow for a smooth change of the distribution over time but can locally be approximated by stationary processes. Therefore, they provide more flexible time series models (cf. [8] for an introduction).

The functional dependence measure uses a representation of the given process as a Bernoulli shift process and quantifies dependence with an $L^\nu$-norm. More precisely, we assume that $X_i = (X_{ij})_{j=1,\dots,d}$, $i = 1,\dots,n$, is a $d$-dimensional process of the form
$$X_i = J_{i,n}(\mathcal{G}_i), \qquad (1.1)$$
where $\mathcal{G}_i = \sigma(\varepsilon_i, \varepsilon_{i-1}, \dots)$ is the sigma-algebra generated by $\varepsilon_i$, $i \in \mathbb{Z}$, a sequence of i.i.d. random variables in $\mathbb{R}^{\tilde d}$ ($d, \tilde d \in \mathbb{N}$), and $J_{i,n} : (\mathbb{R}^{\tilde d})^{\mathbb{N}_0} \to \mathbb{R}^d$, $i = 1,\dots,n$, $n \in \mathbb{N}$, some measurable functions. For a real-valued random variable $W$ and some $\nu > 0$, we define $\|W\|_\nu := \mathbb{E}[|W|^\nu]^{1/\nu}$. If $\varepsilon_k^*$ is an independent copy of $\varepsilon_k$, independent of $\varepsilon_i$, $i \in \mathbb{Z}$, we define $\mathcal{G}_i^{*(i-k)} := (\varepsilon_i, \dots, \varepsilon_{i-k+1}, \varepsilon_{i-k}^*, \varepsilon_{i-k-1}, \dots)$ and $X_i^{*(i-k)} := J_{i,n}(\mathcal{G}_i^{*(i-k)})$. The uniform functional dependence measure is given by
$$\delta_\nu^X(k) = \sup_{i=1,\dots,n}\ \sup_{j=1,\dots,d} \big\| X_{ij} - X_{ij}^{*(i-k)} \big\|_\nu. \qquad (1.2)$$

Graphically, $\delta_\nu^X(k)$ measures the impact of $\varepsilon_0$ on $X_k$. Although representation (1.1) appears to be rather restrictive, it does cover a large variety of processes. In [5] it was motivated that the set of all processes of the form $X_i = J(\varepsilon_i, \varepsilon_{i-1}, \dots)$ should be equal to the set of all stationary and ergodic processes. We additionally allow $J$ to vary with $i$ and $n$ to cover processes which change their stochastic behavior over time. This is exactly the form of the so-called locally stationary processes discussed in [8].
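To make the coupling construction behind the functional dependence measure concrete, here is a minimal Monte Carlo sketch (our illustration, not part of the paper): for the linear shift $X_i = \sum_{j\ge 0} a^j \varepsilon_{i-j}$ with i.i.d. standard normal innovations, replacing $\varepsilon_0$ by an independent copy gives the exact value $\delta_2^X(k) = |a|^k\sqrt 2$, which the simulation reproduces.

```python
import numpy as np

def delta2_ar1(a, k, n_mc=50_000, trunc=60, seed=0):
    """Monte Carlo estimate of the functional dependence measure delta_2(k)
    for the Bernoulli shift X_i = sum_{j>=0} a^j eps_{i-j} (AR(1)-type).
    We simulate X_k from (eps_k, ..., eps_{k-trunc}), replace eps_0 by an
    independent copy eps_0^*, and measure the L2 distance of the outputs."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_mc, trunc + 1))   # column j holds eps_{k-j}
    eps_star = eps.copy()
    eps_star[:, k] = rng.standard_normal(n_mc)     # eps_0 sits at lag j = k
    weights = a ** np.arange(trunc + 1)
    X, X_star = eps @ weights, eps_star @ weights
    return float(np.sqrt(np.mean((X - X_star) ** 2)))
```

For a = 0.5 and k = 3 the exact value is 0.5³·√2 ≈ 0.177; the estimate matches up to Monte Carlo error.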

Since we are working in the time series context, many applications ask for functions f that not only depend on the actual observation of the process but on the whole (infinite) past $Z_i := (X_i, X_{i-1}, X_{i-2}, \dots)$. In the course of this paper, we aim to derive asymptotic properties of the empirical process
$$\mathbb{G}_n(f) := \frac{1}{\sqrt n}\sum_{i=1}^n \Big\{ f\big(Z_i, \tfrac{i}{n}\big) - \mathbb{E} f\big(Z_i, \tfrac{i}{n}\big)\Big\}, \quad f \in \mathcal F, \qquad (1.3)$$
where
$$\mathcal F \subset \big\{ f : (\mathbb{R}^d)^{\mathbb{N}_0} \times [0,1] \to \mathbb{R} \text{ measurable}\big\}.$$
Let $H(\varepsilon, \mathcal F, \|\cdot\|)$ denote the bracketing entropy, that is, the logarithm of the number of ε-brackets with respect to some distance $\|\cdot\|$ that is necessary to cover $\mathcal F$ (this is made precise at the end of this section). We will define a distance $V_n$ which guarantees weak convergence of (1.3) if the corresponding bracketing entropy integral $\int_0^1 \sqrt{H(\varepsilon,\mathcal F, V_n)}\, d\varepsilon$ is finite.

The definition of the functional dependence measure for locally stationary processes is similar to its stationary version and is easy to calculate for many time series models. It does not rely on the stationarity assumption but on the representation of the process as a Bernoulli shift. Therefore, many well-known upper bounds for stationary time series given in [48], including recursively defined models and linear models, directly carry over to the locally stationary case (1.2). It seems reasonable to use it as a starting point to generalize empirical process theory for stationary processes to the more general setting of local stationarity. While the other two paradigms mentioned above should also allow for such a generalization in principle, there are open questions:

• The theory for Harris-recurrent Markov chains relies on stationarity and intrinsically needs some knowledge about the whole time series due to the assumption of null-recurrence. There exist generalizations to locally stationary Markov chains (cf. for instance [42]), but the corresponding recurrence properties and examples of locally stationary time series are not worked out yet. Furthermore, it is not directly clear how to deal with processes $Z_i$ which incorporate the infinite past of $X_i$, and linear processes cannot be discussed easily.

• Absolute regularity (β-mixing) is shown in most examples by assuming some continuity of the distribution or of the corresponding innovations. Especially for linear processes, the bounds are quite hard to obtain and seem not to be optimal. Moreover, there exist no “invariance rules” which would directly allow one to transfer the mixing properties of $X_i$ to $f(Z_i, \frac{i}{n})$, which incorporates infinitely many lags of $X_i$.

• Many of the more elaborate dependence concepts developed in e.g. [11], [6], [20], [4] are restricted (at least in their original formulation) to the discussion of the EDF or connected one-dimensional indexed function classes.

Contrary to β-mixing, the functional dependence measure can easily deal with $f(Z_i, \frac{i}{n})$ by using Hölder-type assumptions on f. Furthermore, it can easily be calculated in many situations and does not require continuous distributions of $X_i$. Also, linear processes $X_i$ are covered.
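To fix ideas about the object of study, the following self-contained sketch (our illustration; the function class and the i.i.d. data are hypothetical) evaluates the empirical process (1.3) with $Z_i$ replaced by the single observation $X_i$, approximating the unknown expectation by an average over independent replications:

```python
import numpy as np

def empirical_process(X, fs):
    """Evaluate G_n(f) = n^{-1/2} sum_i { f(X_i, i/n) - E f(X_i, i/n) } for
    each f in fs. X has shape (n_rep, n); the unknown expectation is replaced
    by the columnwise average over the n_rep independent replications."""
    n_rep, n = X.shape
    u = np.arange(1, n + 1) / n
    out = {}
    for name, f in fs.items():
        vals = f(X, u)                      # f applied entrywise, shape (n_rep, n)
        centred = vals - vals.mean(axis=0)  # empirical proxy for E f(X_i, i/n)
        out[name] = centred.sum(axis=1) / np.sqrt(n)
    return out
```

Across replications the realized values of $\mathbb G_n(f)$ are centered and have nondegenerate variance, which the central limit theorems below formalize.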

However, there are also some peculiarities of the functional dependence measure (1.2). While for Harris-recurrent Markov chains and β-mixing the empirical process theory is independent of the function class considered, the situation for the functional dependence measure is more complicated. In order to quantify the dependence of $f(Z_i, \frac{i}{n})$ by $\delta^X_\nu$, we have to impose smoothness conditions on f in direction of its first argument. The distance $V_n$ therefore will not only change with the dependence structure of X, but also has to be “compatible” with the function class $\mathcal F$. The smoothness condition on f also poses a challenging issue when considering chaining procedures where rare events are excluded by (non-smooth) indicator functions.

Our main contributions in this paper are the following:

• We derive maximal inequalities for $\mathbb{G}_n(f)$ for classes of functions $\mathcal F$,

• a chaining device which preserves smoothness during the chaining procedure and

• conditions to ensure asymptotic tightness and functional convergence of $\mathbb{G}_n(f)$, $f \in \mathcal F$.

The paper is organized as follows. In Section 2, we present our main result Theorem 2.3, the functional central limit theorem under minimal moment conditions. As a special case, we derive a version for stationary processes. We give a discussion on the distance $V_n$ and compare our result with the empirical process theory for β-mixing. Some assumptions are postponed to Section 3, where a new multivariate central limit theorem for locally stationary processes is presented. In Section 4, we provide new maximal inequalities for $\mathbb{G}_n(f)$ for both finite and infinite $\mathcal F$. In Section 5, we apply our theory to prove uniform convergence rates and weak convergence of several estimators; the aim of this last section is to highlight the wide range of applicability of our theory and to provide the typical conditions which have to be imposed, together with some discussion. In Section 6, a conclusion is drawn. We illustrate the main steps of the proofs in the Appendix of the article but postpone all detailed proofs to the Supplementary Material.

We now introduce some basic notation. For $a, b \in \mathbb{R}$, let $a \wedge b := \min\{a,b\}$ and $a \vee b := \max\{a,b\}$. For $k \in \mathbb{N}$, define
$$H(k) := 1 \vee \log(k), \qquad (1.4)$$
which naturally appears in large deviation inequalities. For a given finite class $\mathcal F$, let $|\mathcal F|$ denote its cardinality. We use the abbreviation

$$H = H(|\mathcal F|) = 1 \vee \log|\mathcal F| \qquad (1.5)$$
if no confusion arises. For some distance $\|\cdot\|$, let $N(\varepsilon, \mathcal F, \|\cdot\|)$ denote the bracketing numbers, that is, the smallest number of ε-brackets $[l_j, u_j] := \{f \in \mathcal F : l_j \le f \le u_j\}$ (i.e. measurable functions $l_j, u_j : (\mathbb{R}^d)^{\mathbb{N}_0} \times [0,1] \to \mathbb{R}$ with $\|u_j - l_j\| \le \varepsilon$ for all j) needed to cover $\mathcal F$. Let $H(\varepsilon, \mathcal F, \|\cdot\|) := \log N(\varepsilon, \mathcal F, \|\cdot\|)$ denote the bracketing entropy. For $\nu \ge 1$, let
$$\|f\|_{\nu,n} := \Big(\frac 1n \sum_{i=1}^n \big\| f\big(Z_i, \tfrac in\big)\big\|_\nu^\nu\Big)^{1/\nu}.$$

2 A new functional central limit theorem

Roughly speaking, a process $X_i$, $i = 1,\dots,n$, is called locally stationary if for each $u \in [0,1]$ there exists a stationary process $\tilde X_i(u)$, $i = 1,\dots,n$, such that $X_i \approx \tilde X_i(u)$ if $|u - \frac in|$ is small (cf. [8]). Typical estimators are of the form

$$\frac{1}{nh}\sum_{i=1}^n K\Big(\frac{i/n - u}{h}\Big)\, \bar f\big(Z_i, \tfrac in\big),$$
where K is a kernel function and $h = h_n > 0$ is a bandwidth. Clearly, such a localization changes the convergence rate. To cover these cases, we suppose that any $f \in \mathcal F$ has a representation
$$f(z, u) = D_{f,n}(u) \cdot \bar f(z, u), \qquad z \in (\mathbb{R}^d)^{\mathbb{N}_0},\ u \in [0,1], \qquad (2.1)$$
where $\bar f$ is independent of n and $D_{f,n}(u)$ is independent of z. We put

$$\bar{\mathcal F} := \{\bar f : f \in \mathcal F\}. \qquad (2.2)$$
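To make the factorization (2.1) concrete, the sketch below (our illustration; the Epanechnikov kernel is a hypothetical choice) implements the localized mean displayed above, i.e. $\bar f(z,u) = z_0$ with the kernel weight playing the role of $D_{f,n}$ up to normalization:

```python
import numpy as np

def local_average(X, u, h):
    """Localized mean (1/(n h)) * sum_i K((i/n - u)/h) X_i with the
    Epanechnikov kernel K(t) = 0.75 (1 - t^2) on [-1, 1]; the kernel weight
    is the z-independent factor of the representation (2.1)."""
    n = len(X)
    t = (np.arange(1, n + 1) / n - u) / h
    K = np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)
    return float((K * X).sum() / (n * h))
```

Since ∫K = 1, the weights sum to roughly one, so a constant series is reproduced and a linear time trend is averaged around u.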

The function class $\bar{\mathcal F}$ is considered to consist of Hölder-continuous functions in direction of z. For $s \in (0,1]$, a sequence $z = (z_i)_{i\in\mathbb{N}_0}$ of elements of $\mathbb{R}^d$ (equipped with the maximum norm $|\cdot|_\infty$) and an absolutely summable sequence $\chi = (\chi_i)_{i\in\mathbb{N}_0}$ of nonnegative real numbers, we set
$$|z|_{\chi,s} := \Big(\sum_{i=0}^\infty \chi_i |z_i|_\infty^s\Big)^{1/s}$$
and $|z|_\chi := |z|_{\chi,1}$.

Definition 2.1. $\bar{\mathcal F}$ is called an $(L_{\mathcal F}, s, R, C)$-class if $L_{\mathcal F} = (L_{\mathcal F,i})_{i\in\mathbb{N}_0}$ is a sequence of nonnegative real numbers, $s \in (0,1]$ and $R : (\mathbb{R}^d)^{\mathbb{N}_0} \times [0,1] \to [0,\infty)$ satisfies for all $u \in [0,1]$, $z, z' \in (\mathbb{R}^d)^{\mathbb{N}_0}$, $\bar f \in \bar{\mathcal F}$,
$$|\bar f(z, u) - \bar f(z', u)| \le |z - z'|_{L_{\mathcal F},s}^s \cdot \frac{R(z, u) + R(z', u)}{2}.$$
Furthermore, $C = (C_R, C_{\bar f}) \in (0,\infty)^2$ satisfies $\sup_u |\bar f(0, u)| \le C_{\bar f}$, $\sup_u |R(0, u)| \le C_R$.

The basic assumption for our main result is the following compatibility condition on $\mathcal F$.

Assumption 2.2. $\bar{\mathcal F}$ is an $(L_{\mathcal F}, s, R, C)$-class. There exist $p \in (1,\infty]$, $C_X > 0$ such that

$$\sup_{i,u}\big\|R(Z_i, u)\big\|_{2p} \le C_R, \qquad \sup_{i,j}\big\|X_{ij}\big\|_{\frac{2sp}{p-1}} \le C_X. \qquad (2.3)$$
Let $\mathbb{D}_n \ge 0$, $\Delta(k) \ge 0$ be such that for all $k \in \mathbb{N}_0$,
$$2dC_R \cdot \sum_{j=0}^k L_{\mathcal F,j}\big(\delta^X_{\frac{2sp}{p-1}}(k-j)\big)^s \le \Delta(k), \qquad \sup_{f\in\mathcal F}\Big(\frac 1n\sum_{i=1}^n D_{f,n}\big(\tfrac in\big)^2\Big)^{1/2} \le \mathbb{D}_n.$$

While (2.3) summarizes moment assumptions on $X_{ij}$ which are balanced by p, the sequence $\Delta(k)$ reflects the intrinsic dependence of $f(Z_i, \frac in)$, and $\mathbb{D}_n$ measures the influence of the factor $D_{f,n}(u)$ on the convergence rate of $\mathbb{G}_n(f)$.

Based on Assumption 2.2, we define for $f \in \mathcal F$
$$V_n(f) := \|f\|_{2,n} + \sum_{k=1}^\infty \min\big\{\|f\|_{2,n},\ \mathbb{D}_n\Delta(k)\big\}. \qquad (2.4)$$
Clearly, $V_n$ satisfies a triangle inequality. Therefore, $V_n(f-g)$ is a distance between $f, g \in \mathcal F$. We are now able to state our main result. The weak convergence takes place in the normed space

$$\ell^\infty(\mathcal F) = \big\{ G : \mathcal F \to \mathbb{R} \,\big|\, \|G\|_\infty := \sup_{f\in\mathcal F} |G(f)| < \infty \big\}, \qquad (2.5)$$
cf. [43] for a detailed discussion of this space.

Theorem 2.3. Let $\mathcal F$ satisfy Assumptions 2.2, 3.1, 3.2 and 3.3. Assume that
$$\sup_{n\in\mathbb{N}} \int_0^1 \sqrt{H(\varepsilon, \mathcal F, V_n)}\, d\varepsilon < \infty.$$
Then it holds in $\ell^\infty(\mathcal F)$ that
$$\big(\mathbb{G}_n(f)\big)_{f\in\mathcal F} \xrightarrow{d} \big(\mathbb{G}(f)\big)_{f\in\mathcal F},$$
where $(\mathbb{G}(f))_{f\in\mathcal F}$ is a centered Gaussian process with covariances
$$\mathrm{Cov}\big(\mathbb{G}(f), \mathbb{G}(g)\big) = \lim_{n\to\infty} \mathrm{Cov}\big(\mathbb{G}_n(f), \mathbb{G}_n(g)\big).$$

The proof of Theorem 2.3 consists of two ingredients, convergence of the finite-dimensional distributions (cf. Theorem 3.4) and asymptotic tightness (cf. Corollary 4.5). The more challenging part is asymptotic tightness; its proof only relies on Assumption 2.2 and consists of a new maximal inequality presented in Theorem 4.1 which may be of independent interest. To ensure convergence of the finite-dimensional distributions, we have to formalize local stationarity (Assumption 3.1) and pose conditions in time direction on $\bar f(z, \cdot)$ (cf. Assumption 3.2) and $D_{f,n}(\cdot)$ (cf. Assumption 3.3), which is done in Section 3. In particular, it is needed that $D_{f,n}(u)$ is properly normalized.

Let us note that in the case that $X_i$ is stationary, $\bar f(z, u) = \bar f(z)$ and $D_{f,n}(u) = 1$, Assumptions 3.1, 3.2 and 3.3 are directly fulfilled. That is, in the stationary case, Assumption 2.2 is sufficient for Theorem 2.3. We formulate this finding as a simple corollary. Let
$$\tilde{\mathbb{G}}_n(h) := \frac{1}{\sqrt n}\sum_{i=1}^n \big\{ h(X_i) - \mathbb{E} h(X_i) \big\},$$
where $X_i = J(\mathcal{G}_i)$, $i = 1,\dots,n$, is a stationary process and h are functions from
$$\mathcal H \subset \{ h : \mathbb{R}^d \to \mathbb{R} \text{ measurable}\}$$
with the property that for all $x, y \in \mathbb{R}^d$, $|h(x) - h(y)| \le L_{\mathcal H} |x - y|_\infty^s$.

Corollary 2.4. Suppose that $\|X_1\|_{2s} < \infty$. Let $\Delta(k) := 2dL_{\mathcal H}\,\delta^X_{2s}(k)^s$ and $\mathbb{D}_n := 1$. Assume that
$$\sup_{n\in\mathbb{N}} \int_0^1 \sqrt{H(\varepsilon, \mathcal H, V_n)}\, d\varepsilon < \infty. \qquad (2.6)$$
Then it holds in $\ell^\infty(\mathcal H)$ that
$$\big(\tilde{\mathbb{G}}_n(h)\big)_{h\in\mathcal H} \xrightarrow{d} \big(\tilde{\mathbb{G}}(h)\big)_{h\in\mathcal H},$$
where $(\tilde{\mathbb{G}}(h))_{h\in\mathcal H}$ is a centered Gaussian process with covariances
$$\mathrm{Cov}\big(\tilde{\mathbb{G}}(h_1), \tilde{\mathbb{G}}(h_2)\big) = \sum_{k\in\mathbb{Z}} \mathrm{Cov}\big(h_1(X_0), h_2(X_k)\big).$$

2.1 Form of $V_n$ and discussion on $\Delta(k)$

2.1.1 Form of Vn

Suppose that $\mathbb{D}_n \in (0,\infty)$ is independent of $n \in \mathbb{N}$. Based on decay rates of $\Delta(k)$, simpler forms of $V_n$ can be derived; they are given in Table 1. These results are elementary and are proved in Lemmas 7.11 and 7.12 in the Supplementary Material.

∆(j)                                  | $cj^{-\alpha}$, α > 1, c > 0                                  | $c\rho^j$, ρ ∈ (0,1), c > 0
$V_n(f)$                              | $\|f\|_{2,n}\max\{\|f\|_{2,n}^{-1/\alpha}, 1\}$               | $\|f\|_{2,n}\max\{\log(\|f\|_{2,n}^{-1}), 1\}$
$\int_0^\sigma\sqrt{H(\varepsilon,\mathcal F,V_n)}d\varepsilon$ | $\int_0^{\tilde\sigma}\varepsilon^{-1/\alpha}\sqrt{H(\varepsilon,\mathcal F,\|\cdot\|_{2,n})}d\varepsilon$ | $\int_0^{\tilde\sigma}\log(\varepsilon^{-1})\sqrt{H(\varepsilon,\mathcal F,\|\cdot\|_{2,n})}d\varepsilon$

Table 1: Equivalent expressions of $V_n$ and the corresponding entropy integral under the condition that $\mathbb{D}_n \in (0,\infty)$ is independent of n. We omitted the lower and upper bound constants, which only depend on c, ρ, α and $\mathbb{D}_n$. Furthermore, $\tilde\sigma = \tilde\sigma(\sigma)$ fulfills $\tilde\sigma \to 0$ for $\sigma \to 0$.

If $f(Z_i, \frac in)$, $i = 1,\dots,n$, are independent, we can choose $\Delta(k) = 0$ for $k \ge 1$ and thus $V_n(f)$ is proportional to $\|f\|_{2,n}$. We therefore exactly recover the case of independent variables with our theory.
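The distance (2.4) is straightforward to evaluate once $\Delta(k)$ is specified. The following minimal sketch (our illustration, with a hypothetical geometric $\Delta$) truncates the series numerically; in line with Table 1, the series contributes roughly $\|f\|_{2,n}\log(\|f\|_{2,n}^{-1})$ additional mass for geometric decay:

```python
import numpy as np

def V_n(f_norm, Delta, D_n=1.0, K_max=10_000):
    """Evaluate V_n(f) = ||f||_{2,n} + sum_{k>=1} min(||f||_{2,n}, D_n Delta(k))
    from (2.4), truncating the series at K_max terms (negligible tail for
    summable Delta). Delta maps an integer array k to Delta(k)."""
    k = np.arange(1, K_max + 1)
    return float(f_norm + np.minimum(f_norm, D_n * Delta(k)).sum())
```

For ∆ ≡ 0 (independent observations) the series vanishes and V_n(f) = ‖f‖_{2,n}, recovering the classical i.i.d. distance.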

2.1.2 Discussion on ∆(k)

Assumption 2.2 asks ∆(k) to upper bound

$$\sum_{j=0}^k L_{\mathcal F,j}\big(\delta^X_{\frac{2sp}{p-1}}(k-j)\big)^s,$$
which is a convolution of the uniform Hölder constants $L_{\mathcal F,j}$ of $f \in \mathcal F$ and the dependence measure $\delta^X_{\frac{2sp}{p-1}}(k)$ of X. Therefore, the specific form of $f \in \mathcal F$ has an impact on the dependence structure which is then introduced in $V_n$. This is contrary to other typical chaining approaches for Harris-recurrent Markov chains or β-mixing sequences, where the dependence structure of $X_i$ simply transfers to functions $f(X_i)$ without further conditions.

Furthermore, contrary to other chaining approaches, we have to ask for the existence of moments of $X_i$ in Assumption 2.2 even though $\mathbb{G}_n(f)$ only involves $f(X_i)$. This is due to the linear nature of the functional dependence measure (1.2). If f is Lipschitz continuous with respect to its first argument ($s = 1$ in Assumption 2.2), we have to impose $\sup_{i,j}\|X_{ij}\|_2 < \infty$. However, these moment assumptions can be relaxed at the cost of a larger $\Delta(k)$ as follows. Let us consider the special case that $f(Z_i, \frac in)$ only depends on $X_i, \frac in$, that is, $f(z, u) = f(z_0, u)$. If f is bounded and Lipschitz continuous with respect to its first argument with Lipschitz constant L, then for any $s \in (0,1]$,
$$|f(z_0, u) - f(z_0', u)| \le \min\big\{2\|f\|_\infty,\ L|z_0 - z_0'|_\infty\big\} \le (2\|f\|_\infty)^{1-s} L^s |z_0 - z_0'|_\infty^s.$$

Thus, $\Delta(k)$ can be chosen proportional to $\delta^X_{2s}(k)^s$. This means that we can reduce the moment assumption to $\sup_{i,j}\|X_{ij}\|_{2s} < \infty$ at the cost of having a larger norm $V_n$.
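The interpolation step used above is the elementary inequality $\min\{2\|f\|_\infty, Lt\} \le (2\|f\|_\infty)^{1-s}L^s t^s$ (a minimum is bounded by any geometric mean of its arguments); a quick numerical check of this trade-off (our sketch):

```python
import numpy as np

def lipschitz_side(t, M, L):
    """Left-hand side min(2M, L t): a bounded Lipschitz modulus."""
    return np.minimum(2.0 * M, L * t)

def hoelder_side(t, M, L, s):
    """Right-hand side (2M)^(1-s) L^s t^s: a Hoelder modulus of exponent s,
    which lowers the moments of X_i required in Assumption 2.2 to order 2s."""
    return (2.0 * M) ** (1.0 - s) * L ** s * t ** s
```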

2.2 Comparison to empirical process theory with β-mixing

In this section, we compare our functional central limit theorem for stationary processes from Corollary 2.4 under functional dependence with similar results obtained under β-mixing. Unfortunately, we were not able to find a general setting under which the functional dependence measure $\delta_2^X$ can be compared with the β-mixing coefficients $\beta^X$ of $X_i$, $i = 1,\dots,n$. However, in some special cases, both quantities can be upper bounded.

2.2.1 Upper bounds for dependence coefficients of linear processes

Consider the linear process
$$X_i = \sum_{k=0}^\infty a_k \varepsilon_{i-k}, \qquad i = 1,\dots,n,$$
with an absolutely summable sequence $a_k$, $k \in \mathbb{N}_0$, and i.i.d. $\varepsilon_k$, $k \in \mathbb{Z}$, with $\mathbb{E}\varepsilon_1 = 0$. Then it is immediate that
$$\delta_2^X(k) \le 2|a_k| \cdot \|\varepsilon_1\|_2.$$
From [35] (cf. also [15], Section 2.3.1), we have the following result. If for some $\nu \ge 1$, $\|\varepsilon_1\|_\nu < \infty$, $\varepsilon_1$ has a Lipschitz-continuous Lebesgue density and the process $X_i$ is invertible, then for some constant $\zeta > 0$,

$$\beta^X(k) \le \zeta \cdot \Big( \sum_{m=k}^\infty |A_{m,\nu}|^{\frac{1}{1+\nu}} \vee L\Big(\sum_{m=k}^\infty A_{m,2}\Big)\Big),$$
where $A_{m,s} := \sum_{k=m}^\infty |a_k|^s$ and $L(u) = u(1 \vee |\log(u)|)$. If $a_k = O(k^{-\alpha})$ for some $\alpha > 1$, then
$$\delta_2^X(k) = O(k^{-\alpha}), \qquad \beta^X(k) = O\big(k^{-\frac{\alpha+1}{1+\nu}+1} \vee (k^{-\alpha+\frac 32}\log(k)^{1/2})\big). \qquad (2.7)$$
Note that even for this specific example, the calculation of the functional dependence measure is much easier and requires much fewer assumptions. Moreover, the bounds for $\beta^X(k)$ are typically larger than $\delta_2^X(k)$. The reason is the simple structure of $\delta_2^X$ compared to the much more involved formulation of dependence through sigma-algebras in the β-mixing coefficients. For recursively defined processes with a finite number of lags, $\delta_2^X$ is typically upper bounded by geometrically decaying coefficients (cf. [48], [8]); the same holds true for $\beta^X(k)$ under additional continuity assumptions (cf. [15], Section 2.4, or [27], [25] among others).

2.2.2 Entropy integral

In [14] (cf. also [10]), it was shown that if $X_i$, $i = 1,\dots,n$, is stationary and β-mixing with coefficients $\beta(k)$, $k \in \mathbb{N}_0$, then
$$\int_0^1 \sqrt{H(\varepsilon, \mathcal H, \|\cdot\|_{2,\beta})}\, d\varepsilon < \infty \qquad (2.8)$$
implies weak convergence of $(\mathbb{G}_n(h))_{h\in\mathcal H}$ in $\ell^\infty(\mathcal H)$. Here, the $\|\cdot\|_{2,\beta}$-norm is defined as follows. If $\beta^{-1}$ denotes the inverse cadlag of the decreasing function $t \mapsto \beta(\lfloor t\rfloor)$ and $Q_h$ the inverse cadlag of the tail function $t \mapsto \mathbb{P}(|h(X_1)| > t)$, then
$$\|h\|_{2,\beta} := \Big(\int_0^1 \beta^{-1}(u) Q_h(u)^2\, du\Big)^{1/2}.$$
Condition (2.8) was later relaxed in [40, Theorem 8.3]. It could be shown that if $\mathcal F$ consists of indicator functions of specific classes of sets (in particular, if $\mathcal F$ corresponds to the empirical distribution function), weak convergence can be obtained under less restrictive conditions than (2.8). Since our theory does not directly allow us to analyze indicator functions because $\mathcal F$ has to be an $(L_{\mathcal F}, s, R, C)$-class, we do not discuss their generalization here in detail.

In the special cases of polynomial and geometric decay, simple upper bounds for $\|h\|_{2,\beta}$ are available (cf. [10]). If $\sum_{k=0}^\infty k^{b-1}\beta(k) < \infty$ for some $b \ge 1$, then $\|\cdot\|_{2,\beta}$ is upper bounded by $\|\cdot\|_{\frac{2b}{b-1}}$. Generally speaking, (2.8) asks for $\frac{2b}{b-1}$ moments of the process $f(X_i)$ to exist, while our condition in (2.6) only asks for 2 moments of $f(X_i)$ but allows for smaller function classes through the additional factors given in the entropy integral (cf. Table 1). In specific examples (cf. (2.7)) it may occur that the entropy integral (2.6) is finite while (2.8) is infinite due to missing summability of $\beta^X(k)$.

To give a precise comparison, consider the situation of linear processes from (2.7). If $\nu > 2\alpha + 1$, we can choose $b = \alpha - \frac 32$. Then, the two entropy integrals from Corollary 2.4 (left) and (2.8) read
$$\int_0^1 \varepsilon^{-\frac 1\alpha}\sqrt{H(\varepsilon, \mathcal F, \|\cdot\|_2)}\, d\varepsilon \quad\text{vs.}\quad \int_0^1 \sqrt{H\big(\varepsilon, \mathcal F, \|\cdot\|_{\frac{4\alpha-6}{2\alpha-5}}\big)}\, d\varepsilon.$$
Here, the entropy integral for mixing only exists if $\alpha > \frac 52$. The difference in the behavior is due to different bounds used for the variance of $\mathbb{G}_n(f)$.
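Whether such weighted entropy integrals converge can also be probed numerically. The sketch below (our illustration; the parametric-type entropy $H(\varepsilon) = \log(1/\varepsilon)+1$ is a hypothetical example) evaluates integrals of the form appearing in Table 1 by the midpoint rule:

```python
import numpy as np

def entropy_integral(weight, H, lower=1e-6, n_cells=200_000):
    """Midpoint-rule approximation of integral_{lower}^{1} weight(e) sqrt(H(e)) de,
    a numerical stand-in for the weighted bracketing entropy integrals above
    (the lower cutoff avoids the singularity at 0)."""
    grid = np.linspace(lower, 1.0, n_cells + 1)
    mid = 0.5 * (grid[1:] + grid[:-1])
    return float(np.sum(weight(mid) * np.sqrt(H(mid)) * np.diff(grid)))
```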

2.3 Integration into other empirical process results for the empirical distribution function of dependent data

While our approach does allow for a general empirical process theory for Hölder-continuous function classes, some more general dependence concepts have already been introduced (only) for the discussion of the empirical distribution function (EDF) based on the one-dimensional class

$$\mathcal F = \big\{ x \mapsto \mathbb{1}\{x \le t\} : t \in \mathbb{R} \big\}.$$
We mention [6], [9], [4] and [20]. The conditions therein cover the case where $X_i = J(\mathcal{G}_i)$ is a stationary Bernoulli shift with $\mathcal{G}_i = (\varepsilon_i, \varepsilon_{i-1}, \dots)$ and dependence is measured with the (stationary) functional dependence measure

$$\delta^X_\nu(k) = \big\|X_i - X_i^{*(i-k)}\big\|_\nu$$
and its summed-up version, $D_\nu(k) := \sum_{j=k+1}^\infty \delta^X_\nu(j)$. [6] introduces so-called ν-approximation coefficients $a_k$, $k \in \mathbb{N}$, which can be viewed as another formulation of functional dependence. However, their final result [6, Theorem 5] for the convergence of the EDF is stated with summability conditions both on $a_k$ and on absolutely regular mixing coefficients; we therefore do not discuss it in detail here. On the other hand, [9, Theorem 2.1] in combination with [11, Section 6.1] shows convergence of the EDF if for some $\nu > 1$ and $\gamma > 0$,

$$D_\nu(k) = O\big(k^{-\frac{(1+\gamma)(\nu+1)}{\nu}}\big).$$

This is done by introducing simplified β-mixing coefficients which can then be upper bounded by $D_\nu(k)$. By using independent approximations of the original process, [4, Theorem 1, Corollary 1] obtain convergence of the EDF if for some $\nu \ge 1$ and $A > 4$, $(D_\nu(k)k^A)^\nu \le k^{-A}$, or equivalently,
$$D_\nu(k) \le k^{-\frac{A(\nu+1)}{\nu}}.$$
[20] discusses convergence of the EDF under a general growth condition imposed on the moments of $\sum_{i=1}^n \{h(X_i) - \mathbb{E} h(X_i)\}$, where $h \in \mathcal H_\alpha$, the set of all Hölder-continuous functions with exponent $\alpha \in (0,1]$. Their condition is fulfilled if

$$\sum_{k=1}^\infty k^{2p-2} D_\nu(k)^\alpha < \infty$$
for some $p > \frac{\nu}{\nu-1}$, $\nu > 1$.
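For completeness, the summed-up coefficients $D_\nu(k) = \sum_{j\ge k+1}\delta_\nu(j)$ used in the conditions above are simple tail sums; a minimal helper (our sketch, with hypothetical geometrically decaying coefficients in the example):

```python
import numpy as np

def summed_dependence(delta, k_max=10_000):
    """Return a function k -> D_nu(k) = sum_{j >= k+1} delta(j) for a given
    sequence of functional dependence coefficients delta(j), truncated at
    j = k_max (negligible tail for summable coefficients)."""
    j = np.arange(0, k_max + 1)
    d = delta(j)
    tail = d[::-1].cumsum()[::-1]        # tail[i] = sum_{j >= i} delta(j)
    return lambda k: float(tail[k + 1]) if k + 1 <= k_max else 0.0
```

For $\delta(j) = 2^{-j}$ this gives $D(0) = 1$ and $D(2) = 1/4$, matching the geometric tail sums.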

3 A general central limit theorem for locally stationary processes

In this section, we introduce the remaining assumptions needed in Theorem 2.3, which pose regularity conditions on the process $X_i$ and the function class $\mathcal F$ in time direction. They are used to derive a multivariate central limit theorem for $(\mathbb{G}_n(f_1), \dots, \mathbb{G}_n(f_k))$ under minimal moment conditions in Theorem 3.4. Comparable results in different and more specific contexts were shown in [8] or [42].

We first formalize the property of $X_i$ to be locally stationary (cf. [8]). We ask that for each $u \in [0,1]$ there exists a stationary process $\tilde X_i(u)$, $i = 1,\dots,n$, such that $X_i \approx \tilde X_i(u)$ if $|u - \frac in|$ is small. Recall $R(\cdot), s, p$ from Assumption 2.2.

Assumption 3.1. For each $u \in [0,1]$, there exists a process $\tilde X_i(u) = J(\mathcal{G}_i, u)$, $i \in \mathbb{Z}$, where J is a measurable function. Furthermore, there exist some $C_X > 0$, $\varsigma \in (0,1]$ such that for every $i \in \{1,\dots,n\}$, $u_1, u_2 \in [0,1]$,

$$\big\|X_i - \tilde X_i\big(\tfrac in\big)\big\|_{\frac{2sp}{p-1}} \le C_X n^{-\varsigma}, \qquad \big\|\tilde X_i(u_1) - \tilde X_i(u_2)\big\|_{\frac{2sp}{p-1}} \le C_X |u_1 - u_2|^\varsigma.$$
For $\tilde Z_i(u) = (\tilde X_i(u), \tilde X_{i-1}(u), \dots)$ it holds that $\sup_{v,u} \|R(\tilde Z_0(v), u)\|_{2p} < \infty$.

The behavior of the functions $f(z, u) = D_{f,n}(u) \cdot \bar f(z, u)$ of the class $\mathcal F$ in the direction of time $u \in [0,1]$ is controlled by the following two continuity assumptions, which state conditions on $\bar f(z, \cdot)$ and $D_{f,n}(\cdot)$ separately.

Assumption 3.2. There exists some $\varsigma \in (0,1]$ such that for every $\bar f \in \bar{\mathcal F}$,

$$\sup_{v, u_1, u_2} \frac{\big\|\bar f(\tilde Z_0(v), u_1) - \bar f(\tilde Z_0(v), u_2)\big\|_2}{|u_1 - u_2|^\varsigma} < \infty.$$

For $f \in \mathcal F$, let $D_{f,n}^\infty := \sup_{i=1,\dots,n} \big|D_{f,n}\big(\tfrac in\big)\big|$.

Assumption 3.3. For all $f \in \mathcal F$, the function $\frac{D_{f,n}(\cdot)}{D_{f,n}^\infty}$ has bounded variation uniformly in n, and
$$\sup_{n\in\mathbb{N}} \frac 1n \sum_{i=1}^n D_{f,n}\big(\tfrac in\big)^2 < \infty, \qquad \frac{D_{f,n}^\infty}{\sqrt n} \to 0. \qquad (3.1)$$
One of the two following cases holds.

(i) Case K = 1 (global): For all $f, g \in \mathcal F$, $u \mapsto \mathbb{E}\big[\mathbb{E}[\bar f(\tilde Z_{j_1}(u), u)\,|\,\mathcal{G}_0] \cdot \mathbb{E}[\bar g(\tilde Z_{j_2}(u), u)\,|\,\mathcal{G}_0]\big]$ has bounded variation for all $j_1, j_2 \in \mathbb{N}_0$ and the following limit exists:
$$\Sigma^{(1)}_{fg} := \lim_{n\to\infty} \int_0^1 D_{f,n}(u) D_{g,n}(u) \cdot \sum_{j\in\mathbb{Z}} \mathrm{Cov}\big(\bar f(\tilde Z_0(u), u), \bar g(\tilde Z_j(u), u)\big)\, du.$$

(ii) Case K = 2 (local): There exist a sequence $h_n > 0$ and $v \in [0,1]$ such that $\mathrm{supp}\, D_{f,n}(\cdot) \subset [v - h_n, v + h_n]$. It holds that
$$h_n \to 0, \qquad \sup_{n\in\mathbb{N}} \big(h_n^{1/2} \cdot D_{f,n}^\infty\big) < \infty.$$
The following limit exists for all $f, g \in \mathcal F$:
$$\Sigma^{(2)}_{fg} := \lim_{n\to\infty} \int_0^1 D_{f,n}(u) D_{g,n}(u)\, du \cdot \sum_{j\in\mathbb{Z}} \mathrm{Cov}\big(\bar f(\tilde Z_0(v), v), \bar g(\tilde Z_j(v), v)\big).$$

Assumption 3.3 looks rather technical. The first part including (3.1) guarantees the right normalization of $D_{f,n}(\cdot)$. The second part ensures the convergence of the asymptotic variances $\mathrm{Var}(\mathbb{G}_n(f))$ and covariances $\mathrm{Cov}(\mathbb{G}_n(f), \mathbb{G}_n(g))$.
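As a numerical sanity check of the normalization (3.1) (our sketch; the Epanechnikov kernel and bandwidth are hypothetical choices of the localizing factor $D_{f,n}(u) = h^{-1/2}K((u-v)/h)$):

```python
import numpy as np

def kernel_factor(n, v, h):
    """Localizing factor D(i/n) = h^{-1/2} K((i/n - v)/h) with the
    Epanechnikov kernel K(t) = 0.75 (1 - t^2) on [-1, 1]."""
    t = (np.arange(1, n + 1) / n - v) / h
    K = np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)
    return K / np.sqrt(h)

def normalization(n, v, h):
    """Return the two quantities controlled by (3.1):
    (1/n) sum_i D(i/n)^2  (stays bounded, here ~ integral of K^2 = 0.6)
    and sup_i D(i/n) / sqrt(n)  (vanishes as n h -> infinity)."""
    D = kernel_factor(n, v, h)
    return float((D ** 2).mean()), float(D.max() / np.sqrt(n))
```

Increasing n at fixed bandwidth leaves the first quantity stable and shrinks the second, as (3.1) requires.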

We obtain the following central limit theorem.

Theorem 3.4. Let $\mathcal F$ satisfy Assumptions 2.2, 3.1, 3.2 and 3.3. Let $m \in \mathbb{N}$ and $f_1, \dots, f_m \in \mathcal F$. Let $\Sigma^{(K)} = \big(\Sigma^{(K)}_{f_k f_l}\big)_{k,l=1,\dots,m}$. Then

$$\frac{1}{\sqrt n}\sum_{i=1}^n \left\{ \begin{pmatrix} f_1(Z_i, \tfrac in) \\ \vdots \\ f_m(Z_i, \tfrac in) \end{pmatrix} - \mathbb{E}\begin{pmatrix} f_1(Z_i, \tfrac in) \\ \vdots \\ f_m(Z_i, \tfrac in) \end{pmatrix} \right\} \xrightarrow{d} N\big(0, \Sigma^{(K)}\big).$$

Theorem 3.4 generalizes the one-dimensional central limit theorem from [8]. We now comment on the assumptions.

Remark 3.5. Assumptions 3.1, 3.2 and 3.3 allow for very general structures of $f \in \mathcal F$. However, in many special cases, a subset of them is automatically fulfilled:

• If Xi is stationary, then Assumption 2.2 already implies Assumption 3.1.

11 • If f¯(z, u)= f¯(z) does not depend on u, Assumption 3.2 is fulfilled.

Regarding Assumption 3.3 we have:

• If $D_{f,n}(u) = 1$, $X_i$ is stationary and $\bar f(z, u) = \bar f(z)$, then Assumption 2.2 already implies Assumption 3.3(i) with $\Sigma^{(1)}_{fg} = \sum_{j\in\mathbb{Z}} \mathrm{Cov}(\bar f(Z_0), \bar g(Z_j))$.

• If $D_{f,n}(u) = 1$ and Assumptions 2.2, 3.2 hold with $s = \varsigma = 1$, then Assumption 3.3(i) holds with $\Sigma^{(1)}_{fg} = \int_0^1 \sum_{j\in\mathbb{Z}} \mathrm{Cov}(\bar f(\tilde Z_0(u), u), \bar g(\tilde Z_j(u), u))\, du$.

• If $h_n \to 0$, $nh_n \to \infty$ and $D_{f,n}(u) = \frac{1}{\sqrt{h_n}}K\big(\frac{u-v}{h_n}\big)$ with some Lipschitz-continuous kernel $K : \mathbb{R} \to \mathbb{R}$ with support $\subset [-1,1]$ and fixed $v \in (0,1)$, then Assumption 3.3(ii) holds with $\Sigma^{(2)}_{fg} = \int K(x)^2\, dx \cdot \sum_{j\in\mathbb{Z}} \mathrm{Cov}(\bar f(\tilde Z_0(v), v), \bar g(\tilde Z_j(v), v))$.

4 Maximal inequalities and asymptotic tightness under functional dependence

In this section, we provide the necessary ingredients for the proof of asymptotic tightness of $\mathbb{G}_n(f)$. We derive a new maximal inequality for finite $\mathcal F$ under functional dependence in Theorem 4.1. We then generalize this bound to arbitrary $\mathcal F$ using chaining techniques in Section 4.2.

4.1 Maximal inequalities

We first derive a maximal inequality which is a main ingredient for chaining devices but is also of independent interest. To state the result, let

$$\beta(q) := \sum_{j=q}^\infty \Delta(j)$$
and define $q^*(x) := \min\{q \in \mathbb{N} : \beta(q) \le q \cdot x\}$. Set $D_n^\infty(u) := \sup_{f\in\mathcal F} |D_{f,n}(u)|$. For $\nu \ge 2$, choose $\mathbb{D}^\infty_{\nu,n}$ such that
$$\Big(\frac 1n \sum_{i=1}^n D_n^\infty\big(\tfrac in\big)^\nu\Big)^{1/\nu} \le \mathbb{D}^\infty_{\nu,n}. \qquad (4.1)$$
Put $\mathbb{D}_n^\infty = \mathbb{D}^\infty_{2,n}$. Recall that $H = H(|\mathcal F|) = 1 \vee \log|\mathcal F|$ as in (1.5).

Theorem 4.1. Suppose that $\mathcal F$ satisfies $|\mathcal F| < \infty$ and Assumption 2.2. Then there exists some universal constant $c > 0$ such that the following holds: If $\sup_{f\in\mathcal F}\|f\|_\infty \le M$ and $\sup_{f\in\mathcal F} V_n(f) \le \sigma$, then
$$\mathbb{E}\max_{f\in\mathcal F} \mathbb{G}_n(f) \le c \cdot \min_{q\in\{1,\dots,n\}}\Big[\sigma\sqrt H + \sqrt H\, \mathbb{D}_n^\infty \beta(q) + \frac{qMH}{\sqrt n}\Big] \qquad (4.2)$$

and
$$\mathbb{E}\max_{f\in\mathcal F} \mathbb{G}_n(f) \le 2c \cdot \Big[\sigma\sqrt H + \frac{MH}{\sqrt n}\, q^*\Big(\frac{M\sqrt H}{\sqrt n\, \mathbb{D}_n^\infty}\Big)\Big]. \qquad (4.3)$$

Clearly, the second bound (4.3) is a corollary of (4.2), obtained by balancing the two terms which involve q. Values of $q^*(\cdot)$ for the two prominent cases of polynomially or exponentially decaying $\Delta(\cdot)$ can be found in Table 2. The proof of Theorem 4.1 relies on a decomposition of $\mathbb{G}_n(f)$ into i.i.d. parts and a residual term of martingale structure. Similar decompositions are also the core of empirical process results for Harris-recurrent Markov chains (cf. [29]) and mixing sequences (cf. [10]).
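The balancing underlying (4.3) can be reproduced numerically: the sketch below (our illustration, with a hypothetical geometric $\Delta$) brute-forces the minimum over q of the right-hand side of (4.2):

```python
import numpy as np

def best_q(sigma, M, n, H, D_inf, Delta, tail=2_000):
    """Minimize over q = 1,...,n the bound (4.2):
    sigma sqrt(H) + sqrt(H) D_inf beta(q) + q M H / sqrt(n),
    where beta(q) = sum_{j >= q} Delta(j) (truncated well beyond n)."""
    j = np.arange(0, n + tail)
    beta = Delta(j)[::-1].cumsum()[::-1]   # beta[q] = sum_{j >= q} Delta(j)
    q = np.arange(1, n + 1)
    rhs = sigma * np.sqrt(H) + np.sqrt(H) * D_inf * beta[q] + q * M * H / np.sqrt(n)
    i = int(np.argmin(rhs))
    return int(q[i]), float(rhs[i])
```

For geometrically decaying ∆ the brute-force minimizer stays small relative to n, matching the logarithmic $q^*$ column of Table 2.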

$$\begin{array}{c|c|c}
\Delta(j) & Cj^{-\alpha},\ \alpha > 1 & C\rho^{j},\ \rho\in(0,1) \\ \hline
q^*(x) & \max\{x^{-1/\alpha},\ 1\} & \max\{\log(x^{-1}),\ 1\} \\
r(\delta) & \min\{\delta^{\frac{\alpha}{\alpha-1}},\ \delta\} & \min\{\tfrac{\delta}{\log(\delta^{-1})},\ \delta\}
\end{array}$$
Table 2: Equivalent expressions of $q^*(\cdot)$ and $r(\cdot)$ taken from Lemma 7.10 in Section 7.9. We omitted the lower and upper bound constants, which depend only on $C, \rho, \alpha$.

In the next subsections, we will prove asymptotic tightness of $\mathbb{G}_n(f)$ under the condition that $D_n^\infty$, $D_n$ do not depend on $n$. However, uniform convergence rates of $\mathbb{G}_n(f)$ for finite $\mathcal{F}$ (growing with $n$) can be obtained without these conditions but with additional moment assumptions, which is done in the following Corollary 4.3. To incorporate the additional moment assumptions, we use a slightly stronger assumption than Assumption 2.2.
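For concreteness, the interplay between $\Delta(\cdot)$, $\beta(\cdot)$ and $q^*(\cdot)$ can be evaluated numerically. The following sketch is our own illustration (the decay $\Delta(j) = j^{-2}$, the truncation point and the grid of $x$-values are arbitrary choices); it computes $q^*(x) = \min\{q : \beta(q) \le qx\}$ for polynomial decay and exhibits the $x^{-1/\alpha}$ growth predicted by Table 2.

```python
def make_beta(Delta, jmax=200000):
    # beta(q) = sum_{j >= q} Delta(j), precomputed as truncated suffix sums.
    vals = [Delta(j) for j in range(1, jmax + 1)]
    suffix = [0.0] * (jmax + 2)
    for j in range(jmax, 0, -1):
        suffix[j] = suffix[j + 1] + vals[j - 1]
    return lambda q: suffix[min(q, jmax + 1)]

def q_star(x, beta, qmax=10000):
    # q*(x) = min{q in N : beta(q) <= q * x}
    for q in range(1, qmax + 1):
        if beta(q) <= q * x:
            return q
    return qmax

beta = make_beta(lambda j: j ** -2.0)          # polynomial decay, alpha = 2
qs = [q_star(x, beta) for x in (0.1, 0.01, 0.001)]
print(qs)  # grows roughly like x^{-1/2}, as Table 2 predicts for alpha = 2
```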

Assumption 4.2. $\bar{\mathcal{F}}$ is an $(L_{\mathcal{F}}, s, R, C)$-class. There exist $\nu \ge 2$, $p \in (1,\infty]$, $C_X > 0$ such that
$$\sup_{i,u} \|R(Z_i, u)\|_{\nu p} \le C_R, \qquad \sup_{i,j} \|X_{ij}\|_{\nu s \frac{p}{p-1}} \le C_X. \qquad (4.4)$$
Let $D_n \ge 0$, $\Delta(k) \ge 0$ be such that for all $k \in \mathbb{N}_0$,
$$2dC_R \cdot \sum_{j=0}^{k} L_{\mathcal{F},j}\,\big(\delta^{X}_{\nu s \frac{p}{p-1}}(k-j)\big)^{s} \le \Delta(k), \qquad \sup_{f\in\mathcal{F}} \Big(\frac{1}{n}\sum_{i=1}^{n} D_{f,n}\big(\tfrac{i}{n}\big)^{2}\Big)^{1/2} \le D_n.$$

Note that Assumption 2.2 is obtained by taking ν = 2. For δ > 0, let

$$r(\delta) := \max\{r > 0 : q^*(r)\,r \le \delta\},$$
cf. Table 2 for values of $r(\cdot)$ in special cases.

Corollary 4.3 (Uniform convergence rates). Suppose that $\mathcal{F}$ satisfies $|\mathcal{F}| < \infty$ and Assumption 4.2 for some $\nu \ge 2$. Let $C_\Delta := 4d\,|L_{\mathcal{F}}|_1\,C_X^{s} C_R + C_{\bar f}$. Furthermore, suppose that
$$\sup_{n\in\mathbb{N}} \sup_{f\in\mathcal{F}} V_n(f) < \infty, \qquad \sup_{n\in\mathbb{N}} \frac{D_{\nu,n}^\infty}{D_n^\infty} < \infty, \qquad \sup_{n\in\mathbb{N}} \frac{C_\Delta H}{n^{1-\frac{2}{\nu}}\, r\big(\frac{\sigma}{D_n^\infty}\big)^{2}} < \infty. \qquad (4.5)$$
Then, $\max_{f\in\mathcal{F}} |\mathbb{G}_n(f)| = O_p(\sqrt{H})$.

The first condition in (4.5) guarantees that $\mathbb{G}_n(f)$ is properly normalized. The second and third conditions are needed to prove that the "rare events", where $|f(Z_i, \frac{i}{n})|$ exceeds some threshold $M_n \in (0,\infty)$, are of the same order as $\sqrt{H}$. For this, we may need more than two moments of $f(Z_i, \frac{i}{n})$, that is, $\nu > 2$, depending on $H$ and the behavior of $D_n^\infty$.

Corollary 4.3 can be used to prove (optimal) convergence rates for kernel density and regression estimators as well as maximum likelihood estimators under dependence. We give an example in Section 5.

4.2 Asymptotic tightness

In this section, we extend the maximal inequality from Theorem 4.1 to arbitrary (infinite) classes $\mathcal{F}$. Since Assumption 2.2 forces $f \in \mathcal{F}$ to be Hölder-continuous with respect to its first argument $z$, classical chaining approaches which use indicator functions do not apply here. We provide a new chaining technique which preserves continuity in Section 7.2.

For $n \in \mathbb{N}$, $\delta > 0$ and $k \in \mathbb{N}$ define $H(k) = 1 \vee \log(k)$ and
$$m(n, \delta, k) := r\Big(\frac{\delta}{D_n}\Big) \cdot \frac{D_n^\infty\, n^{1/2}}{H(k)^{1/2}}. \qquad (4.6)$$

Here, $m(n,\delta,k)$ represents the threshold for rare events in the chaining procedure. We have the following result.

Theorem 4.4. Let $\mathcal{F}$ satisfy Assumption 2.2 and let $F$ be some envelope function of $\mathcal{F}$, that is, for each $f \in \mathcal{F}$ it holds that $|f| \le F$. Let $\sigma > 0$ and assume that $\sup_{f\in\mathcal{F}} V_n(f) \le \sigma$. Then there exists some universal constant $\tilde c > 0$ such that

$$\mathbb{E}\sup_{f\in\mathcal{F}} |\mathbb{G}_n(f)| \le \tilde c\,\Big[\Big(1 + \frac{D_n^\infty}{D_n} + \frac{D_n}{D_n^\infty}\Big)\int_0^{\sigma} \sqrt{H(\varepsilon, \mathcal{F}, V_n)}\,d\varepsilon + \sqrt{n}\,\big\|F\,\mathbb{1}_{\{F > \frac{1}{4} m(n,\sigma,N(\frac{\sigma}{2},\mathcal{F},V_n))\}}\big\|_{1,n}\Big].$$

As a corollary, we obtain asymptotic equicontinuity of $\mathbb{G}_n(f)$. Here, we use Assumptions 3.1 and 3.2 only to discuss the remainder term in Theorem 4.4 without imposing the existence of additional moments.

Corollary 4.5. Let $\mathcal{F}$ satisfy Assumptions 2.2, 3.1 and 3.2. Suppose that
$$\sup_{n\in\mathbb{N}} \int_0^1 \sqrt{H(\varepsilon, \mathcal{F}, V_n)}\,d\varepsilon < \infty. \qquad (4.7)$$
Furthermore, assume that $D_n, D_n^\infty \in (0,\infty)$ are independent of $n$, and
$$\sup_{i=1,\dots,n} \frac{D_n^\infty(\frac{i}{n})}{\sqrt{n}} \to 0. \qquad (4.8)$$

Then, the process $\mathbb{G}_n(f)$ is asymptotically equicontinuous with respect to $V_n$, that is, for every $\eta > 0$,
$$\lim_{\sigma\to 0} \limsup_{n\to\infty} \mathbb{P}\Big(\sup_{f,g\in\mathcal{F},\ V_n(f-g)\le\sigma} |\mathbb{G}_n(f) - \mathbb{G}_n(g)| \ge \eta\Big) = 0.$$

5 Applications

In this section, we provide some applications of the main results (Corollary 4.3 and Corollary 2.3). We will focus on locally stationary processes and therefore use localization in our functionals, but the results also hold analogously for stationary processes.

Let $K : \mathbb{R} \to \mathbb{R}$ be some bounded kernel function which is Lipschitz continuous with Lipschitz constant $L_K$, $\int K(u)\,du = 1$, $\int K(u)^2\,du \in (0,\infty)$ and support $\subset [-\frac{1}{2}, \frac{1}{2}]$. For some bandwidth $h > 0$, put $K_h(\cdot) := \frac{1}{h}K(\frac{\cdot}{h})$.

In the first example we consider the nonparametric kernel estimator in the context of nonparametric regression with fixed design and locally stationary noise. We show that under conditions on the bandwidth $h$, which are common in the presence of dependence (cf. [24] or [45]), we obtain the optimal uniform convergence rate $\sqrt{\frac{\log(n)}{nh}}$. Write $a_n \gtrsim b_n$ for sequences $a_n, b_n$ if there exists some constant $c > 0$ such that $a_n \ge c b_n$ for all $n \in \mathbb{N}$.

Example 5.1 (Nonparametric regression). Let $X_i$ be some arbitrary process of the form (1.1) with $\sum_{k=0}^{\infty} \delta_2^X(k) < \infty$ which fulfills $\sup_{i=1,\dots,n} \|X_i\|_{\nu} \le C_X \in (0,\infty)$ for some $\nu > 2$. Suppose that we observe $Y_i$, $i = 1, \dots, n$, given by
$$Y_i = g\big(\tfrac{i}{n}\big) + X_i,$$
where $g : [0,1] \to \mathbb{R}$ is some function. Estimation of $g$ is performed via
$$\hat g_{n,h}(v) := \frac{1}{n}\sum_{i=1}^{n} K_h\big(\tfrac{i}{n} - v\big) Y_i.$$
Suppose that either

• $\delta_2^X(j) \le \kappa j^{-\alpha}$ with some $\kappa > 0$, $\alpha > 1$, and $h \gtrsim \big(\frac{\log(n)}{n^{1-\frac{2}{\nu}}}\big)^{\frac{\alpha-1}{\alpha}}$, or

• $\delta_2^X(j) \le \kappa\rho^{j}$ with some $\kappa > 0$, $\rho \in (0,1)$ and $h \gtrsim \frac{\log(n)^{3}}{n^{1-\frac{2}{\nu}}}$.

From (5.1) and (5.2) below it follows that

$$\sup_{v\in[0,1]} |\hat g_{n,h}(v) - \mathbb{E}\hat g_{n,h}(v)| = O_p\Big(\sqrt{\frac{\log(n)}{nh}}\Big).$$
First note that due to Lipschitz continuity of $K$ with Lipschitz constant $L_K$, we have

$$\sup_{|v-v'|\le n^{-3}} \big|(\hat g_{n,h}(v) - \mathbb{E}\hat g_{n,h}(v)) - (\hat g_{n,h}(v') - \mathbb{E}\hat g_{n,h}(v'))\big| \le L_K n^{-3} \cdot \frac{1}{nh^{2}}\sum_{i=1}^{n}\big(|X_i| + \mathbb{E}|X_i|\big) = O_p(n^{-1}). \qquad (5.1)$$
For the grid $V_n = \{\frac{i}{n^{3}},\ i = 1, \dots, n^{3}\}$, which discretizes $[0,1]$ up to distances $n^{-3}$, we obtain by Corollary 4.3 that

$$\sqrt{nh}\,\sup_{v\in V_n} |\hat g_{n,h}(v) - \mathbb{E}\hat g_{n,h}(v)| = \sup_{f\in\mathcal{F}} |\mathbb{G}_n(f)| = O_p\big(\log|V_n|^{1/2}\big) = O_p\big(\log(n)^{1/2}\big), \qquad (5.2)$$
where
$$\mathcal{F} = \Big\{f_v(x,u) = \frac{1}{\sqrt{h}} K\Big(\frac{u-v}{h}\Big)\, x : v \in V_n\Big\}.$$

The conditions of Corollary 4.3 are easily verified: It holds that $f_v(x,u) = D_{f,n}(u) \cdot \bar f_v(x,u)$ with $D_{f,n}(u) = \frac{1}{\sqrt{h}} K(\frac{u-v}{h})$ and $\bar f_v(x,u) = x$. Thus, Assumption 4.2 is satisfied with $\Delta(k) = 2\delta_2^X(k)$, $p = \infty$, $R(\cdot) = C_R = 1$. Furthermore, $D_n = |K|_\infty$, $D_{\nu,n}^\infty = \frac{|K|_\infty}{\sqrt{h}}$, and
$$\|f_v\|_{2,n} \le \frac{1}{\sqrt{h}}\Big(\frac{1}{n}\sum_{i=1}^{n} K\Big(\frac{v - \frac{i}{n}}{h}\Big)^{2} \|X_i\|_2^{2}\Big)^{1/2} \le C_X |K|_\infty,$$
which shows that $\sup_{f\in\mathcal{F}} V_n(f) = O(1)$. The conditions on $h$ emerge from the last condition in (4.5), using the bounds for $r(\cdot)$ from Table 2.
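A minimal numerical sketch of the estimator in Example 5.1 (our own illustration, not code from the paper; the Epanechnikov-type kernel, the trend $g$, the bandwidth and the AR(1) noise are assumed choices) shows the localized average recovering the trend at an interior point:

```python
import random

def K(u):
    # Lipschitz kernel with support [-1/2, 1/2] and integral 1 (Epanechnikov-type).
    return 6.0 * (0.25 - u * u) if abs(u) <= 0.5 else 0.0

def g_hat(v, Y, h):
    # \hat g_{n,h}(v) = (1/n) sum_i K_h(i/n - v) Y_i with K_h = K(./h)/h.
    n = len(Y)
    return sum(K((i / n - v) / h) / h * Y[i - 1] for i in range(1, n + 1)) / n

random.seed(0)
n, h = 2000, 0.1
g = lambda u: u * (1.0 - u)                       # unknown trend on [0, 1]
noise, x = [], 0.0
for _ in range(n):                                # AR(1) noise as a simple Bernoulli shift
    x = 0.5 * x + random.gauss(0.0, 0.1)
    noise.append(x)
Y = [g(i / n) + noise[i - 1] for i in range(1, n + 1)]
est = g_hat(0.5, Y, h)
print(round(est, 3))   # should be close to g(0.5) = 0.25
```

The dependence in the noise only inflates the variance by a long-run factor; the fixed design keeps the estimator a plain weighted mean.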

For the following two examples we suppose that the underlying process Xi is locally stationary in the sense of Assumption 3.1. Similar assumptions are posed in [8] and are fulfilled for a large variety of locally stationary processes.

In the same spirit as in Example 5.1, it is possible to derive uniform rates of convergence for M-estimators of parameters $\theta$ in models of locally stationary processes. Furthermore, weak Bahadur representations can be obtained. The following results apply for instance to maximum likelihood estimation of parameters in tvARMA or tvGARCH processes. The main tool is to prove uniform convergence of the corresponding objective functions and their derivatives. Since the rest of the proof is standard, the details are postponed to the Supplementary Material, Section 7.8. Let $\nabla_\theta^{j}$ denote the $j$-th derivative with respect to $\theta$. To apply empirical process theory, we ask the objective functions to be $(L_{\mathcal{F}}, 1, R, C)$-classes in (A1) and Lipschitz with respect to $\theta$ in (A2).

Lemma 5.2 (M-estimation, uniform results). Let $\Theta \subset \mathbb{R}^{d_\Theta}$ be compact and $\theta_0 : [0,1] \to \mathrm{interior}(\Theta)$. For each $\theta \in \Theta$, let $\ell_\theta : \mathbb{R}^{k} \to \mathbb{R}$ be some measurable function which is twice continuously differentiable. Let $Z_i = (X_i, \dots, X_{i-k+1})$, and define for $v \in [0,1]$,
$$\hat\theta_{n,h}(v) := \arg\min_{\theta\in\Theta} L_{n,h}(v,\theta), \qquad L_{n,h}(v,\theta) := \frac{1}{n}\sum_{i=k}^{n} K_h\big(\tfrac{i}{n} - v\big) \cdot \ell_\theta(Z_i).$$
Suppose that there exists $C_\Theta > 0$ such that for $j \in \{0, 1, 2\}$,

(A1) $\bar{\mathcal{F}}_j = \{\nabla_\theta^{j}\ell_\theta : \theta \in \Theta\}$ is an $(L_{\mathcal{F}}, 1, R, C)$-class with $R(z) = 1 + |z|_1^{M-1}$ for some $M \ge 1$, and Assumption 3.1 for $\bar{\mathcal{F}}_j$ is fulfilled with $s = 1$, $p = \frac{M}{M-1}$.

(A2) for all $z \in \mathbb{R}^{k}$, $\theta, \theta' \in \Theta$,
$$\big|\nabla_\theta^{j}\ell_\theta(z) - \nabla_\theta^{j}\ell_{\theta'}(z)\big|_\infty \le C_\Theta\big(1 + |z|_1^{M}\big) \cdot |\theta - \theta'|_2,$$

(A3) $\theta \mapsto \mathbb{E}\ell_\theta(\tilde Z_0(v))$ attains its global minimum in $\theta_0(v)$ with positive definite $I(v) := \mathbb{E}\nabla_\theta^{2}\ell_{\theta_0(v)}(\tilde Z_0(v))$.

Furthermore, suppose that either

• $\delta_{2M}^X(j) \le \kappa j^{-\alpha}$ with some $\kappa > 0$, $\alpha > 1$, and $h \gtrsim \big(\frac{\log(n)}{n^{1-\frac{2}{\nu}}}\big)^{\frac{\alpha-1}{\alpha}}$, or

• $\delta_{2M}^X(j) \le \kappa\rho^{j}$ with some $\kappa > 0$, $\rho \in (0,1)$ and $h \gtrsim \frac{\log(n)^{3}}{n^{1-\frac{2}{\nu}}}$.

Define $\tau_n := \sqrt{\frac{\log(n)}{nh}}$ and $B_h := \sup_{v\in[0,1]} \big|\mathbb{E}\nabla_\theta L_{n,h}(v, \theta_0(v))\big|$ (the bias). Then, $B_h = O(h^{\varsigma})$, and as $nh \to \infty$,
$$\sup_{v\in[\frac{h}{2},\ 1-\frac{h}{2}]} \big|\hat\theta_{n,h}(v) - \theta_0(v)\big| = O_p(\tau_n + B_h)$$
and
$$\sup_{v\in[\frac{h}{2},\ 1-\frac{h}{2}]} \big|\{\hat\theta_{n,h}(v) - \theta_0(v)\} - I(v)^{-1}\nabla_\theta L_{n,h}(v, \theta_0(v))\big| = O_p\big((\tau_n + h^{\varsigma})(\tau_n + B_h)\big).$$

Remark 5.3.
• In the tvAR(1) case $X_i = a(i/n)X_{i-1} + \varepsilon_i$, we can use for instance
$$\ell_\theta(x_1, x_0) = (x_1 - a x_0)^2,$$
which for $a \in (-1,1)$ is a $((1,a), 1, |x_0| + |x_1|, (0,1))$-class.
• With more smoothness assumptions on $\nabla_\theta\ell$ or using a local linear estimation method for $\hat\theta_{n,h}$, the bias term $B_h$ can be shown to be of smaller order, for instance $O(h^2)$ (cf. [8]).
• The theory derived in this paper can also be used to prove asymptotic properties of M-estimators based on objective functions $\ell_\theta$ which are only almost everywhere differentiable in the Lebesgue sense, by following the theory of Chapter 5 in [43]. This is of utmost interest for $\ell_\theta$ that have additional analytic properties, such as convexity. Since these properties are also needed in the proofs, we will not discuss this in detail.
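In the tvAR(1) case of Remark 5.3, minimizing the localized quadratic loss of Lemma 5.2 in $a$ has a closed form, a kernel-weighted least-squares ratio. The following sketch is our own construction (the kernel, the coefficient curve $a(\cdot)$ and the bandwidth are illustrative assumptions):

```python
import random

def K(u):
    # Lipschitz kernel with support [-1/2, 1/2] and integral 1.
    return 6.0 * (0.25 - u * u) if abs(u) <= 0.5 else 0.0

def tvar_fit(v, X, h):
    # Minimizing (1/n) sum_i K_h(i/n - v) (X_i - a X_{i-1})^2 over a gives
    # the localized least-squares ratio below.
    n = len(X)
    num = den = 0.0
    for i in range(2, n + 1):
        w = K((i / n - v) / h) / h
        num += w * X[i - 1] * X[i - 2]
        den += w * X[i - 2] ** 2
    return num / den

random.seed(1)
n, h = 4000, 0.15
a = lambda u: 0.6 - 0.4 * u          # slowly varying AR coefficient a(u)
X, x = [], 0.0
for i in range(1, n + 1):
    x = a(i / n) * x + random.gauss(0.0, 1.0)
    X.append(x)
est = tvar_fit(0.5, X, h)
print(round(est, 2))   # targets a(0.5) = 0.4
```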

We give an easy application of the functional central limit theorem from Theorem 2.3 by inspecting a locally stationary version of Example 19.25 in [43].

Example 5.4 (Local mean absolute deviation). For fixed $v \in (0,1)$, put $X_n(v) := \frac{1}{n}\sum_{i=1}^{n} K_h\big(\tfrac{i}{n} - v\big) X_i$ and define the mean absolute deviation
$$\mathrm{mad}_n(v) := \frac{1}{n}\sum_{i=1}^{n} K_h\big(\tfrac{i}{n} - v\big)\,\big|X_i - X_n(v)\big|.$$
Let Assumption 3.1 hold with $s = 1$, $p = \infty$. Suppose that $\mathbb{P}(\tilde X_0(v) = \mathbb{E}\tilde X_0(v)) = 0$ and that for some $\kappa > 0$, $\alpha > 1$, $\delta_2^X(j) \le \kappa j^{-\alpha}$. We show that if $nh \to \infty$ and $nh^{1+2\varsigma} \to 0$,
$$\sqrt{nh}\,\big(\mathrm{mad}_n(v) - \mathbb{E}|\tilde X_0(v) - \mu|\big) \xrightarrow{d} N(0, \sigma^2), \qquad (5.3)$$
where $\mu = \mathbb{E}\tilde X_0(v)$, $G$ denotes the distribution function of $\tilde X_0(v)$ and

$$\sigma^2 = \int K(u)^2\,du \cdot \sum_{j=0}^{\infty} \mathrm{Cov}\big(|\tilde X_0(v) - \mu| + (2G(\mu) - 1)\tilde X_0(v),\ |\tilde X_j(v) - \mu| + (2G(\mu) - 1)\tilde X_j(v)\big).$$
The result is obtained by using the decomposition

$$\sqrt{nh}\,\big(\mathrm{mad}_n(v) - \mathbb{E}|\tilde X_0(v) - \mu|\big) = \mathbb{G}_n(f_{X_n(v)} - f_\mu) + \mathbb{G}_n(f_\mu) + A_n,$$
$$A_n = \sqrt{nh}\,\Big(\frac{1}{n}\sum_{i=1}^{n} K_h\big(\tfrac{i}{n} - v\big)\,\mathbb{E}|X_i - \theta|\big|_{\theta = X_n(v)} - \mathbb{E}|\tilde X_0(v) - \mu|\Big),$$

where $\Theta = \{\theta \in \mathbb{R} : |\theta - \mu| \le 1\}$ and
$$\mathcal{F} = \big\{f_\theta(x,u) = \sqrt{h}\,K_h(u - v)\,|x - \theta| : \theta \in \Theta\big\}.$$

By the triangle inequality, $\mathcal{F}$ satisfies Assumption 2.2 with $\bar f_\theta(x,u) = |x - \theta|$, $R(\cdot) = C_R = 1$, $p = \infty$, $s = 1$ and $\Delta(k) = 2\delta_2^X(k)$. Assumption 3.2 is trivially fulfilled since $\bar f$ does not depend on $u$. Since $\mathcal{F}$ is a one-dimensional Lipschitz class, $\sup_{n\in\mathbb{N}} H(\varepsilon, \mathcal{F}, \|\cdot\|_{2,n}) = O(\log(\varepsilon^{-1} \vee 1))$. By Corollary 2.3, we obtain that there exists some process $[\mathbb{G}(f_\theta)]_{\theta\in\Theta}$ such that for $h \to 0$, $nh \to \infty$,
$$\big(\mathbb{G}_n(f_\theta)\big)_{\theta\in\Theta} \xrightarrow{d} \big(\mathbb{G}(f_\theta)\big)_{\theta\in\Theta} \quad \text{in } \ell^\infty(\Theta). \qquad (5.4)$$
Furthermore, by Assumption 3.1,

$$\begin{aligned}
\|f_{X_n(v)}(X_i) - f_\mu(X_i)\|_2 &\le \|X_n(v) - \mu\|_2 \le \|X_n(v) - \mathbb{E}X_n(v)\|_2 + |\mathbb{E}X_n(v) - \mu| \\
&\le \frac{1}{\sqrt{nh}}\Big(\frac{1}{nh}\sum_{i=1}^{n} K\Big(\frac{\frac{i}{n} - v}{h}\Big)^{2}\Big)^{1/2}\, 2\sum_{j=0}^{\infty}\delta_2^X(j) + \Big|\frac{1}{n}\sum_{i=1}^{n} K_h\big(\tfrac{i}{n} - v\big)\big(\mathbb{E}X_i - \mathbb{E}\tilde X_0(v)\big)\Big| \\
&= O\big((nh)^{-1/2} + h^{\varsigma}\big). \qquad (5.5)
\end{aligned}$$

By Lemma 19.24 in [43], we conclude from (5.4) and (5.5) that
$$\mathbb{G}_n(f_{X_n(v)} - f_\mu) \xrightarrow{p} 0. \qquad (5.6)$$

By Assumption 3.1 and bounded variation of $K$,
$$A_n = \sqrt{nh}\,\big(\mathbb{E}|\tilde X_0(v) - \theta|\big|_{\theta = X_n(v)} - \mathbb{E}|\tilde X_0(v) - \mu|\big) + O_p\big((nh)^{-1/2} + (nh)^{1/2} h^{\varsigma}\big). \qquad (5.7)$$
Due to $\mathbb{P}(\tilde X_0(v) = \mu) = 0$, $g(\theta) = \mathbb{E}|\tilde X_0(v) - \theta|$ is differentiable in $\theta = \mu$ with derivative $2G(\mu) - 1$. The Delta method delivers

$$\sqrt{nh}\,\big(\mathbb{E}|\tilde X_0(v) - \theta|\big|_{\theta = X_n(v)} - \mathbb{E}|\tilde X_0(v) - \mu|\big) = (2G(\mu) - 1)\sqrt{nh}\,(X_n(v) - \mu) + o_p(1). \qquad (5.8)$$
From (5.6), (5.7) and (5.8) we obtain

$$\sqrt{nh}\,\big(\mathrm{mad}_n(v) - \mathbb{E}|\tilde X_0(v) - \mu|\big) = \mathbb{G}_n\big(f_\mu + (2G(\mu) - 1)\,\mathrm{id}\big) + o_p(1).$$
Theorem 3.4 now yields (5.3).
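The statistic of Example 5.4 is straightforward to compute. The following sketch is our own illustration (kernel, bandwidth and the i.i.d. Gaussian toy noise are assumptions, chosen so that the target $\mathbb{E}|\tilde X_0(v) - \mu| = \sqrt{2/\pi}$ is known in closed form):

```python
import random

def K(u):
    # Lipschitz kernel with support [-1/2, 1/2] and integral 1.
    return 6.0 * (0.25 - u * u) if abs(u) <= 0.5 else 0.0

def local_mad(v, X, h):
    # mad_n(v) = (1/n) sum_i K_h(i/n - v) |X_i - X_n(v)|, with the
    # local mean X_n(v) = (1/n) sum_i K_h(i/n - v) X_i.
    n = len(X)
    w = [K((i / n - v) / h) / h for i in range(1, n + 1)]
    xbar = sum(wi * xi for wi, xi in zip(w, X)) / n
    return sum(wi * abs(xi - xbar) for wi, xi in zip(w, X)) / n

random.seed(2)
n, h, v = 5000, 0.1, 0.5
X = [random.gauss(0.0, 1.0) for _ in range(n)]   # i.i.d. N(0,1) as toy noise
m = local_mad(v, X, h)
print(round(m, 2))   # E|N(0,1)| = sqrt(2/pi), about 0.80
```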

6 Conclusion

In this paper, we have developed a new empirical process theory for locally stationary processes with the functional dependence measure. We have proven a functional central limit theorem and maximal inequalities. A general empirical process theory for locally stationary processes is a key step to derive asymptotic and nonasymptotic results for M-estimates or testing based on $L^2$- or $L^\infty$-statistics. We have given an example in nonparametric estimation where our theory is applicable.

Due to the possibility of analyzing the size of the function class and the stochastic properties of the underlying process separately, we conjecture that our theory also permits an extension of various results from i.i.d. to dependent data, such as empirical risk minimization.

From a technical point of view, the linear and moment-based nature of the functional dependence measure has forced us to modify several approaches from empirical process theory for i.i.d. or mixing variables. A main issue was that the dependence measure only transfers decay rates to continuous functions. We have therefore provided a new chaining technique which preserves continuity of the arguments of the empirical process.

In principle, a similar empirical process theory for locally stationary processes can be established under mixing conditions such as absolute regularity. This would be a generalization of the results found in [38] and [10]. As we have seen in Section 2.2, such a theory would pose additional moment conditions on $f(Z_i, \frac{i}{n})$. Contrary to that, our framework only requires second moments of $f(Z_i, \frac{i}{n})$, but the entropy integral is enlarged by some factor which increases with stronger dependence. Moreover, in nearly all models the derivation of a bound for mixing coefficients needs continuity of the innovation process, which may not be suitable in several examples. Therefore, we consider our theory a valuable addition to the existing theory even in the stationary case.

One could also think of an extension of our empirical process theory to functions $f(z,u)$ which are discontinuous with respect to $z$. This can in principle be done by using a martingale decomposition and assuming continuity of $z \mapsto \mathbb{E}[f(Z_i, \frac{i}{n}) \mid \mathcal{G}_{i-1} = z]$ instead. However, one can typically only expect continuity of this functional if either already $z \mapsto f(z,u)$ was continuous or $\varepsilon_i$ has a continuous density. In the latter case, the sequence might also be $\beta$-mixing, and a more detailed discussion of the advantages of our formulation is necessary.

Acknowledgements

The authors would like to thank the associate editor and two anonymous referees for their helpful remarks which helped to provide a much more concise version of the paper.

7 Appendix

In the appendix, we present the basic ideas used to prove the maximal inequalities of Section 4. We first consider the finite version in Section 7.1 and then present a chaining approach which preserves continuity in Section 7.2.

7.1 Proof idea: A maximal inequality for finite $\mathcal{F}$, Theorem 4.1

We provide an approach to obtain maximal inequalities for sums of random variables $W_i(f)$, $i = 1, \dots, n$, indexed by $f \in \mathcal{F}$, by using a decomposition into independent random variables. An approach with similar intentions is presented in [10] (Section 4.3 therein) for absolutely regular

19 sequences and in [29] for Harris-recurrent Markov chains. For convenience, we abbreviate

$$W_i(f) = f\big(Z_i, \tfrac{i}{n}\big)$$
and put $S_n(f) := \sum_{i=1}^{n} W_i(f)$.

To approximate $W_i(f)$ by independent variables, we use a technique from [49] which was refined in [51]. This decomposition is much more involved than the ones for Harris-recurrent Markov chains or mixing sequences since no direct coupling method is available. Define

$$W_{i,j}(f) := \mathbb{E}[W_i(f) \mid \varepsilon_{i-j}, \varepsilon_{i-j+1}, \dots, \varepsilon_i], \quad j \in \mathbb{N},$$
and
$$S_n^{W}(f) := \sum_{i=1}^{n}\{W_i(f) - \mathbb{E}W_i(f)\}, \qquad S_{n,j}^{W}(f) := \sum_{i=1}^{n}\{W_{i,j}(f) - \mathbb{E}W_{i,j}(f)\}.$$
Let $q \in \{1, \dots, n\}$ be arbitrary. Put $L := \lfloor\frac{\log(q)}{\log(2)}\rfloor$ and $\tau_l := 2^{l}$ $(l = 0, \dots, L-1)$, $\tau_L := q$. Then we have

$$W_i(f) = W_i(f) - W_{i,q}(f) + \sum_{l=1}^{L}\big(W_{i,\tau_l}(f) - W_{i,\tau_{l-1}}(f)\big) + W_{i,1}(f)$$
(in the case $q = 1$, the sum in the middle does not appear) and thus
$$S_n^{W}(f) = \big(S_n^{W}(f) - S_{n,q}^{W}(f)\big) + \sum_{l=1}^{L}\big(S_{n,\tau_l}^{W}(f) - S_{n,\tau_{l-1}}^{W}(f)\big) + S_{n,1}^{W}(f).$$
We write

$$S_{n,\tau_l}^{W}(f) - S_{n,\tau_{l-1}}^{W}(f) = \sum_{i=1}^{\lfloor n/\tau_l\rfloor + 1} T_{i,l}(f), \qquad T_{i,l}(f) := \sum_{k=(i-1)\tau_l + 1}^{(i\tau_l)\wedge n}\big(W_{k,\tau_l}(f) - W_{k,\tau_{l-1}}(f)\big).$$
The random variables $T_{i,l}(f)$, $T_{i',l}(f)$ are independent if $|i - i'| > 1$. This leads to the decomposition
$$\begin{aligned}
\max_{f\in\mathcal{F}} |\mathbb{G}_n(f)| \le{}& \max_{f\in\mathcal{F}}\frac{1}{\sqrt{n}}\big|S_n^{W}(f) - S_{n,q}^{W}(f)\big| \\
&+ \sum_{l=1}^{L}\Big[\max_{f\in\mathcal{F}}\frac{1}{\sqrt{n/\tau_l}}\Big|\sum_{\substack{i=1 \\ i\ \mathrm{even}}}^{\lfloor n/\tau_l\rfloor + 1}\frac{1}{\sqrt{\tau_l}}T_{i,l}(f)\Big| + \max_{f\in\mathcal{F}}\frac{1}{\sqrt{n/\tau_l}}\Big|\sum_{\substack{i=1 \\ i\ \mathrm{odd}}}^{\lfloor n/\tau_l\rfloor + 1}\frac{1}{\sqrt{\tau_l}}T_{i,l}(f)\Big|\Big] \\
&+ \max_{f\in\mathcal{F}}\frac{1}{\sqrt{n}}\big|S_{n,1}^{W}(f)\big|. \qquad (7.1)
\end{aligned}$$
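The dyadic telescoping underlying (7.1) can be checked directly. The sketch below is our own sanity check (random numbers stand in for $W_i(f)$ and $W_{i,j}(f)$; only the index algebra is being verified):

```python
import math
import random

def dyadic_ladder(q):
    # tau_0 = 1, tau_l = 2^l for l < L, tau_L = q, with L = floor(log q / log 2).
    L = int(math.floor(math.log(q) / math.log(2))) if q > 1 else 0
    return [2 ** l for l in range(L)] + [q]

random.seed(4)
for q in (1, 2, 5, 16, 100):
    W = {j: random.gauss(0.0, 1.0) for j in range(1, q + 1)}  # stand-ins for W_{i,j}(f)
    Wi = random.gauss(0.0, 1.0)                               # stand-in for W_i(f)
    t = dyadic_ladder(q)
    total = (Wi - W[q]) + sum(W[t[l]] - W[t[l - 1]] for l in range(1, len(t))) + W[1]
    assert abs(total - Wi) < 1e-12                            # telescoping is exact
print("telescoping ok")
```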

While the first term in (7.1) can be made small by assumptions on the dependence of $W_i(f)$ and by the use of a large deviation inequality for martingales in Banach spaces from [36], the second and third terms allow the application of Rosenthal-type bounds due to the independence of the summands $T_{i,l}(f)$ and $W_{i,1}(f)$, respectively. Since the first term in (7.1) allows for a stronger bound in terms of $n$ than is the case for mixing, we can obtain a theory which only needs second moments of $W_i(f) = f(Z_i, \frac{i}{n})$. By Assumption 2.2, we can show the following results (cf. Lemma

7.3 in the Supplementary Material and recall (4.1) for the definition of $D_n^\infty$). For each $i = 1, \dots, n$, $j \in \mathbb{N}$, $s \in \mathbb{N} \cup \{\infty\}$, $f \in \mathcal{F}$,
$$\sup_{f\in\mathcal{F}}\big\|W_i(f) - W_i(f)^{*(i-j)}\big\|_2 \le D_n^\infty\big(\tfrac{i}{n}\big)\Delta(j), \qquad (7.2)$$
$$\big\|W_i(f) - W_i(f)^{*(i-j)}\big\|_2 \le \big|D_{f,n}\big(\tfrac{i}{n}\big)\big| \cdot \Delta(j), \qquad (7.3)$$
$$\|W_i(f)\|_s \le \big\|f\big(Z_i, \tfrac{i}{n}\big)\big\|_s. \qquad (7.4)$$

We now summarize the proof of Theorem 4.1. The detailed proof is found in the Supplementary Material. Denote
$$A_1 := \max_{f\in\mathcal{F}}\frac{1}{\sqrt{n}}\big|S_n^{W}(f) - S_{n,q}^{W}(f)\big|, \qquad A_2 := \sum_{l=1}^{L}\max_{f\in\mathcal{F}}\frac{1}{\sqrt{n/\tau_l}}\Big|\sum_{\substack{i=1\\ i\ \mathrm{even}}}^{\lfloor n/\tau_l\rfloor + 1}\frac{1}{\sqrt{\tau_l}}T_{i,l}(f)\Big|.$$
The remaining terms in (7.1) are discussed similarly or are special cases. We first have
$$\mathbb{E}A_1 \le \frac{1}{\sqrt{n}}\sum_{j=q}^{\infty}\mathbb{E}\max_{f\in\mathcal{F}}\Big|\sum_{i=1}^{n} E_{i,j}(f)\Big|,$$
where $E_{i,j}(f) = \mathbb{E}[W_i(f) \mid \varepsilon_{i-j}, \dots, \varepsilon_i] - \mathbb{E}[W_i(f) \mid \varepsilon_{i-j+1}, \dots, \varepsilon_i]$ is a martingale difference sequence with respect to $\mathcal{G}_{i-j} = \sigma(\varepsilon_{i-j}, \varepsilon_{i-j+1}, \dots)$. For $x = (x_f)_{f\in\mathcal{F}}$ and $s := 2 \vee \log|\mathcal{F}|$, define $|x|_s := \big(\sum_{f\in\mathcal{F}} |x_f|^{s}\big)^{1/s}$. Then
$$\mathbb{E}\max_{f\in\mathcal{F}}\Big|\sum_{i=1}^{n} E_{i,j}\Big| \le \mathbb{E}\Big|\sum_{i=1}^{n} E_{i,j}\Big|_s \le \Big\|\,\Big|\sum_{i=1}^{n} E_{i,j}\Big|_s\,\Big\|_2.$$
By Theorem 4.1 in [36] there exists an absolute constant $c_1 > 0$ such that for $s > 1$,
$$\Big\|\,\Big|\sum_{i=1}^{n} E_{i,j}(f)\Big|_s\,\Big\|_2 \le c_1\sqrt{s}\,\Big(\sum_{i=1}^{n}\big\||E_{i,j}|_s\big\|_2^{2}\Big)^{1/2}.$$
By using (7.2) and standard techniques for the functional dependence measure, we conclude that

$$\big\||E_{i,j}|_s\big\|_2 \le |\mathcal{F}|^{1/s}\sup_{f\in\mathcal{F}}\|E_{i,j}(f)\|_2 \le e\,D_n^\infty\big(\tfrac{i}{n}\big)\Delta(j).$$
Summarizing the results, we obtain

$$\mathbb{E}A_1 \le 2ec_1\sqrt{H}\cdot D_n^\infty\beta(q). \qquad (7.5)$$
Regarding $A_2$, we have
$$\mathbb{E}A_2 \le \sum_{l=1}^{L}\mathbb{E}\max_{f\in\mathcal{F}}\frac{1}{\sqrt{n/\tau_l}}\Big|\sum_{\substack{i=1\\ i\ \mathrm{even}}}^{\lfloor n/\tau_l\rfloor + 1}\frac{1}{\sqrt{\tau_l}}T_{i,l}(f)\Big|.$$
Using the fact that martingale difference sequences are uncorrelated, (7.3) and the simple bound $|W_i(f)| \le M$, one can show that
$$\frac{1}{\sqrt{\tau_l}}\|T_{i,l}\|_2 \le \sum_{j=\tau_{l-1}+1}^{\tau_l}\min\big\{\|f\|_{2,n},\ D_n\Delta\big(\lfloor\tfrac{j}{2}\rfloor\big)\big\}, \qquad |T_{i,l}(f)| \le 2\sqrt{\tau_l}\,M.$$
A maximal inequality for independent random variables based on Bernstein's inequality yields that there exists some universal constant $c_2 > 0$ such that

$$\begin{aligned}
\mathbb{E}\max_{f\in\mathcal{F}}\frac{1}{\sqrt{n/\tau_l}}\Big|\sum_{\substack{i=1\\ i\ \mathrm{even}}}^{\lfloor n/\tau_l\rfloor + 1}\frac{1}{\sqrt{\tau_l}}T_{i,l}(f)\Big| &\le c_2\cdot\Big[\max_{f\in\mathcal{F}}\Big(\frac{\tau_l}{n}\sum_{\substack{i=1\\ i\ \mathrm{even}}}^{\lfloor n/\tau_l\rfloor + 1}\Big\|\frac{1}{\sqrt{\tau_l}}T_{i,l}(f)\Big\|_2^{2}\Big)^{1/2}\sqrt{H} + \frac{\sup_{f\in\mathcal{F}}\frac{1}{\sqrt{\tau_l}}|T_{i,l}(f)|_\infty\, H}{\sqrt{n/\tau_l}}\Big] \\
&\le 4c_2\Big[\sum_{j=\tau_{l-1}+1}^{\tau_l}\min\Big\{\max_{f\in\mathcal{F}}\|f\|_{2,n},\ D_n\Delta\big(\lfloor\tfrac{j}{2}\rfloor\big)\Big\}\cdot\sqrt{H} + \frac{\tau_l M H}{\sqrt{n}}\Big].
\end{aligned}$$
By monotonicity of the first term with respect to $\|\cdot\|_{2,n}$ and
$$\sum_{l=1}^{L}\sum_{j=\tau_{l-1}+1}^{\tau_l}\min\big\{\|f\|_{2,n},\ D_n\Delta\big(\lfloor\tfrac{j}{2}\rfloor\big)\big\} \le 2V_n(f), \qquad \sum_{l=1}^{L}\tau_l \le 2q,$$
we obtain with some universal constant $c_3 > 0$,
$$\mathbb{E}A_2 \le c_3\Big[\sup_{f\in\mathcal{F}} V_n(f)\sqrt{H} + \frac{qMH}{\sqrt{n}}\Big] \le c_3\Big[\sigma\sqrt{H} + \frac{qMH}{\sqrt{n}}\Big],$$
which together with (7.5) provides the result of Theorem 4.1.

7.2 An elementary chaining approach which preserves continuity

In this section, we provide a chaining approach which preserves continuity of the functions inside the empirical process. Typical chaining approaches work with indicator functions, which is not suitable for the application of Theorem 4.1. We replace the indicator functions by suitably chosen truncations. For $m > 0$, define $\varphi_m^{\wedge} : \mathbb{R} \to \mathbb{R}$ and the corresponding "peaky" residual $\varphi_m^{\vee} : \mathbb{R} \to \mathbb{R}$ via
$$\varphi_m^{\wedge}(x) := (x \vee (-m)) \wedge m, \qquad \varphi_m^{\vee}(x) := x - \varphi_m^{\wedge}(x).$$
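The two truncation maps decompose any value exactly into a bounded part and a peaky residual. A minimal sketch (our own illustration of the definitions above):

```python
def phi_cap(m, x):
    # Bounded part: phi^wedge_m(x) = (x v (-m)) ^ m, i.e. clip x to [-m, m].
    return min(max(x, -m), m)

def phi_peak(m, x):
    # "Peaky" residual: phi^vee_m(x) = x - phi^wedge_m(x), what clipping removed.
    return x - phi_cap(m, x)

vals = [-3.0, -0.4, 0.0, 1.2, 7.5]
parts = [(phi_cap(2.0, x), phi_peak(2.0, x)) for x in vals]
print(all(c + p == x for (c, p), x in zip(parts, vals)))  # exact decomposition
```

The residual vanishes on $[-m, m]$, which is why only "rare events" $|f| > m$ survive the truncation step of the chaining.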

In the following, assume that for each $j \in \mathbb{N}_0$ there exists a decomposition $\mathcal{F} = \bigcup_{k=1}^{N_j}\mathcal{F}_{jk}$, where $(\mathcal{F}_{jk})_{k=1,\dots,N_j}$, $j \in \mathbb{N}_0$, is a sequence of nested partitions. For each $j \in \mathbb{N}_0$ and $k \in \{1, \dots, N_j\}$, choose a fixed element $f_{jk} \in \mathcal{F}_{jk}$. For $j \in \mathbb{N}_0$, define $\pi_j f := f_{jk}$ if $f \in \mathcal{F}_{jk}$.

Assume furthermore that there exists a sequence $(\Delta_j f)_{j\in\mathbb{N}_0}$ such that for all $j \in \mathbb{N}_0$, $\sup_{f,g\in\mathcal{F}_{jk}} |f - g| \le \Delta_j f$. Finally, let $(m_j)_{j\in\mathbb{N}_0}$ be a decreasing sequence which will serve as a truncation sequence. For $j \in \mathbb{N}_0$, we use the decomposition

$$f - \pi_j f = \varphi_{m_j}^{\wedge}(f - \pi_j f) + \varphi_{m_j}^{\vee}(f - \pi_j f).$$
Since
$$\begin{aligned}
f - \pi_j f &= f - \pi_{j+1}f + \pi_{j+1}f - \pi_j f \\
&= \varphi_{m_{j+1}}^{\wedge}(f - \pi_{j+1}f) + \varphi_{m_{j+1}}^{\vee}(f - \pi_{j+1}f) + \varphi_{m_j - m_{j+1}}^{\wedge}(\pi_{j+1}f - \pi_j f) + \varphi_{m_j - m_{j+1}}^{\vee}(\pi_{j+1}f - \pi_j f), \qquad (7.6)
\end{aligned}$$
we can write

$$\varphi_{m_j}^{\wedge}(f - \pi_j f) = \varphi_{m_{j+1}}^{\wedge}(f - \pi_{j+1}f) + \varphi_{m_j - m_{j+1}}^{\wedge}(\pi_{j+1}f - \pi_j f) + R(j), \qquad (7.7)$$
where

$$R(j) := \varphi_{m_j}^{\wedge}(f - \pi_j f) - \varphi_{m_j}^{\wedge}\big(\varphi_{m_{j+1}}^{\wedge}(f - \pi_{j+1}f)\big) - \varphi_{m_j}^{\wedge}\big(\varphi_{m_j - m_{j+1}}^{\wedge}(\pi_{j+1}f - \pi_j f)\big).$$

To bound $R(j)$, we use (i) of the following elementary Lemma 7.1, which is proved in Section 7.6 of the Supplementary Material.

Lemma 7.1. Let $y, x, x_1, x_2, x_3$ and $m, m' > 0$ be real numbers. Then the following assertions hold:

(i) If $|x_1| + |x_2| \le m$, then
$$\big|\varphi_m^{\wedge}(x_1 + x_2 + x_3) - \varphi_m^{\wedge}(x_1) - \varphi_m^{\wedge}(x_2)\big| \le \min\{|x_3|,\ 2m\}.$$
(ii) $|\varphi_m^{\wedge}(x)| \le \min\{|x|, m\}$, and if $|x| \le |y|$, then
$$|\varphi_m^{\vee}(x)| \le |\varphi_m^{\vee}(y)| \le |y|\,\mathbb{1}_{\{|y| > m\}}.$$
(iii) If $\mathcal{F}$ fulfills Assumption 4.2, then Assumption 4.2 also holds for $\{\varphi_m^{\wedge}(f) : f \in \mathcal{F}\}$ and $\{\varphi_m^{\vee}(f) : f \in \mathcal{F}\}$.
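As a quick sanity check (our own sketch, not part of the paper's proofs), assertion (i) of Lemma 7.1 can be probed numerically on random tuples satisfying $|x_1| + |x_2| \le m$:

```python
import random

def phi_cap(m, x):
    # phi^wedge_m(x) = (x v (-m)) ^ m
    return min(max(x, -m), m)

random.seed(3)
worst = 0.0
for _ in range(20000):
    m = random.uniform(0.1, 5.0)
    x1 = random.uniform(-m, m)                       # enforce |x1| + |x2| <= m
    x2 = random.uniform(-(m - abs(x1)), m - abs(x1))
    x3 = random.uniform(-10.0, 10.0)
    lhs = abs(phi_cap(m, x1 + x2 + x3) - phi_cap(m, x1) - phi_cap(m, x2))
    rhs = min(abs(x3), 2.0 * m)
    worst = max(worst, lhs - rhs)
print(worst <= 1e-12)   # inequality (i) holds on all sampled tuples
```

Under the constraint, $\varphi_m^{\wedge}(x_1) = x_1$ and $\varphi_m^{\wedge}(x_2) = x_2$, so the bound reduces to the 1-Lipschitz property and boundedness of the clipping map.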

Because the partitions are nested, we have $|\pi_{j+1}f - \pi_j f| \le \Delta_j f$. By Lemma 7.1 and (7.6), we have

$$|R(j)| \le \min\big\{\big|\varphi_{m_{j+1}}^{\vee}(f - \pi_{j+1}f) + \varphi_{m_j - m_{j+1}}^{\vee}(\pi_{j+1}f - \pi_j f)\big|,\ 2m_j\big\} \le \min\big\{\varphi_{m_{j+1}}^{\vee}(\Delta_{j+1}f),\ 2m_j\big\} + \min\big\{\varphi_{m_j - m_{j+1}}^{\vee}(\Delta_j f),\ 2m_j\big\}. \qquad (7.8)$$
Let $\tau \in \mathbb{N}$. We then have, by iterated application of (7.7) and linearity of $f \mapsto W_i(f)$,

$$\begin{aligned}
\mathbb{G}_n(\varphi_{m_0}^{\wedge}(f - \pi_0 f)) &= \mathbb{G}_n(\varphi_{m_1}^{\wedge}(f - \pi_1 f)) + \mathbb{G}_n(\varphi_{m_0 - m_1}^{\wedge}(\pi_1 f - \pi_0 f)) + \mathbb{G}_n(R(0)) \\
&= \mathbb{G}_n(\varphi_{m_\tau}^{\wedge}(f - \pi_\tau f)) + \sum_{j=0}^{\tau-1}\mathbb{G}_n\big(\varphi_{m_j - m_{j+1}}^{\wedge}(\pi_{j+1}f - \pi_j f)\big) + \sum_{j=0}^{\tau-1}\mathbb{G}_n(R(j)), \qquad (7.9)
\end{aligned}$$
which in combination with (7.8) can now be used for chaining. The following lemma provides the necessary balancing between the truncated versions of $\mathbb{G}_n(f)$ and the rare events excluded. Recall that $H(k) = 1 \vee \log(k)$ as in (1.4).

Lemma 7.2 (Compatibility lemma). If $\mathcal{F}$ fulfills $|\mathcal{F}| \le k$ and Assumption 4.2, then $\sup_{f\in\mathcal{F}} V_n(f) \le \delta$ and $\sup_{f\in\mathcal{F}} \|f\|_\infty \le m(n, \delta, k)$ imply
$$\mathbb{E}\max_{f\in\mathcal{F}} |\mathbb{G}_n(f)| \le c\Big(1 + \frac{D_n^\infty}{D_n}\Big)\delta\sqrt{H(k)}, \qquad (7.10)$$
and $\sup_{f\in\mathcal{F}} V_n(f) \le \delta$ implies that for each $\gamma > 0$,

$$\sqrt{n}\,\big\|f\,\mathbb{1}_{\{f > \gamma\cdot m(n,\delta,k)\}}\big\|_{1,n} \le \frac{1}{\gamma}\,\frac{D_n}{D_n^\infty}\,\delta\sqrt{H(k)}. \qquad (7.11)$$

Proof of Lemma 7.2. For $q \in \mathbb{N}$, put $\beta_{\mathrm{norm}}(q) := \frac{\beta(q)}{q}$. By Theorem 4.1 and the definition of $r(\cdot)$,
$$\begin{aligned}
\mathbb{E}\max_{f\in\mathcal{F}} |\mathbb{G}_n(f)| &\le c\Big[\delta\sqrt{H(k)} + q^*\Big(\frac{m(n,\delta,k)\sqrt{H(k)}}{\sqrt{n}\,D_n^\infty}\Big)\frac{m(n,\delta,k)H(k)}{\sqrt{n}}\Big] \\
&= c\Big[\delta\sqrt{H(k)} + D_n^\infty\, q^*\big(r\big(\tfrac{\delta}{D_n}\big)\big)\, r\big(\tfrac{\delta}{D_n}\big)\sqrt{H(k)}\Big] \le c\Big(1 + \frac{D_n^\infty}{D_n}\Big)\delta\sqrt{H(k)},
\end{aligned}$$
which shows (7.10). Since

$$\Big\|f\big(Z_i, \tfrac{i}{n}\big)\,\mathbb{1}_{\{f(Z_i,\frac{i}{n}) > \gamma m(n,\delta,k)\}}\Big\|_1 \le \frac{1}{\gamma m(n,\delta,k)}\Big\|f\big(Z_i, \tfrac{i}{n}\big)^{2}\Big\|_1 = \frac{1}{\gamma m(n,\delta,k)}\Big\|f\big(Z_i, \tfrac{i}{n}\big)\Big\|_2^{2}$$
for all $f \in \mathcal{F}$ with $V_n(f) \le \delta$, it holds that
$$\sqrt{n}\,\big\|f\,\mathbb{1}_{\{f > \gamma m(n,\delta,k)\}}\big\|_{1,n} \le \frac{\sqrt{n}}{\gamma m(n,\delta,k)}\|f\|_{2,n}^{2} \le \frac{1}{\gamma}\,\frac{\|f\|_{2,n}^{2}}{D_n^\infty\, r\big(\frac{\delta}{D_n}\big)}\sqrt{H(k)}. \qquad (7.12)$$
If $\|f\|_{2,n} \ge D_n\Delta(1)$, we have

If f D ∆(1), we have k k2,n ≥ n

∞ V (f)= f + D ∆(j) f + D β(1). (7.13) n k k2,n n ≥ k k2,n n Xj=1 In the case f < D ∆(1), the fact that ∆( ) is decreasing implies that a = max j N : k k2,n n · ∗ { ∈ f < D ∆(j) is well-defined. We conclude that k k2,n n } a ∞ ∗ ∞ V (f) = f + f (D ∆(j)) = f + f + D ∆(j) n k k2,n k k2,n ∧ n k k2,n k k2,n n Xj=0 Xj=1 j=Xa∗+1 = f (a∗ +1)+ D β(a∗) f a∗ + β(a∗). (7.14) k k2,n n ≥ k k2,n Summarizing the results (7.13) and (7.14), we have

V (f) f (a∗ 1) + D β(a∗ 1). n ≥ k k2,n ∨ n ∨ We conclude that Vn(f) min f 2,na + Dnβ(a) f 2,naˆ + Dnβ(ˆa), ≥ a N k k ≥ k k ∈   24 wherea ˆ = argminj N f 2,n j + Dnβ(j) . ∈ k k ·  Since δ V (f), we have δ D β(ˆa) = D β (ˆa)ˆa. Thus β (ˆa) δ . By definition of n n n norm norm Dnaˆ δ≥ δ≥ δ δ δ ≤ δ q∗, q∗( D ) aˆ. Thus q∗( D ) D D . By definition of r( ), r( D ) D . We conclude with naˆ ≤ naˆ naˆ ≤ n · n ≥ naˆ f V (f) δ that k k2,n ≤ n ≤ 2 D 2 f 2,n naˆ f 2,n DnVn(f) f 2,n Dn Dn k k k k k k f 2,n δ. (7.15) D r( δ ) ≤ D δ ≤ D δ ≤ D k k ≤ D n∞ Dn n∞ n∞ n∞ n∞

Inserting the result into (7.12), we finally obtain that for all f with V (f) δ it holds that ∈ F n ≤ 2 D √n 2 1 f 2,n 1 n √n f ½ f k k H(k) δ H(k). f>γm(n,δ,k) 1,n 2,n D δ D k { k ≤ γm(n, δ, k) k k ≤ γ n∞r( D ) ≤ γ n∞ n p p which shows (7.11).

7.3 Proof idea: A maximal inequality for infinite , Theorem 4.4 F

The details of the proof are given in Section 7.5 in the Supplementary material. In the following, we abbreviate H(δ)= H(δ, ,V ) and N(δ)= N(δ, ,V ). Choose δ = σ and δ = 2 jδ . F n F n 0 j − 0 For each j N , we choose a covering by brackets := [l , u ] , k = 1, ..., N := N(δ ) such ∈ 0 Fjk jk jk ∩ F j j that Vn(ujk ljk) δj and supf,g f g ujk ljk =: ∆jk. We may assume w.l.o.g. that − ≤ ∈Fjk | − | ≤ − l , u , ∆ and that ( ) are nested. jk jk jk ∈ F Fjk k In each , fix some f , and define π f := f and ∆ f := ∆ . Put Fjk jk ∈ F j jk j jk σ I(σ) I(σ) := H(ε, ,Vn)dε, τ := min j 0 : δj 1. 0 F ≥ ≤ √n ∨ Z p n o 1 The chaining procedure is now applied with mj := 2 m(n, δj,Nj+1) (m( ) from (4.6)). Choose 1 · Mn = 2 m0. We then have

n 1

E G E G E ½ sup n(f) sup n(f) + Wi(F F>Mn ) , (7.16) ≤ √n { } f f (Mn) i=1 ∈F ∈F X   where (M ) := ϕ (f) : f . F n { M∧ n ∈ F} By (7.8), (7.9), G (f) G (g)+ 2 n W (g) G (g) + 2√n g for f g, we obtain | n |≤ n √n i=1 k i k1 ≤ n k k1,n | |≤ P

25 the decomposition

$$\begin{aligned}
\sup_{f\in\mathcal{F}(M_n)} |\mathbb{G}_n(f)| \le{}& \sup_{f\in\mathcal{F}} |\mathbb{G}_n(\pi_0 f)| \\
&+ \Big\{\sup_{f\in\mathcal{F}} \big|\mathbb{G}_n\big(\varphi_{m_\tau}^{\wedge}(\Delta_\tau f)\big)\big| + 2\sqrt{n}\,\sup_{f\in\mathcal{F}}\|\Delta_\tau f\|_{1,n}\Big\} \\
&+ \sum_{j=0}^{\tau-1}\sup_{f\in\mathcal{F}} \big|\mathbb{G}_n\big(\varphi_{m_j - m_{j+1}}^{\wedge}(\pi_{j+1}f - \pi_j f)\big)\big| \\
&+ \sum_{j=0}^{\tau-1}\Big\{\sup_{f\in\mathcal{F}} \big|\mathbb{G}_n\big(\min\{\varphi_{m_{j+1}}^{\vee}(\Delta_{j+1}f),\ 2m_j\}\big)\big| + 2\sqrt{n}\,\sup_{f\in\mathcal{F}}\big\|\Delta_{j+1}f\,\mathbb{1}_{\{\Delta_{j+1}f > m_{j+1}\}}\big\|_{1,n}\Big\} \\
&+ \sum_{j=0}^{\tau-1}\Big\{\sup_{f\in\mathcal{F}} \big|\mathbb{G}_n\big(\min\{\varphi_{m_j - m_{j+1}}^{\vee}(\Delta_j f),\ 2m_j\}\big)\big| + 2\sqrt{n}\,\sup_{f\in\mathcal{F}}\big\|\Delta_j f\,\mathbb{1}_{\{\Delta_j f > m_j - m_{j+1}\}}\big\|_{1,n}\Big\} \\
={}&: R_1 + R_2 + R_3 + R_4 + R_5. \qquad (7.17)
\end{aligned}$$

The terms $R_i$, $i = 1, \dots, 5$, can now be discussed separately with Lemma 7.2 and give the upper bounds (with constants $C > 0$):
$$\mathbb{E}R_1 \le C\delta_0\sqrt{H(\delta_1)}, \qquad \mathbb{E}R_2 \le 2I(\sigma) + C\delta_\tau\sqrt{H(\delta_{\tau+1})}, \qquad \mathbb{E}R_3,\ \mathbb{E}R_4,\ \mathbb{E}R_5 \le C\sum_{j=0}^{\tau}\delta_j\sqrt{H(\delta_{j+1})},$$
and thus $\mathbb{E}\sup_{f\in\mathcal{F}(M_n)} |\mathbb{G}_n(f)| \le C' I(\sigma)$. Together with (7.16), the result follows.

References

[1] Radosław Adamczak. A tail inequality for suprema of unbounded empirical processes with applications to Markov chains. Electron. J. Probab., 13:no. 34, 1000–1034, 2008.

[2] Donald W. K. Andrews and David Pollard. An introduction to functional central limit theorems for dependent stochastic processes. International Statistical Review / Revue Internationale de Statistique, 62(1):119–132, 1994.

[3] M. A. Arcones and B. Yu. Central limit theorems for empirical and U-processes of stationary mixing sequences. J. Theoret. Probab., 7(1):47–71, 1994.

[4] István Berkes, Siegfried Hörmann, and Johannes Schauer. Asymptotic results for the empirical process of stationary sequences. Stochastic Process. Appl., 119(4):1298–1324, 2009.

[5] Vivek S. Borkar. White-noise representations in stochastic realization theory. SIAM J. Control Optim., 31(5):1093–1102, 1993.

[6] Svetlana Borovkova, Robert Burton, and Herold Dehling. Limit theorems for functionals of mixing processes with applications to U-statistics and dimension estimation. Trans. Amer. Math. Soc., 353(11):4261–4318, 2001.

[7] Rainer Dahlhaus and Wolfgang Polonik. Empirical spectral processes for locally stationary time series. Bernoulli, 15(1):1–39, 2009.

[8] Rainer Dahlhaus, Stefan Richter, and Wei Biao Wu. Towards a general theory for nonlinear locally stationary processes. Bernoulli, 25(2):1013–1044, 2019.

[9] J. Dedecker. An empirical central limit theorem for intermittent maps. Probab. Theory Related Fields, 148(1-2):177–195, 2010.

[10] Jérôme Dedecker and Sana Louhichi. Maximal inequalities and empirical central limit theorems. In Empirical process techniques for dependent data, pages 137–159. Birkhäuser Boston, Boston, MA, 2002.

[11] J´erˆome Dedecker and Cl´ementine Prieur. An empirical central limit theorem for dependent sequences. Stochastic Process. Appl., 117(1):121–142, 2007.

[12] Herold Dehling, Olivier Durieu, and Dalibor Volny. New techniques for empirical processes of dependent data. Stochastic Process. Appl., 119(10):3699–3718, 2009.

[13] Monroe D. Donsker. Justification and extension of Doob’s heuristic approach to the Komogorov-Smirnov theorems. Ann. Math. Statistics, 23:277–281, 1952.

[14] P. Doukhan, P. Massart, and E. Rio. Invariance principles for absolutely regular empirical processes. Ann. Inst. H. Poincaré Probab. Statist., 31(2):393–427, 1995.

[15] Paul Doukhan. Mixing, volume 85 of Lecture Notes in Statistics. Springer-Verlag, New York, 1994. Properties and examples.

[16] Paul Doukhan and Michael H. Neumann. Probability and moment inequalities for sums of weakly dependent random variables, with applications. Stochastic Processes and their Applications, 117(7):878–903, 2007.

[17] R. M. Dudley. Weak convergences of probabilities on nonseparable metric spaces and empirical measures on Euclidean spaces. Illinois J. Math., 10:109–126, 1966.

[18] R. M. Dudley. Central limit theorems for empirical measures. Ann. Probab., 6(6):899–929 (1979), 1978.

[19] R. M. Dudley. Uniform central limit theorems, volume 142 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, New York, second edition, 2014.

[20] Olivier Durieu and Marco Tusche. An empirical process central limit theorem for multidimen- sional dependent data. J. Theoret. Probab., 27(1):249–277, 2014.

[21] Richard S. Ellis and Aaron D. Wyner. Uniform large deviation property of the empirical process of a Markov chain. Ann. Probab., 17(3):1147–1151, 1989.

[22] Christian Francq and Jean-Michel Zakoïan. Mixing properties of a general class of GARCH(1,1) models without moment assumptions on the observed process. Econometric Theory, 22(5):815–834, 2006.

[23] Evarist Giné and Richard Nickl. Mathematical foundations of infinite-dimensional statistical models. Cambridge Series in Statistical and Probabilistic Mathematics, [40]. Cambridge University Press, New York, 2016.

[24] Bruce E. Hansen. Uniform convergence rates for kernel estimation with dependent data. Econometric Theory, 24(3):726–748, 2008.

[25] Lothar Heinrich. Bounds for the absolute regularity coefficient of a stationary renewal process. Yokohama Math. J., 40(1):25–33, 1992.

[26] Hans Arnfinn Karlsen and Dag Tjøstheim. Nonparametric estimation in null recurrent time series. Ann. Statist., 29(2):372–416, 2001.

[27] Rafał Kulik, Philippe Soulier, and Olivier Wintenberger. The tail empirical process of regularly varying functions of geometrically ergodic Markov chains. Stochastic Process. Appl., 129(11):4209–4238, 2019.

[28] Shlomo Levental. Uniform limit theorems for Harris recurrent Markov chains. Probab. Theory Related Fields, 80(1):101–118, 1988.

[29] Degui Li, Dag Tjøstheim, and Jiti Gao. Estimation in nonlinear regression with Harris recurrent Markov chains. Ann. Statist., 44(5):1957–1987, 2016.

[30] Eckhard Liebscher. Strong convergence of sums of α-mixing random variables with applications to density estimation. Stochastic Processes and their Applications, 65(1):69–80, 1996.

[31] Ulrike Mayer, Henryk Zähle, and Zhou Zhou. Functional weak limit theorem for a local empirical process of non-stationary time series and its application. Bernoulli, 26(3):1891–1911, 2020.

[32] Sean Meyn and Richard L. Tweedie. Markov chains and stochastic stability. Cambridge University Press, Cambridge, second edition, 2009. With a prologue by Peter W. Glynn.

[33] Abdelkader Mokkadem. Mixing properties of ARMA processes. Stochastic Process. Appl., 29(2):309–315, 1988.

[34] Mina Ossiander. A central limit theorem under metric entropy with L2 bracketing. Ann. Probab., 15(3):897–919, 1987.

[35] Tuan D. Pham and Lanh T. Tran. Some mixing properties of time series models. Stochastic Process. Appl., 19(2):297–303, 1985.

[36] Iosif Pinelis. Optimum bounds for the distributions of martingales in Banach spaces. Ann. Probab., 22(4):1679–1706, 1994.

[37] David Pollard. A central limit theorem for empirical processes. J. Austral. Math. Soc. Ser. A, 33(2):235–248, 1982.

[38] Emmanuel Rio. The functional law of the iterated logarithm for stationary strongly mixing sequences. Ann. Probab., 23(3):1188–1203, 1995.

[39] Emmanuel Rio. Processus empiriques absolument r´eguliers et entropie universelle. Probab. Theory Related Fields, 111(4):585–608, 1998.

[40] Emmanuel Rio. Asymptotic theory of weakly dependent random processes, volume 80 of Probability Theory and Stochastic Modelling. Springer, Berlin, 2017. Translated from the 2000 French edition [MR2117923].

[41] Jorge D. Samur. A regularity condition and a limit theorem for Harris ergodic Markov chains. Stochastic Process. Appl., 111(2):207–235, 2004.

[42] Lionel Truquet. A perturbation analysis of Markov chains models with time-varying parameters. Bernoulli, 26(4):2876–2906, 2020.

[43] A. W. van der Vaart. Asymptotic statistics, volume 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 1998.

[44] Aad W. van der Vaart and Jon A. Wellner. Weak convergence and empirical processes. Springer Series in Statistics. Springer-Verlag, New York, 1996. With applications to statistics.

[45] Michael Vogt. Nonparametric regression for locally stationary time series. Ann. Statist., 40(5):2601–2633, 2012.

[46] Wei Biao Wu. Nonlinear system theory: another look at dependence. Proc. Natl. Acad. Sci. USA, 102(40):14150–14154, 2005.

[47] Wei Biao Wu. Empirical processes of stationary sequences. Statistica Sinica, 18(1):313–333, 2008.

[48] Wei Biao Wu. Asymptotic theory for stationary processes. Stat. Interface, 4(2):207–226, 2011.

[49] Wei Biao Wu, Weidong Liu, and Han Xiao. Probability and moment inequalities under dependence. Statist. Sinica, 23(3):1257–1272, 2013.

[50] Bin Yu. Rates of convergence for empirical processes of stationary mixing sequences. Ann. Probab., 22(1):94–116, 1994.

[51] Danna Zhang and Wei Biao Wu. Gaussian approximation for high dimensional time series. Ann. Statist., 45(5):1895–1919, 2017.

Supplementary Material

This material contains some details of the proofs in the paper as well as the proofs of the examples.

7.4 Proofs of Section 4.1

Lemma 7.3. Let Assumption 4.2 hold for some $\nu \ge 2$. Then,
\[
\delta_\nu^{f(Z,u)}(k) \le |D_{f,n}(u)| \cdot \Delta(k), \qquad
\sup_i \Big\| \sup_{f\in\mathcal F} \big| f(Z_i,u) - f(Z_i^{*(i-k)},u) \big| \Big\|_\nu \le D_n^\infty(u) \cdot \Delta(k),
\]
\[
\sup_i \| f(Z_i,u) \|_\nu \le |D_{f,n}(u)| \cdot C_\Delta,
\]
where $C_\Delta := 4d\,|L_{\mathcal F}|_1 \cdot C_X^s \cdot C_R + C_{\bar f}$.

Proof of Lemma 7.3. We have for each $f \in \mathcal F$ and $\nu \ge 2$ that
\begin{align*}
\sup_i \big\| \bar f(Z_i,u) - \bar f(Z_i^{*(i-k)},u) \big\|_\nu
&\le \sup_i \Big\| |Z_i - Z_i^{*(i-k)}|_{L_{\mathcal F},s}^s \big( R(Z_i,u) + R(Z_i^{*(i-k)},u) \big) \Big\|_\nu \\
&\le \sup_i \Big\| |Z_i - Z_i^{*(i-k)}|_{L_{\mathcal F},s}^s \Big\|_{p\nu} \Big\| R(Z_i,u) + R(Z_i^{*(i-k)},u) \Big\|_{\frac{p}{p-1}\nu} \\
&\le \sup_i \sum_{j=0}^\infty L_{\mathcal F,j} \Big\| |X_{i-j} - X_{i-j}^{*(i-k)}|_\infty^s \Big\|_{p\nu} \Big( \| R(Z_i,u) \|_{\frac{p}{p-1}\nu} + \| R(Z_i^{*(i-k)},u) \|_{\frac{p}{p-1}\nu} \Big) \\
&\le 2dC_R \sum_{j=0}^\infty L_{\mathcal F,j} \big( \delta^X_{p\nu s}(k-j) \big)^s.
\end{align*}
This shows the first assertion. Due to
\[
\sup_{f\in\mathcal F} \big| \bar f(Z_i,u) - \bar f(Z_i^{*(i-k)},u) \big| \le |Z_i - Z_i^{*(i-k)}|_{L_{\mathcal F},s}^s \big( R(Z_i,u) + R(Z_i^{*(i-k)},u) \big),
\]
the second assertion follows similarly. The last assertion follows from
\[
|\bar f(z,u)| \le |\bar f(z,u) - \bar f(0,u)| + |\bar f(0,u)| \le |z|_{L_{\mathcal F},s}^s \big( R(z,u) + R(0,u) \big) + |\bar f(0,u)|,
\]
which implies
\begin{align*}
\| \bar f(Z_i,u) \|_\nu
&\le \sum_{j=0}^\infty L_{\mathcal F,j} \big\| |X_{i-j}|_\infty^s \big\|_{p\nu} \big( \| R(Z_i,u) \|_{\frac{p}{p-1}\nu} + |R(0,u)| \big) + |\bar f(0,u)| \\
&\le 2d\,|L_{\mathcal F}|_1 \cdot C_X^s \cdot \big( C_R + |R(0,u)| \big) + |\bar f(0,u)|
\le 4d\,|L_{\mathcal F}|_1 \cdot C_X^s \cdot C_R + C_{\bar f}.
\end{align*}

Proof of Theorem 4.1. Denote the three terms on the right-hand side of (7.1) by $A_1, A_2, A_3$. We now discuss the three terms separately. First, we have
\[
\mathbb E A_1 \le \sum_{j=q}^\infty \mathbb E \max_{f\in\mathcal F} \Big| \frac{1}{\sqrt n} \sum_{i=1}^n \big( W_{i,j+1}(f) - W_{i,j}(f) \big) \Big|.
\]

For fixed $j$, the sequence
\[
E_{i,j} := (E_{i,j}(f))_{f\in\mathcal F} = \big( W_{i,j+1}(f) - W_{i,j}(f) \big)_{f\in\mathcal F}
= \big( \mathbb E[W_i(f) \mid \varepsilon_{i-j},\dots,\varepsilon_i] - \mathbb E[W_i(f) \mid \varepsilon_{i-j+1},\dots,\varepsilon_i] \big)_{f\in\mathcal F}
\]
is an $|\mathcal F|$-dimensional martingale difference vector with respect to $\mathcal G^i = \sigma(\varepsilon_{i-j}, \varepsilon_{i-j+1}, \dots)$. For a vector $x = (x_f)_{f\in\mathcal F}$ and $s \ge 1$, write $|x|_s := \big( \sum_{f\in\mathcal F} |x_f|^s \big)^{1/s}$. By Theorem 4.1 in [36] there exists an absolute constant $c_1 > 0$ such that for $s > 1$,
\[
\Big\| \Big| \sum_{i=1}^n E_{i,j} \Big|_s \Big\|_2 \le c_1 \Big\{ \sqrt 2 \sup_{i=1,\dots,n} \big\| |E_{i,j}|_s \big\|_2 + \sqrt{2(s-1)}\, \Big\| \Big( \sum_{i=1}^n \mathbb E\big[ |E_{i,j}|_s^2 \mid \mathcal G^{i-1} \big] \Big)^{1/2} \Big\|_2 \Big\}. \tag{7.18}
\]
We have
\[
\sup_{i=1,\dots,n} \big\| |E_{i,j}|_s \big\|_2 = \Big( \sup_{i=1,\dots,n} \big\| |E_{i,j}|_s \big\|_2^2 \Big)^{1/2} \le \Big( \sum_{i=1}^n \big\| |E_{i,j}|_s \big\|_2^2 \Big)^{1/2},
\]
therefore both terms in (7.18) are of the same order and it is enough to bound the second term in (7.18). We have
\[
\Big\| \Big( \sum_{i=1}^n \mathbb E\big[ |E_{i,j}|_s^2 \mid \mathcal G^{i-1} \big] \Big)^{1/2} \Big\|_2
= \Big\| \sum_{i=1}^n \mathbb E\big[ |E_{i,j}|_s^2 \mid \mathcal G^{i-1} \big] \Big\|_1^{1/2}
\le \Big( \sum_{i=1}^n \mathbb E\big[ |E_{i,j}|_s^2 \big] \Big)^{1/2}
= \Big( \sum_{i=1}^n \big\| |E_{i,j}|_s \big\|_2^2 \Big)^{1/2}. \tag{7.19}
\]

Note that
\[
E_{i,j}(f) = W_{i,j+1}(f) - W_{i,j}(f) = \mathbb E[W_i(f) \mid \varepsilon_{i-j},\dots,\varepsilon_i] - \mathbb E[W_i(f) \mid \varepsilon_{i-j+1},\dots,\varepsilon_i]
= \mathbb E\big[ W_i(f)^{**(i-j)} - W_i(f)^{**(i-j+1)} \mid \mathcal G_i \big], \tag{7.20}
\]
where $H(\mathcal F_i)^{**(i-j)} := H(\mathcal F_i^{**(i-j)})$ and $\mathcal F_i^{**(i-j)} = (\varepsilon_i, \varepsilon_{i-1}, \dots, \varepsilon_{i-j}, \varepsilon^*_{i-j-1}, \varepsilon^*_{i-j-2}, \dots)$.

By Jensen's inequality, Lemma 7.3 and the fact that $\big( W_i(f)^{**(i-j)}, W_i(f)^{**(i-j+1)} \big)$ has the same distribution as $\big( W_i(f), W_i(f)^{*(i-j)} \big)$,
\begin{align*}
\big\| |E_{i,j}|_s \big\|_2 &= \Big\| \Big( \sum_{f\in\mathcal F} |E_{i,j}(f)|^s \Big)^{1/s} \Big\|_2
\le |\mathcal F|^{1/s}\, \Big\| \sup_{f\in\mathcal F} \big| \mathbb E\big[ W_i(f)^{**(i-j)} - W_i(f)^{**(i-j+1)} \mid \mathcal G_i \big] \big| \Big\|_2 \\
&\le e \cdot \Big\| \mathbb E\Big[ \sup_{f\in\mathcal F} \big| W_i(f)^{**(i-j)} - W_i(f)^{**(i-j+1)} \big| \,\Big|\, \mathcal G_i \Big] \Big\|_2
\le e \cdot \Big\| \sup_{f\in\mathcal F} \big| W_i(f)^{**(i-j)} - W_i(f)^{**(i-j+1)} \big| \Big\|_2 \\
&= e \cdot \Big\| \sup_{f\in\mathcal F} \big| W_i(f) - W_i(f)^{*(i-j)} \big| \Big\|_2
\le e \cdot D_n^\infty(\tfrac in)\, \Delta(j). \tag{7.21}
\end{align*}

Inserting (7.21) into (7.19) delivers
\[
\Big( \sum_{i=1}^n \big\| |E_{i,j}|_s \big\|_2^2 \Big)^{1/2} \le e\, \Big( \sum_{i=1}^n D_n^\infty(\tfrac in)^2 \Big)^{1/2} \Delta(j).
\]

Inserting this bound into (7.18), we obtain
\[
\Big\| \Big| \sum_{i=1}^n E_{i,j} \Big|_s \Big\|_2 \le 4 e c_1 s^{1/2}\, \Big( \frac 1n \sum_{i=1}^n D_n^\infty(\tfrac in)^2 \Big)^{1/2} n^{1/2}\, \Delta(j).
\]
We conclude with $s := 2 \vee \log|\mathcal F|$ that
\[
\mathbb E A_1 \le \sum_{j=q}^\infty \frac{1}{\sqrt n} \Big\| \Big| \sum_{i=1}^n E_{i,j} \Big|_s \Big\|_2
\le 4 e c_1 \cdot \sqrt{2 \vee \log|\mathcal F|} \cdot \Big( \frac 1n \sum_{i=1}^n D_n^\infty(\tfrac in)^2 \Big)^{1/2} \sum_{j=q}^\infty \Delta(j)
\le 8 e c_1 \cdot \sqrt H \cdot \mathbb D_n^\infty\, \beta(q). \tag{7.22}
\]

We now discuss $\mathbb E A_2$. If $M_Q, \sigma_Q > 0$ are constants and $Q_i(f)$, $i = 1,\dots,m$, mean-zero independent variables (depending on $f \in \mathcal F$) with $|Q_i(f)| \le M_Q$ and $\big( \frac 1m \sum_{i=1}^m \| Q_i(f) \|_2^2 \big)^{1/2} \le \sigma_Q$, then there exists some universal constant $c_2 > 0$ such that
\[
\mathbb E \max_{f\in\mathcal F} \Big| \frac{1}{\sqrt m} \sum_{i=1}^m \big( Q_i(f) - \mathbb E Q_i(f) \big) \Big| \le c_2 \cdot \Big( \sigma_Q \sqrt H + \frac{M_Q H}{\sqrt m} \Big) \tag{7.23}
\]
(see e.g. [10], equation (4.3) in Section 4.1 therein).

Note that $(W_{k,j} - W_{k,j-1})_j$ is a martingale difference sequence and $W_{k,\tau_l} - W_{k,\tau_{l-1}} = \sum_{j=\tau_{l-1}+1}^{\tau_l} (W_{k,j} - W_{k,j-1})$. Furthermore, we have
\[
\| W_{k,j} - W_{k,j-1} \|_2 \le \big\| W_k - \mathbb E[W_k \mid \varepsilon_{k-j+1},\dots,\varepsilon_k] \big\|_2 \le \| W_k \|_2
\]
and
\[
\| W_{k,j} - W_{k,j-1} \|_2 = \big\| \mathbb E\big[ W_k^{**(k-j+1)} - W_k^{**(k-j+2)} \mid \mathcal G_k \big] \big\|_2
\le \big\| W_k^{**(k-j+1)} - W_k^{**(k-j+2)} \big\|_2
= \big\| W_k - W_k^{*(k-j+1)} \big\|_2 = \delta_2^{W_k}(j-1),
\]
thus
\[
\| W_{k,j} - W_{k,j-1} \|_2 \le \min\big\{ \| W_k \|_2,\ \delta_2^{W_k}(j-1) \big\}.
\]
We conclude with the elementary inequality $\min\{a_1,b_1\} + \min\{a_2,b_2\} \le \min\{a_1+a_2,\, b_1+b_2\}$ that
\begin{align*}
\| T_{i,l} \|_2 &= \Big\| \sum_{k=(i-1)\tau_l+1}^{(i\tau_l)\wedge n} \big( W_{k,\tau_l} - W_{k,\tau_{l-1}} \big) \Big\|_2
= \Big\| \sum_{j=\tau_{l-1}+1}^{\tau_l} \sum_{k=(i-1)\tau_l+1}^{(i\tau_l)\wedge n} \big( W_{k,j} - W_{k,j-1} \big) \Big\|_2 \\
&\le \sum_{j=\tau_{l-1}+1}^{\tau_l} \Big\| \sum_{k=(i-1)\tau_l+1}^{(i\tau_l)\wedge n} \big( W_{k,j} - W_{k,j-1} \big) \Big\|_2
\le \sum_{j=\tau_{l-1}+1}^{\tau_l} \Big( \sum_{k=(i-1)\tau_l+1}^{(i\tau_l)\wedge n} \| W_{k,j} - W_{k,j-1} \|_2^2 \Big)^{1/2} \\
&\le \sum_{j=\tau_{l-1}+1}^{\tau_l} \min\Big\{ \Big( \sum_{k=(i-1)\tau_l+1}^{(i\tau_l)\wedge n} \| W_k \|_2^2 \Big)^{1/2},\ \Big( \sum_{k=(i-1)\tau_l+1}^{(i\tau_l)\wedge n} \delta_2^{W_k}(j-1)^2 \Big)^{1/2} \Big\}.
\end{align*}
Put
\[
\sigma_{i,l} := \Big( \frac{1}{\tau_l} \sum_{k=(i-1)\tau_l+1}^{(i\tau_l)\wedge n} \| W_k \|_2^2 \Big)^{1/2}, \qquad
\Delta_{i,j,l} := \Big( \frac{1}{\tau_l} \sum_{k=(i-1)\tau_l+1}^{(i\tau_l)\wedge n} \delta_2^{W_k}(j-1)^2 \Big)^{1/2}.
\]

Then
\begin{align*}
\frac{1}{\tau_l} \Big( \frac{\tau_l}{n} \sum_{\substack{i=1 \\ i\ \mathrm{even}}}^{\lfloor n/\tau_l \rfloor + 1} \| T_{i,l}(f) \|_2^2 \Big)^{1/2}
&\le \sum_{j=\tau_{l-1}+1}^{\tau_l} \Big( \frac{1}{n\tau_l} \sum_{\substack{i=1 \\ i\ \mathrm{even}}}^{\lfloor n/\tau_l \rfloor + 1} \min\big\{ \tau_l\, \sigma_{i,l}^2,\ \tau_l\, \Delta_{i,j,l}^2 \big\} \Big)^{1/2} \\
&\le \sum_{j=\tau_{l-1}+1}^{\tau_l} \min\Big\{ \Big( \frac{\tau_l}{n} \sum_{\substack{i\ \mathrm{even}}} \sigma_{i,l}^2 \Big)^{1/2},\ \Big( \frac{\tau_l}{n} \sum_{\substack{i\ \mathrm{even}}} \Delta_{i,j,l}^2 \Big)^{1/2} \Big\} \\
&\le \sum_{j=\tau_{l-1}+1}^{\tau_l} \min\Big\{ \| f \|_{2,n},\ \Big( \frac 1n \sum_{i=1}^n \delta_2^{W_i}(j-1)^2 \Big)^{1/2} \Big\}
\le \sum_{j=\tau_{l-1}+1}^{\tau_l} \min\big\{ \| f \|_{2,n},\ \mathbb D_n\, \Delta(\lfloor \tfrac j2 \rfloor) \big\}. \tag{7.24}
\end{align*}

With $\frac{1}{\sqrt{\tau_l}} | T_{i,l}(f) | \le 2\sqrt{\tau_l}\, \| f \|_\infty \le 2\sqrt{\tau_l}\, M$ and (7.23), we obtain
\[
\sum_{l=1}^L \frac 1n\, \mathbb E \max_{f\in\mathcal F} \Big| \sum_{\substack{1 \le i \le \lfloor n/\tau_l \rfloor + 1 \\ i\ \mathrm{even}}} \frac{1}{\sqrt{\tau_l}} T_{i,l}(f) \Big|
\le \sum_{l=1}^L c_2\, \sqrt{\frac{\tau_l}{n}}\, \Big[ \sup_{f\in\mathcal F} \Big( \frac{\tau_l}{n} \sum_{\substack{i\ \mathrm{even}}} \Big\| \frac{T_{i,l}(f)}{\sqrt{\tau_l}} \Big\|_2^2 \Big)^{1/2} \sqrt H + \frac{\sqrt{\tau_l}\, M H}{\sqrt{\lfloor n/\tau_l \rfloor + 1}} \Big],
\]
and a similar assertion for the second term ($i$ odd) in $A_2$. With (7.24), we conclude that
\[
\mathbb E A_2 \le 4 c_2 \sum_{l=1}^L \Big[ \sum_{j=\tau_{l-1}+1}^{\tau_l} \min\big\{ \max_{f\in\mathcal F} \| f \|_{2,n},\ \mathbb D_n \Delta(\lfloor \tfrac j2 \rfloor) \big\} \cdot \sqrt H + \frac{\sqrt{\tau_l}\, M H}{\sqrt{\lfloor n/\tau_l \rfloor + 1}} \Big]. \tag{7.25}
\]
Note that
\[
\sum_{l=1}^L \frac{\sqrt{\tau_l}}{\sqrt{\lfloor n/\tau_l \rfloor + 1}} \le \sum_{l=1}^L \frac{\tau_l}{\sqrt n} = \frac{1}{\sqrt n} \sum_{l=0}^{L-1} 2^l \le \frac{2^L + q}{\sqrt n} \le \frac{2q}{\sqrt n}. \tag{7.26}
\]
Furthermore, we have by Lemma 7.4 that
\[
\sum_{l=1}^L \sum_{j=\tau_{l-1}+1}^{\tau_l} \min\big\{ \max_{f\in\mathcal F} \| f \|_{2,n},\ \mathbb D_n \Delta(\lfloor \tfrac j2 \rfloor) \big\}
\le \sum_{j=2}^\infty \min\big\{ \max_{f\in\mathcal F} \| f \|_{2,n},\ \mathbb D_n \Delta(\lfloor \tfrac j2 \rfloor) \big\}
\le 2\, \bar V_n\big( \max_{f\in\mathcal F} \| f \|_{2,n} \big) = 2 \max_{f\in\mathcal F} \bar V_n( \| f \|_{2,n} ) = 2 \max_{f\in\mathcal F} V_n(f), \tag{7.27}
\]
where
\[
\bar V_n(x) = x + \sum_{j=1}^\infty \min\{ x,\ \mathbb D_n \Delta(j) \} \tag{7.28}
\]
and the second-to-last equality holds since $x \mapsto \bar V_n(x)$ is increasing.

Inserting (7.26) and (7.27) into (7.25), we conclude that with some universal $c_3 > 0$,
\[
\mathbb E A_2 \le c_3 \Big( \sup_{f\in\mathcal F} V_n(f) \sqrt H + \frac{q M H}{\sqrt n} \Big) \le c_3 \Big( \sigma \sqrt H + \frac{q M H}{\sqrt n} \Big). \tag{7.29}
\]

Since $S_{n,1}^W(f) = \sum_{i=1}^n W_{i,1}(f)$ is a sum of independent variables with $|W_{i,1}(f)| \le \| f \|_\infty \le M$ and $\| W_{i,1}(f) \|_2 \le 2 V_n(f) \le 2\sigma$, we obtain from (7.23) again
\[
\mathbb E A_3 \le c_2 \Big( \sigma \sqrt H + \frac{M H}{\sqrt n} \Big). \tag{7.30}
\]
If we insert the bounds (7.22), (7.29) and (7.30) into (7.1), we obtain the result (4.2).

We now show (4.3). If $q^*\big( \frac{M\sqrt H}{\sqrt n\, \mathbb D_n^\infty} \big) \frac Hn \le 1$, we have $q^*\big( \frac{M\sqrt H}{\sqrt n\, \mathbb D_n^\infty} \big) \in \{1,\dots,n\}$ and thus by (4.2):
\begin{align*}
\mathbb E \max_{f\in\mathcal F} \frac{1}{\sqrt n} | S_n(f) |
&\le c \Big[ \sqrt H\, \mathbb D_n^\infty\, \beta\Big( q^*\Big( \frac{M\sqrt H}{\sqrt n\, \mathbb D_n^\infty} \Big) \Big) + q^*\Big( \frac{M\sqrt H}{\sqrt n\, \mathbb D_n^\infty} \Big) \frac{M H}{\sqrt n} + \sigma \sqrt H \Big] \\
&\le 2c \Big[ q^*\Big( \frac{M\sqrt H}{\sqrt n\, \mathbb D_n^\infty} \Big) \frac{M H}{\sqrt n} + \sigma \sqrt H \Big]
= 2c \Big[ \sqrt n M \cdot \min\Big\{ q^*\Big( \frac{M\sqrt H}{\sqrt n\, \mathbb D_n^\infty} \Big) \frac Hn,\ 1 \Big\} + \sigma \sqrt H \Big]. \tag{7.31}
\end{align*}
If $q^*\big( \frac{M\sqrt H}{\sqrt n\, \mathbb D_n^\infty} \big) \frac Hn \ge 1$, we note that the simple bound
\[
\mathbb E \max_{f\in\mathcal F} \frac{1}{\sqrt n} | S_n(f) | \le 2\sqrt n M
\le 2c \Big[ \sqrt n M \cdot \min\Big\{ q^*\Big( \frac{M\sqrt H}{\sqrt n\, \mathbb D_n^\infty} \Big) \frac Hn,\ 1 \Big\} + \sigma \sqrt H \Big] \tag{7.32}
\]
holds. Putting the two bounds (7.31) and (7.32) together, we obtain the result (4.3).

Lemma 7.4. Let $\omega(k)$ be an increasing sequence in $k$. Then, for any $x > 0$,
\[
\sum_{j=2}^\infty \min\big\{ x,\ \mathbb D_n \Delta(\lfloor \tfrac j2 \rfloor) \big\}\, \omega(j) \le 2 \sum_{j=1}^\infty \min\big\{ x,\ \mathbb D_n \Delta(j) \big\}\, \omega(2j+1).
\]
Especially in the case $\omega(k) = 1$,
\[
\sum_{j=2}^\infty \min\big\{ x,\ \mathbb D_n \Delta(\lfloor \tfrac j2 \rfloor) \big\} \le 2 \sum_{j=1}^\infty \min\big\{ x,\ \mathbb D_n \Delta(j) \big\}.
\]

Proof of Lemma 7.4. It holds that
\begin{align*}
\sum_{j=2}^\infty \min\big\{ x,\ \mathbb D_n \Delta(\lfloor \tfrac j2 \rfloor) \big\}\, \omega(j)
&= \sum_{k=1}^\infty \min\big\{ x,\ \mathbb D_n \Delta(\lfloor \tfrac{2k}{2} \rfloor) \big\}\, \omega(2k) + \sum_{k=1}^\infty \min\big\{ x,\ \mathbb D_n \Delta(\lfloor \tfrac{2k+1}{2} \rfloor) \big\}\, \omega(2k+1) \\
&= \sum_{k=1}^\infty \min\big\{ x,\ \mathbb D_n \Delta(k) \big\} \cdot \big\{ \omega(2k) + \omega(2k+1) \big\} \\
&\le 2 \sum_{k=1}^\infty \min\big\{ x,\ \mathbb D_n \Delta(k) \big\} \cdot \omega(2k+1).
\end{align*}
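The pairing argument in Lemma 7.4 can be checked numerically. The following sketch uses illustrative choices of $\Delta$, $\omega$, $x$ and $\mathbb{D}_n$ (none of them are quantities from the paper) and verifies the truncated version of the inequality:

```python
# Numerical sanity check of Lemma 7.4 (a sketch, not part of the proof):
#   sum_{j>=2} min(x, D*Delta(floor(j/2))) * w(j)
#     <= 2 * sum_{j>=1} min(x, D*Delta(j)) * w(2j+1)
# for a decreasing Delta and an increasing weight w. The choices of Delta,
# w, x and D below are illustrative, not taken from the paper.

def lhs(x, D, Delta, w, J):
    # left-hand side, truncated at j < J
    return sum(min(x, D * Delta(j // 2)) * w(j) for j in range(2, J))

def rhs(x, D, Delta, w, J):
    # right-hand side, truncated at j < J
    return sum(2 * min(x, D * Delta(j)) * w(2 * j + 1) for j in range(1, J))

Delta = lambda j: 0.5 ** j      # summable, decreasing dependence coefficients
w = lambda k: k                 # increasing weight
x, D, J = 0.3, 2.0, 200         # truncation level J makes the tails negligible

assert lhs(x, D, Delta, w, J) <= rhs(x, D, Delta, w, J)
```

The truncation at a finite `J` is harmless here because both tails are geometrically small for the chosen `Delta`.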

Proof of Corollary 4.3. Let $\sigma := \sup_{n\in\mathbb N} \sup_{f\in\mathcal F} V_n(f) < \infty$. For $Q \ge 1$, define
\[
M_n = \frac{\sqrt n}{\sqrt H}\, r\Big( \frac{\sigma Q^{1/2}}{\mathbb D_n^\infty} \Big)\, \mathbb D_n^\infty.
\]
Let $\bar F = \sup_{f\in\mathcal F} \bar f$, and $F(z,u) = D_n^\infty(u) \cdot \bar F(z,u)$. Then $F$ is an envelope function of $\mathcal F$. We furthermore have
\[
\mathbb P\Big( \sup_{i=1,\dots,n} F(Z_i, \tfrac in) > M_n \Big) \le \sum_{i=1}^n \mathbb P\Big( F(Z_i, \tfrac in) > M_n \Big) \le \frac{n}{M_n^\nu} \cdot \| F \|_{\nu,n}^\nu. \tag{7.33}
\]

Inserting the bound
\[
\| F \|_{\nu,n}^\nu = \frac 1n \sum_{i=1}^n D_n^\infty(\tfrac in)^\nu\, \| \bar F(Z_i, \tfrac in) \|_\nu^\nu \le C_\Delta^\nu \cdot \frac 1n \sum_{i=1}^n D_n^\infty(\tfrac in)^\nu \le C_\Delta^\nu \cdot (\mathbb D_{\nu,n}^\infty)^\nu
\]
into (7.33) and using $r(\gamma a) \ge \gamma\, r(a)$ for $\gamma \ge 1$, $a > 0$ (this is proven similarly as Lemma 7.5), we obtain
\[
\mathbb P\Big( \sup_{i=1,\dots,n} F(Z_i, \tfrac in) > M_n \Big)
\le n^{1-\frac\nu2} \Big( \frac{\sqrt H}{r( \frac{\sigma Q^{1/2}}{\mathbb D_n^\infty} )} \Big)^\nu \Big( \frac{C_\Delta\, \mathbb D_{\nu,n}^\infty}{\mathbb D_n^\infty} \Big)^\nu
\le \frac{1}{Q^{\nu/2}} \cdot \frac{H^{\nu/2}}{n^{\frac\nu2 - 1}\, r( \frac{\sigma}{\mathbb D_n^\infty} )^\nu} \cdot \Big( \frac{C_\Delta\, \mathbb D_{\nu,n}^\infty}{\mathbb D_n^\infty} \Big)^\nu. \tag{7.34}
\]

Using the rough bound $\| f \|_{\nu,n} \le \| F \|_{\nu,n}$ and $r(a) \le a$ for $a > 0$ from Lemma 7.5, we obtain
\[
\max_{f\in\mathcal F} \frac{1}{\sqrt n} \sum_{i=1}^n \mathbb E\big[ |f(Z_i, \tfrac in)|\, \mathbb 1_{\{ |f(Z_i, \frac in)| > M_n \}} \big]
\le \frac{\sqrt n}{M_n^{\nu-1}} \max_{f\in\mathcal F} \| f \|_{\nu,n}^\nu
\le \frac{2\sigma}{Q^{\frac\nu2}\sqrt H} \cdot \frac{C_\Delta^\nu H^{\nu/2}}{n^{\frac\nu2 - 1}\, r( \frac{\sigma}{\mathbb D_n^\infty} )^\nu} \cdot \Big( \frac{\mathbb D_{\nu,n}^\infty}{\mathbb D_n^\infty} \Big)^\nu. \tag{7.35}
\]
Abbreviate
\[
C_n := \frac{C_\Delta^\nu\, H^{\nu/2}}{n^{\frac\nu2 - 1}\, r( \frac{\sigma}{\mathbb D_n^\infty} )^\nu} \cdot \Big( \frac{\mathbb D_{\nu,n}^\infty}{\mathbb D_n^\infty} \Big)^\nu.
\]

By assumption, $\sup_{n\in\mathbb N} C_n < \infty$. By Theorem 4.1, (7.34) and (7.35),
\begin{align*}
\mathbb P\Big( \max_{f\in\mathcal F} | \mathbb G_n(f) | > Q \sqrt H \Big)
&\le \mathbb P\Big( \max_{f\in\mathcal F} | \mathbb G_n(f) | > Q \sqrt H,\ \sup_{i=1,\dots,n} F(Z_i, \tfrac in) \le M_n \Big) + \mathbb P\Big( \sup_{i=1,\dots,n} F(Z_i, \tfrac in) > M_n \Big) \\
&\le \mathbb P\Big( \max_{f\in\mathcal F} \big| \mathbb G_n\big( \max\{ \min\{ f, M_n \}, -M_n \} \big) \big| > Q \sqrt H / 2 \Big) \\
&\quad + \mathbb P\Big( \max_{f\in\mathcal F} \frac{1}{\sqrt n} \sum_{i=1}^n \mathbb E\big[ |f(Z_i, \tfrac in)|\, \mathbb 1_{\{ |f(Z_i, \frac in)| > M_n \}} \big] > Q \sqrt H / 2 \Big) + \mathbb P\Big( \sup_{i=1,\dots,n} F(Z_i, \tfrac in) > M_n \Big) \\
&\le \frac{2c}{Q\sqrt H} \Big[ \sigma \sqrt H + q^*\Big( r\Big( \frac{\sigma Q^{1/2}}{\mathbb D_n^\infty} \Big) \Big)\, r\Big( \frac{\sigma Q^{1/2}}{\mathbb D_n^\infty} \Big)\, \mathbb D_n^\infty \Big] + \Big( \frac{1}{Q^{\nu/2}} + \frac{2\sigma}{Q^{\nu/2} \sqrt H} \Big) C_n \\
&\le \frac{4 c \sigma}{Q} + \Big( \frac{1}{Q^{\nu/2}} + \frac{2\sigma}{Q^{\nu/2} \sqrt H} \Big) C_n.
\end{align*}
Since $\sup_{n\in\mathbb N} C_n < \infty$ and $\sigma$ is independent of $n$, the assertion follows for $Q \to \infty$.

7.5 Proofs of Section 4.2

Proof of Theorem 4.4. In the following, we abbreviate $H(\delta) = H(\delta, \mathcal F, V_n)$ and $N(\delta) = N(\delta, \mathcal F, V_n)$. Choose $\delta_0 = \sigma$ and $\delta_j = 2^{-j} \delta_0$.

For each $j \in \mathbb N_0$, we choose a covering by brackets $\mathcal F_{jk}^{pre} := [l_{jk}, u_{jk}] \cap \mathcal F$, $k = 1, \dots, N(\delta_j)$, such that $V_n(u_{jk} - l_{jk}) \le \delta_j$ and $\sup_{f,g \in \mathcal F_{jk}} | f - g | \le u_{jk} - l_{jk} =: \Delta_{jk}$. We may assume w.l.o.g. that $l_{jk}, u_{jk}, \Delta_{jk} \in \mathcal F$.

If $l_{jk}, u_{jk}$ do not belong to $\mathcal F$, we can simply define new brackets by
\[
\tilde l_{jk}(z,u) := \inf_{f \in [l_{jk}, u_{jk}]} f(z,u), \qquad \tilde u_{jk}(z,u) := \sup_{f \in [l_{jk}, u_{jk}]} f(z,u),
\]
which fulfill $[l_{jk}, u_{jk}] \cap \mathcal F = [\tilde l_{jk}, \tilde u_{jk}] \cap \mathcal F$, and
\[
| \tilde l_{jk}(z,u) - \tilde l_{jk}(z',u) | \le \sup_{f \in [l_{jk}, u_{jk}]} | f(z,u) - f(z',u) |.
\]
Thus, we can add $\tilde l_{jk}, \tilde u_{jk}$ to $\mathcal F$ without changing the bracketing numbers $N(\varepsilon, \mathcal F, \| \cdot \|)$ and the validity of Assumption 4.2.

We now construct inductively a new nested sequence of partitions $(\mathcal F_{jk})_k$ of $\mathcal F$ from $(\mathcal F_{jk}^{pre})_k$ in the following way: For each fixed $j \in \mathbb N_0$, put
\[
\{ \mathcal F_{jk} : k \} := \Big\{ \bigcap_{i=0}^j \mathcal F_{i k_i}^{pre} : k_i \in \{1,\dots,N(\delta_i)\},\ i \in \{0,\dots,j\} \Big\}
\]
as the intersections of all previous partitions and the $j$-th partition. Then $| \{ \mathcal F_{jk} : k \} | \le N_j := N(\delta_0) \cdots N(\delta_j)$. By monotonicity of $V_n$, we have
\[
\sup_{f,g \in \mathcal F_{jk}} | f - g | \le \Delta_{jk}, \qquad V_n(\Delta_{jk}) \le \delta_j.
\]

In each $\mathcal F_{jk}$, fix some $f_{jk} \in \mathcal F$, and define $\pi_j f := f_{j, \psi_j f}$ where $\psi_j f := \min\{ i \in \{1,\dots,N_j\} : f \in \mathcal F_{ji} \}$. Put $\Delta_j f := \Delta_{j, \psi_j f}$ and
\[
I(\sigma) := \int_0^\sigma \sqrt{1 \vee H(\varepsilon, \mathcal F, V_n)}\, d\varepsilon,
\]
we set
\[
\tau := \min\Big\{ j \ge 0 : \delta_j \le \frac{I(\sigma)}{\sqrt n} \Big\} \vee 1. \tag{7.36}
\]
Put $m_j := \frac 12 m(n, \delta_j, N_{j+1})$ ($m(\cdot)$ from Lemma 7.2). Choose $M_n = \frac 12 m_0$. We then have

\[
\mathbb E \sup_{f\in\mathcal F} | \mathbb G_n(f) | \le \mathbb E \sup_{f\in\mathcal F(M_n)} | \mathbb G_n(f) | + \mathbb E \Big| \frac{1}{\sqrt n} \sum_{i=1}^n W_i\big( F\, \mathbb 1_{\{ F > M_n \}} \big) \Big|,
\]
where $\mathcal F(M_n) := \{ \varphi^\wedge_{M_n}(f) : f \in \mathcal F \}$. Due to Lemma 7.1(iii), $\mathcal F(M_n)$ still fulfills Assumption 4.2. Since $|f| \le g$ implies $|W_i(f)| \le W_i(g)$ and $\| W_i(g) \|_1 \le \| g(Z_i, \frac in) \|_1$, it holds that
\[
| \mathbb G_n(f) | \le \frac{1}{\sqrt n} \Big| \sum_{i=1}^n \big( W_i(f) - \mathbb E W_i(f) \big) \Big|
\le \mathbb G_n(g) + \frac{2}{\sqrt n} \sum_{i=1}^n \| W_i(g) \|_1 \le \mathbb G_n(g) + 2 \sqrt n\, \| g \|_{1,n}.
\]
By (7.8) and (7.9) and the fact that $\| f - \pi_0 f \|_\infty \le 2 M_n \le m_0$, we have the decomposition

\begin{align*}
\sup_{f\in\mathcal F} | \mathbb G_n(f) | &\le \sup_{f\in\mathcal F} | \mathbb G_n(\pi_0 f) | \\
&\quad + \sup_{f\in\mathcal F} \big| \mathbb G_n\big( \varphi^\wedge_{m_\tau}(f - \pi_\tau f) \big) \big| + \sum_{j=0}^{\tau-1} \sup_{f\in\mathcal F} \big| \mathbb G_n\big( \varphi^\wedge_{m_j - m_{j+1}}(\pi_{j+1} f - \pi_j f) \big) \big| + \sum_{j=0}^{\tau-1} \sup_{f\in\mathcal F} | \mathbb G_n(R(j)) | \\
&\le \sup_{f\in\mathcal F} | \mathbb G_n(\pi_0 f) | \\
&\quad + \Big\{ \sup_{f\in\mathcal F} \big| \mathbb G_n\big( \varphi^\wedge_{m_\tau}(\Delta_\tau f) \big) \big| + 2 \sqrt n \sup_{f\in\mathcal F} \| \Delta_\tau f \|_{1,n} \Big\} \\
&\quad + \sum_{j=0}^{\tau-1} \sup_{f\in\mathcal F} \big| \mathbb G_n\big( \varphi^\wedge_{m_j - m_{j+1}}(\pi_{j+1} f - \pi_j f) \big) \big| \\
&\quad + \sum_{j=0}^{\tau-1} \Big\{ \sup_{f\in\mathcal F} \big| \mathbb G_n\big( \min\{ \varphi^\vee_{m_{j+1}}(\Delta_{j+1} f), 2 m_j \} \big) \big| + 2 \sqrt n \sup_{f\in\mathcal F} \big\| \Delta_{j+1} f\, \mathbb 1_{\{ \Delta_{j+1} f > m_{j+1} \}} \big\|_{1,n} \Big\} \\
&\quad + \sum_{j=0}^{\tau-1} \Big\{ \sup_{f\in\mathcal F} \big| \mathbb G_n\big( \min\{ \varphi^\vee_{m_j - m_{j+1}}(\Delta_j f), 2 m_j \} \big) \big| + 2 \sqrt n \sup_{f\in\mathcal F} \big\| \Delta_j f\, \mathbb 1_{\{ \Delta_j f > m_j - m_{j+1} \}} \big\|_{1,n} \Big\} \\
&=: R_1 + R_2 + R_3 + R_4 + R_5. \tag{7.37}
\end{align*}

We now discuss the terms $R_i$, $i \in \{1,\dots,5\}$, from (7.37). Therefore, put $C_n := c\big( 1 + \frac{\mathbb D_n^\infty}{\mathbb D_n} \big) + \frac{\mathbb D_n}{\mathbb D_n^\infty}$. Since $\Delta_{jk} = u_{jk} - l_{jk}$ with $l_{jk}, u_{jk} \in \mathcal F$, the class $\{ \frac 12 \Delta_{jk} : k \in \{1,\dots,N(\delta_j)\} \}$ still fulfills Assumption 4.2. We conclude by Lemma 7.1(iii) that for arbitrary $m, \tilde m > 0$, the classes
\[
\big\{ \tfrac 12 \varphi^\wedge_m(\Delta_{jk}) : k \big\}, \qquad
\big\{ \tfrac 12 \min\{ \varphi^\vee_m(\Delta_{jk}), 2 \tilde m \} : k \big\}, \qquad
\big\{ \tfrac 12 \varphi^\wedge_m(\pi_{j+1} f - \pi_j f) : k \big\}
\]
($k \in \{1,\dots,N(\delta_j)\}$) fulfill Assumption 4.2.

• Since $| \{ \pi_0 f : f \in \mathcal F(M_n) \} | \le N(\delta_0) = N(\sigma)$, $\| \pi_0 f \|_\infty \le M_n \le m(n, \delta_0, N(\delta_1))$ and $V_n(\pi_0 f) \le \sigma = \delta_0$ (by assumption, every $f \in \mathcal F$ fulfills $V_n(f) \le \sigma$), we have by (7.10):
\[
\mathbb E R_1 = \mathbb E \sup_{f\in\mathcal F(M_n)} | \mathbb G_n(\pi_0 f) | \le C_n\, \delta_0 \sqrt{1 \vee \log N(\delta_1)}.
\]

• It holds that $| \{ \varphi^\wedge_{m_\tau}(\Delta_\tau f) : f \in \mathcal F(M_n) \} | \le N_\tau$. If $g := \varphi^\wedge_{m_\tau}(\Delta_\tau f)$, then $\| g \|_\infty \le m_\tau \le m(n, \delta_\tau, N_{\tau+1})$ and $V_n(g) \le V_n(\Delta_\tau f) \le \delta_\tau$. We conclude by (7.10) that:
\[
\mathbb E \sup_{f\in\mathcal F(M_n)} \big| \mathbb G_n\big( \varphi^\wedge_{m_\tau}(\Delta_\tau f) \big) \big| \le C_n\, \delta_\tau \sqrt{1 \vee \log N_{\tau+1}}. \tag{7.38}
\]
For the second term, we have by definition of $\tau$ in (7.36) and the Cauchy–Schwarz inequality:
\[
\sqrt n\, \| \Delta_\tau f \|_{1,n} \le \sqrt n\, \| \Delta_\tau f \|_{2,n} \le \sqrt n\, V_n(\Delta_\tau f) \le \sqrt n\, \delta_\tau \le I(\sigma). \tag{7.39}
\]
From (7.38) and (7.39) we obtain
\[
\mathbb E R_2 \le C_n\, \delta_\tau \sqrt{1 \vee \log N_{\tau+1}} + 2\, I(\sigma).
\]

• Since the partitions are nested, it holds that $| \{ \varphi^\wedge_{m_j - m_{j+1}}(\pi_{j+1} f - \pi_j f) : f \in \mathcal F(M_n) \} | \le N_{j+1}$. If $g := \varphi^\wedge_{m_j - m_{j+1}}(\pi_{j+1} f - \pi_j f)$, we have $\| g \|_\infty \le m_j - m_{j+1} \le m_j \le m(n, \delta_j, N_{j+1})$ and $| g | \le | \pi_{j+1} f - \pi_j f | \le \Delta_j f$. Furthermore, $V_n(g) \le V_n(\Delta_j f) \le \delta_j$. We conclude by (7.10) that:
\[
\mathbb E R_3 \le \sum_{j=0}^{\tau-1} \mathbb E \sup_{f\in\mathcal F(M_n)} \big| \mathbb G_n\big( \varphi^\wedge_{m_j - m_{j+1}}(\pi_{j+1} f - \pi_j f) \big) \big| \le C_n \sum_{j=0}^{\tau-1} \delta_j \sqrt{1 \vee \log N_{j+1}}.
\]

• It holds that $| \{ \min\{ \varphi^\vee_{m_{j+1}}(\Delta_{j+1} f), 2 m_j \} : f \in \mathcal F(M_n) \} | \le N_{j+1}$. If $g := \min\{ \varphi^\vee_{m_{j+1}}(\Delta_{j+1} f), 2 m_j \}$, we have $\| g \|_\infty \le 2 m_j = m(n, \delta_j, N_{j+1})$ and $| g | \le \Delta_{j+1} f$. By monotonicity of $V_n$, we have $V_n(g) \le V_n(\Delta_{j+1} f) \le \delta_{j+1} \le \delta_j$. We conclude by (7.10) that:
\[
\sum_{j=0}^{\tau-1} \mathbb E \sup_{f\in\mathcal F(M_n)} \big| \mathbb G_n\big( \min\{ \varphi^\vee_{m_{j+1}}(\Delta_{j+1} f), 2 m_j \} \big) \big| \le C_n \sum_{j=0}^{\tau-1} \delta_j \sqrt{1 \vee \log N_{j+1}}. \tag{7.40}
\]
Note that $V_n(\Delta_{j+1} f) \le \delta_{j+1}$ and $m_{j+1} = \frac 12 m(n, \delta_{j+1}, N_{j+2})$. By (7.11), we have
\[
\sqrt n\, \big\| \Delta_{j+1} f\, \mathbb 1_{\{ \Delta_{j+1} f > m_{j+1} \}} \big\|_{1,n} \le 2 \delta_{j+1} \sqrt{1 \vee \log N_{j+2}}. \tag{7.41}
\]
From (7.40) and (7.41) we obtain
\[
\mathbb E R_4 \le (C_n + 4) \sum_{j=0}^{\tau} \delta_j \sqrt{1 \vee \log N_{j+1}}.
\]

• It holds that $| \{ \min\{ \varphi^\vee_{m_j - m_{j+1}}(\Delta_j f), 2 m_j \} : f \in \mathcal F(M_n) \} | \le N_{j+1}$. If $g := \min\{ \varphi^\vee_{m_j - m_{j+1}}(\Delta_j f), 2 m_j \}$, we have $\| g \|_\infty \le 2 m_j = m(n, \delta_j, N_{j+1})$ and $| g | \le \Delta_j f$. Thus, $V_n(g) \le V_n(\Delta_j f) \le \delta_j$. We conclude by (7.10) that:
\[
\sum_{j=0}^{\tau-1} \mathbb E \sup_{f\in\mathcal F(M_n)} \big| \mathbb G_n\big( \min\{ \varphi^\vee_{m_j - m_{j+1}}(\Delta_j f), 2 m_j \} \big) \big| \le C_n \sum_{j=0}^{\tau-1} \delta_j \sqrt{1 \vee \log N_{j+1}}. \tag{7.42}
\]
Note that $V_n(\Delta_j f) \le \delta_j$ and
\begin{align*}
2 (m_j - m_{j+1}) &= m(n, \delta_j, N_{j+1}) - m(n, \delta_{j+1}, N_{j+2})
= \mathbb D_n^\infty n^{1/2} \Big[ \frac{r( \frac{\delta_j}{\mathbb D_n} )}{\sqrt{1 \vee \log N_{j+1}}} - \frac{r( \frac{\delta_{j+1}}{\mathbb D_n} )}{\sqrt{1 \vee \log N_{j+2}}} \Big] \\
&\ge \frac{\mathbb D_n^\infty n^{1/2}}{\sqrt{1 \vee \log N_{j+1}}} \Big[ r\Big( \frac{\delta_j}{\mathbb D_n} \Big) - r\Big( \frac{\delta_{j+1}}{\mathbb D_n} \Big) \Big]
\ge \frac 12 \cdot \frac{\mathbb D_n^\infty n^{1/2}}{\sqrt{1 \vee \log N_{j+1}}}\, r\Big( \frac{\delta_j}{\mathbb D_n} \Big) = m_j,
\end{align*}
where the last inequality is due to Lemma 7.5. Since $m_j - m_{j+1} \ge \frac{m_j}{2}$ and $m_j = \frac 12 m(n, \delta_j, N_{j+1})$, we have by (7.11):
\[
\sqrt n\, \big\| \Delta_j f\, \mathbb 1_{\{ \Delta_j f > m_j - m_{j+1} \}} \big\|_{1,n} \le \sqrt n\, \big\| \Delta_j f\, \mathbb 1_{\{ \Delta_j f > \frac{m_j}{2} \}} \big\|_{1,n} \le 4 \delta_j \sqrt{1 \vee \log N_{j+1}}. \tag{7.43}
\]
From (7.42) and (7.43) we obtain
\[
\mathbb E R_5 \le (C_n + 8) \sum_{j=0}^{\tau-1} \delta_j \sqrt{1 \vee \log N_{j+1}}.
\]

Summarizing the bounds for $R_i$, $i = 1,\dots,5$, we obtain that with some universal constant $\tilde c > 0$,
\[
\mathbb E \sup_{f\in\mathcal F(M_n)} | \mathbb G_n(f) | \le \tilde c \cdot \Big[ C_n \sum_{j=0}^\tau \delta_j \sqrt{1 \vee \log N_{j+1}} + I(\sigma) \Big]. \tag{7.44}
\]
We have $(1 \vee \log N_j)^{1/2} = \big( 1 \vee \sum_{i=0}^j \log N(\delta_i) \big)^{1/2} \le \big( \sum_{i=0}^j (1 \vee H(\delta_i)) \big)^{1/2} \le \sum_{i=0}^j (1 \vee H(\delta_i))^{1/2}$, thus
\[
\sum_{j=0}^\tau \delta_j \sqrt{1 \vee \log N_{j+1}}
\le \sum_{j=0}^\infty \delta_j \sum_{i=0}^j \sqrt{1 \vee H(\delta_{i+1})}
\le \sum_{i=0}^\infty \sqrt{1 \vee H(\delta_{i+1})} \Big( \sum_{j=i}^\infty \delta_j \Big)
= 2 \sum_{i=0}^\infty \delta_i \sqrt{1 \vee H(\delta_{i+1})}
\le 4 \sum_{i=0}^\infty \delta_{i+1} \sqrt{1 \vee H(\delta_{i+1})}. \tag{7.45}
\]

Since $H$ is increasing, we obtain
\begin{align*}
\sum_{i=0}^\infty \delta_{i+1} \sqrt{1 \vee H(\delta_{i+1})}
&\le \sum_{i=0}^\infty \delta_i \sqrt{1 \vee H(\delta_i)}
= 2 \sum_{i=0}^\infty \delta_{i+1} \sqrt{1 \vee H(\delta_i)}
= 2 \sum_{i=0}^\infty \int_{\delta_{i+1}}^{\delta_i} \sqrt{1 \vee H(\delta_i)}\, d\varepsilon \\
&\le 2 \sum_{i=0}^\infty \int_{\delta_{i+1}}^{\delta_i} \sqrt{1 \vee H(\varepsilon)}\, d\varepsilon
= 2 \int_0^\sigma \sqrt{1 \vee H(\varepsilon)}\, d\varepsilon = 2 \cdot I(\sigma). \tag{7.46}
\end{align*}
Inserting (7.46) into (7.45) and then into (7.44), we obtain the result.
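The comparison of the dyadic sum with the entropy integral in (7.45)–(7.46) can be illustrated numerically. The entropy function `H` below is an illustrative parametric-type choice, not one from the paper; the chain above gives $\sum_{j\ge 0}\delta_j\sqrt{1\vee H(\delta_{j+1})} = 2\sum_{j\ge 0}\delta_{j+1}\sqrt{1\vee H(\delta_{j+1})} \le 4\,I(\sigma)$:

```python
# Numerical illustration of the chaining bounds (7.45)-(7.46) (a sketch):
# for dyadic levels delta_j = 2^{-j} * sigma, the weighted sum
#   S = sum_j delta_j * sqrt(1 v H(delta_{j+1}))
# is controlled by the entropy integral I(sigma) = int_0^sigma sqrt(1 v H(e)) de.
# H below is an illustrative parametric-type entropy, not taken from the paper.

import math

def H(eps, d=5.0):
    return d * math.log(1.0 / eps)   # grows as eps decreases, like a bracketing entropy

def I(sigma, steps=200000):
    # midpoint Riemann sum for the entropy integral I(sigma)
    h = sigma / steps
    return sum(math.sqrt(max(1.0, H((k + 0.5) * h))) * h for k in range(steps))

sigma = 0.5
S = sum((2.0 ** -j) * sigma * math.sqrt(max(1.0, H(2.0 ** -(j + 1) * sigma)))
        for j in range(60))

# the derivation above gives S <= 4 * I(sigma) for this configuration
assert S <= 4 * I(sigma)
```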

Proof of Corollary 4.5. Define $\tilde{\mathcal F} := \{ f - g : f, g \in \mathcal F \}$. It is easily seen that $N(\varepsilon, \tilde{\mathcal F}, V_n) \le N(\frac\varepsilon2, \mathcal F, V_n)^2$ (cf. [43], Theorem 19.5), thus
\[
H(\varepsilon, \tilde{\mathcal F}, V_n) \le 2 H\big( \tfrac\varepsilon2, \mathcal F, V_n \big). \tag{7.47}
\]

Let $\sigma > 0$. Define
\[
F(z,u) := 2 D_n^\infty(u) \cdot \bar F(z,u), \qquad \bar F(z,u) := \sup_{f\in\mathcal F} | \bar f(z,u) |.
\]
Then obviously, $F$ is an envelope function of $\tilde{\mathcal F}$. By Markov's inequality, Theorem 4.4 and (7.47),

\begin{align*}
\mathbb P\Big( \sup_{\substack{f,g\in\mathcal F, \\ V_n(f-g) \le \sigma}} | \mathbb G_n(f) - \mathbb G_n(g) | \ge \eta \Big)
&\le \frac 1\eta\, \mathbb E \sup_{\substack{f,g\in\mathcal F, \\ V_n(f-g) \le \sigma}} | \mathbb G_n(f) - \mathbb G_n(g) |
= \frac 1\eta\, \mathbb E \sup_{\tilde f \in \tilde{\mathcal F},\, V_n(\tilde f) \le \sigma} | \mathbb G_n(\tilde f) | \\
&\le \frac{\tilde c}{\eta} \Big[ \Big( 1 + \frac{\mathbb D_n^\infty}{\mathbb D_n} + \frac{\mathbb D_n}{\mathbb D_n^\infty} \Big) \int_0^\sigma \sqrt{1 \vee H(\varepsilon, \tilde{\mathcal F}, V_n)}\, d\varepsilon + \sqrt n\, \big\| F\, \mathbb 1_{\{ F > \frac 14 m(n, \sigma, N(\frac\sigma2)) \}} \big\|_{1,n} \Big] \\
&\le \frac{\tilde c}{\eta} \Big[ 2\sqrt 2 \Big( 1 + \frac{\mathbb D_n^\infty}{\mathbb D_n} + \frac{\mathbb D_n}{\mathbb D_n^\infty} \Big) \int_0^{\sigma/2} \sqrt{1 \vee H(u, \mathcal F, V_n)}\, du + \sqrt n\, \Big\| F\, \mathbb 1_{\big\{ F > \frac 14 \frac{n^{1/2} r(\sigma)}{\sqrt{1 \vee H(\frac\sigma2)}} \big\}} \Big\|_{1,n} \Big].
\end{align*}

The first term converges to 0 by (4.7) and (4.8) for $\sigma \to 0$ (uniformly in $n$).

We now discuss the second term. The continuity conditions from Assumption 4.2 and Assumption 3.2 transfer to $\bar F$ by the inequality
\[
| \bar F(z_1, u_1) - \bar F(z_2, u_2) | = \Big| \sup_{f\in\mathcal F} | \bar f(z_1, u_1) | - \sup_{f\in\mathcal F} | \bar f(z_2, u_2) | \Big| \le \sup_{f\in\mathcal F} | \bar f(z_1, u_1) - \bar f(z_2, u_2) |.
\]
We therefore have as in Lemma 7.8(ii) that for all $u, u_1, u_2, v_1, v_2 \in [0,1]$,
\begin{align}
\big\| \bar F(Z_i, u) - \bar F(\tilde Z_i(\tfrac in), u) \big\|_2 &\le C_{cont} \cdot n^{-\alpha s}, \tag{7.48} \\
\big\| \bar F(\tilde Z_i(v_1), u_1) - \bar F(\tilde Z_i(v_2), u_2) \big\|_2 &\le C_{cont} \cdot \big( | v_1 - v_2 |^{\alpha s} + | u_1 - u_2 |^{\alpha s} \big). \tag{7.49}
\end{align}

Put $c_n = \frac 18 \frac{n^{1/2}\, r(\sigma)}{\sup_{i=1,\dots,n} D_n^\infty(\frac in)\, \sqrt{1 \vee H(\frac\sigma2)}}$. Then by Lemma 7.6(ii) and (7.48),
\begin{align*}
\Big( \sqrt n\, \Big\| F\, \mathbb 1_{\big\{ F > \frac 14 \frac{n^{1/2} r(\sigma)}{\sqrt{1 \vee H(\frac\sigma2)}} \big\}} \Big\|_{1,n} \Big)^2
&\le \frac 4n \sum_{i=1}^n D_n^\infty(\tfrac in)^2 \cdot \mathbb E\big[ \bar F(Z_i, \tfrac in)^2\, \mathbb 1_{\{ | \bar F(Z_i, \frac in) | > c_n \}} \big] \\
&\le \frac{16}{n} \sum_{i=1}^n D_n^\infty(\tfrac in)^2 \cdot \mathbb E\big[ \bar F(\tilde Z_i(\tfrac in), \tfrac in)^2\, \mathbb 1_{\{ | \bar F(\tilde Z_i(\frac in), \frac in) | > c_n \}} \big] + 16\, C_{cont} \cdot n^{-\alpha s} \cdot (\mathbb D_n^\infty)^2. \tag{7.50}
\end{align*}

Put $\tilde W_i(u) := \bar F(\tilde Z_i(u), u)^2$ and $a_n(u) := (D_n^\infty(u))^2$. By (7.49), $\| \tilde W_i(u_1) - \tilde W_i(u_2) \|_1 \le 2 C_{cont} | u_1 - u_2 |^{\alpha s}$. By the assumptions on $D_{f,n}(\cdot)$, $c_n \to \infty$ and $\limsup_{n\to\infty} \frac 1n \sum_{i=1}^n a_n(\frac in) = \limsup_{n\to\infty} (\mathbb D_n^\infty)^2 < \infty$. We conclude with Lemma 7.7(i) that
\[
\frac{16}{n} \sum_{i=1}^n D_n^\infty(\tfrac in)^2 \cdot \mathbb E\big[ \bar F(\tilde Z_i(\tfrac in), \tfrac in)^2\, \mathbb 1_{\{ | \bar F(\tilde Z_i(\frac in), \frac in) | > c_n \}} \big] \to 0,
\]
that is, the first summand in (7.50) tends to 0. Since $\limsup_{n\to\infty} \mathbb D_n^\infty < \infty$, we obtain that (7.50) tends to 0.

7.6 Proofs of Section 7.2

Proof of Lemma 7.1. (i) Since $|x_1| + |x_2| \le m$ implies $|x_1|, |x_2| \le m$, we have
\[
I := \big| \varphi^\wedge_m(x_1 + x_2 + x_3) - \varphi^\wedge_m(x_1) - \varphi^\wedge_m(x_2) \big| = \big| \varphi^\wedge_m(x_1 + x_2 + x_3) - x_1 - x_2 \big|.
\]
Case 1: $x_1 + x_2 + x_3 > m$. Then, since $|x_1| + |x_2| \le m$, we have $I = | m - x_1 - x_2 | = m - x_1 - x_2 \le x_3$. The remaining cases are treated analogously.

(ii) From
\[
\varphi^\vee_m(x) = \begin{cases} x - m, & x > m, \\ 0, & x \in [-m, m], \\ x + m, & x < -m \end{cases}
\;=\; \mathrm{sgn}(x)\, (|x| - m)\, \mathbb 1_{\{ |x| > m \}}
\]
and, for $y \ge 0$,
\[
(y - m)\, \mathbb 1_{\{ y > m \}} = \big( (y - m) \vee 0 \big)\, \mathbb 1_{\{ y - m > 0 \}} \le y\, \mathbb 1_{\{ y > m \}},
\]
which shows the second assertion.

(iii) We will show that for all $z, z' \in \mathbb R^N$ it holds that
\[
\big| \varphi^\wedge_m(f)(z) - \varphi^\wedge_m(f)(z') \big| \le | f(z) - f(z') |, \qquad
\big| \varphi^\vee_m(f)(z) - \varphi^\vee_m(f)(z') \big| \le | f(z) - f(z') |, \tag{7.51}
\]
from which the assertion follows. For real numbers $a_i, b_i$, we have
\[
\max_i \{ a_i \} = \max_i \{ a_i - b_i + b_i \} \le \max_i \{ a_i - b_i \} + \max_i \{ b_i \},
\]
thus $| \max_i \{ a_i \} - \max_i \{ b_i \} | \le \max_i | a_i - b_i |$. This implies $| \max\{ a, y \} - \max\{ a, y' \} | \le | y - y' |$ and therefore
\[
\big| \varphi^\wedge_m(f)(z) - \varphi^\wedge_m(f)(z') \big| = \big| (-m) \vee (f(z) \wedge m) - (-m) \vee (f(z') \wedge m) \big| \le | f(z) \wedge m - f(z') \wedge m |
= \big| (-f(z')) \vee (-m) - (-f(z)) \vee (-m) \big| \le | f(z) - f(z') |.
\]
For the second inequality in (7.51), note that
\[
\varphi^\vee_m(f)(z) = (f(z) - m) \vee 0 + (f(z) + m) \wedge 0.
\]

We therefore have
\[
\big| \varphi^\vee_m(f)(z) - \varphi^\vee_m(f)(z') \big| = \big| (f(z) - m) \vee 0 - (f(z') - m) \vee 0 + (f(z) + m) \wedge 0 - (f(z') + m) \wedge 0 \big|.
\]
If $f(z), f(z') \ge -m$, then
\[
\big| \varphi^\vee_m(f)(z) - \varphi^\vee_m(f)(z') \big| \le \big| (f(z) - m) \vee 0 - (f(z') - m) \vee 0 \big| \le | f(z) - f(z') |.
\]
A similar result is obtained for $f(z), f(z') \le m$. If $f(z) \ge m$, $f(z') < -m$, then
\[
\big| \varphi^\vee_m(f)(z) - \varphi^\vee_m(f)(z') \big| \le \big| (f(z) - m) - (f(z') + m) \wedge 0 \big|
= \begin{cases} | f(z) - f(z') - 2m | = f(z) - f(z') - 2m \le f(z) - f(z'), & f(z') \le -m, \\ | f(z) - m | = f(z) - m \le f(z) - f(z'), & f(z') > -m. \end{cases}
\]
A similar result is obtained for $f(z) \le -m$, $f(z') \ge m$, which proves (7.51).
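The two truncation operators and their contraction property (7.51) can be checked directly on a grid. The sketch below (illustrative only) implements the clipping map $\varphi^\wedge_m$ and the exceedance map $\varphi^\vee_m$ and verifies that they decompose the identity and are both 1-Lipschitz:

```python
# A sketch of the truncation operators from Lemma 7.1 (illustrative only):
# phi_wedge_m clips to [-m, m]; phi_vee_m keeps the exceedance over the band,
# so that phi_wedge_m(x) + phi_vee_m(x) = x and both maps are 1-Lipschitz,
# mirroring (7.51). The grid and m below are arbitrary test values.

def phi_wedge(x, m):
    return max(-m, min(x, m))                    # (-m) v (x ^ m)

def phi_vee(x, m):
    return max(x - m, 0.0) + min(x + m, 0.0)     # (x-m) v 0 + (x+m) ^ 0

m = 1.5
xs = [i / 10.0 for i in range(-50, 51)]
for x in xs:
    # decomposition of the identity
    assert abs(phi_wedge(x, m) + phi_vee(x, m) - x) < 1e-12
for x in xs:
    for y in xs:
        # 1-Lipschitz property, as in (7.51)
        assert abs(phi_wedge(x, m) - phi_wedge(y, m)) <= abs(x - y) + 1e-12
        assert abs(phi_vee(x, m) - phi_vee(y, m)) <= abs(x - y) + 1e-12
```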

Lemma 7.5 (Properties of $r(\cdot)$). $r(\cdot)$ is well-defined and for each $a > 0$, $\frac{r(a)}{2} \ge r(\frac a2)$ and $r(a) \le a$.

Proof. $q^*(\cdot)$ and $r(\cdot)$ are well-defined since $\beta_{norm}(\cdot)$ is decreasing (at a rate $\ll q^{-1}$) and $r \mapsto q^*(r) r$ is increasing (at a rate $\ll r$) and $\lim_{r \downarrow 0} q^*(r) r = 0$.

Let $a > 0$. We show that $r = 2 r(\frac a2)$ fulfills $q^*(r) r \le a$. By definition of $r(a)$, we obtain $r(a) \ge r = 2 r(\frac a2)$, which gives the result. Since $\beta_{norm}$ is decreasing, $q^*$ is decreasing. We conclude that
\[
q^*(r) r = 2 \cdot q^*\big( 2 r(\tfrac a2) \big)\, r(\tfrac a2) \le 2 \cdot q^*\big( r(\tfrac a2) \big)\, r(\tfrac a2) \le 2 \cdot \frac a2 = a.
\]

The second inequality $r(a) \le a$ follows from the fact that $q^*(r) r$ is increasing and $q^*(a) a \ge a$.

7.7 Proofs of Section 3

Proof of Theorem 3.4. Denote $W_i(f) := f(Z_i, \frac in)$ and $W_i := (W_i(f_1), \dots, W_i(f_m))'$. Let $a = (a_1, \dots, a_m)' \in \mathbb R^m \setminus \{0\}$. We use the decomposition
\[
\frac{1}{\sqrt n} \sum_{i=1}^n a' (W_i - \mathbb E W_i) = \sum_{j=0}^\infty \frac{1}{\sqrt n} \sum_{i=1}^n a' P_{i-j} W_i.
\]
For fixed $J \in \mathbb N \cup \{\infty\}$, put
\[
(S_n(J)_k)_{k=1,\dots,m} := S_n(J) := \sum_{j=0}^{J-1} \frac{1}{\sqrt n} \sum_{i=1}^n P_{i-j} W_i.
\]

Then, since $P_{i-j} W_i(f_k)$, $i = 1,\dots,n$, is a martingale difference sequence and by Lemma 7.8(i),
\[
\| S_n(\infty)_k - S_n(J)_k \|_2 \le \sum_{j=J}^\infty \Big\| \frac{1}{\sqrt n} \sum_{i=1}^n P_{i-j} W_i(f_k) \Big\|_2
= \sum_{j=J}^\infty \Big( \frac 1n \sum_{i=1}^n \| P_{i-j} W_i(f_k) \|_2^2 \Big)^{1/2}
\le \Big( \frac 1n \sum_{i=1}^n D_{f_k,2,n}(\tfrac in)^2 \Big)^{1/2} \cdot \sum_{j=J}^\infty \Delta(j),
\]
thus
\[
\limsup_{J,n\to\infty} \| S_n(\infty)_k - S_n(J)_k \|_2 \le \sup_{n\in\mathbb N} \Big( \frac 1n \sum_{i=1}^n D_{f_k,2,n}(\tfrac in)^2 \Big)^{1/2} \cdot \limsup_{J\to\infty} \sum_{j=J}^\infty \Delta(j) = 0. \tag{7.52}
\]

Define
\[
(S_n^\circ(J)_k)_{k=1,\dots,m} := S_n^\circ(J) := \frac{1}{\sqrt n} \sum_{i=1}^{n-J+1} \sum_{j=0}^{J-1} P_i W_{i+j}.
\]
Then we have
\[
\| S_n^\circ(J)_k - S_n(J)_k \|_2 \le \sum_{j=0}^{J-1} \Big\| \frac{1}{\sqrt n} \sum_{i=1}^j P_{i-j} W_i(f_k) \Big\|_2 + \frac{1}{\sqrt n} \sum_{j=0}^{J-1} \sum_{i=n-J+j+1}^n \| P_{i-j} W_i(f_k) \|_2
\le \frac{2 J^2}{\sqrt n} \cdot \sup_{i=1,\dots,n+j} \| P_{i-j} W_i(f_k) \|_2
\le \frac{2 J^2}{\sqrt n} \cdot \sup_{i=1,\dots,n+j} \| f_k(Z_i, \tfrac in) \|_2.
\]

By Lemma 7.8(i),
\[
\sup_{i=1,\dots,n+j} \| f_k(Z_i, \tfrac in) \|_2 \le C_{\Delta,2} \cdot \sup_i D_{2,n}(\tfrac in),
\]
which gives
\[
\lim_{n\to\infty} \| S_n^\circ(J)_k - S_n(J)_k \|_2 = 0. \tag{7.53}
\]

Stationary approximation: Put $\tilde S_n^\circ(J) = (\tilde S_n^\circ(J)_k)_{k=1,\dots,m}$, where
\[
\tilde S_n^\circ(J)_k := \frac{1}{\sqrt n} \sum_{i=1}^{n-J+1} \sum_{j=0}^{J-1} P_i f_k\big( \tilde Z_{i+j}(\tfrac in), \tfrac in \big).
\]
Then we have
\[
\| S_n^\circ(J)_k - \tilde S_n^\circ(J)_k \|_2 \le \sum_{j=0}^{J-1} \Big( \frac 1n \sum_{i=1}^{n-J+1} \Big\| P_i f_k\big( Z_{i+j}, \tfrac{i+j}{n} \big) - P_i f_k\big( \tilde Z_{i+j}(\tfrac in), \tfrac in \big) \Big\|_2^2 \Big)^{1/2}.
\]

For each $j, k$, it holds that
\begin{align*}
\frac 1n \sum_{i=1}^{n-J+1} \Big\| P_i f_k\big( Z_{i+j}, \tfrac{i+j}{n} \big) - P_i f_k\big( \tilde Z_{i+j}(\tfrac in), \tfrac in \big) \Big\|_2^2
&\le \frac 2n \sum_{i=1}^{n-J+1} \Big( D_{f_k,n}(\tfrac{i+j}{n}) - D_{f_k,n}(\tfrac in) \Big)^2 \cdot \sup_i \big\| \bar f(Z_{i+j}, \tfrac{i+j}{n}) \big\|_2^2 \\
&\quad + \frac 2n \sum_{i=1}^{n-J+1} D_{f,n}(\tfrac in)^2 \cdot \sup_i \big\| \bar f_k(Z_{i+j}, \tfrac{i+j}{n}) - \bar f_k(\tilde Z_{i+j}(\tfrac in), \tfrac in) \big\|_2^2.
\end{align*}

By Lemma 7.8, we have $\sup_i \| \bar f(Z_{i+j}, \frac{i+j}{n}) \|_2^2 < \infty$. Since $\frac{1}{\sqrt n} D_{f_k,n}(\cdot)$ has bounded variation uniformly in $n$,
\[
\frac 1n \sum_{i=1}^{n-J+1} \Big( D_{f_k,n}(\tfrac{i+j}{n}) - D_{f_k,n}(\tfrac in) \Big)^2
\le \sup_{i=1,\dots,n} \Big| \frac{1}{\sqrt n} D_{f_k,n}(\tfrac in) \Big| \cdot \frac{1}{\sqrt n} \sum_{i=1}^{n-J+1} \Big| D_{f_k,n}(\tfrac{i+j}{n}) - D_{f_k,n}(\tfrac in) \Big| \to 0.
\]
By Lemma 7.8(ii),
\[
\sup_i \big\| \bar f_k(Z_{i+j}, \tfrac{i+j}{n}) - \bar f_k(\tilde Z_{i+j}(\tfrac in), \tfrac in) \big\|_2 \to 0.
\]

We therefore obtain
\[
\| S_n^\circ(J)_k - \tilde S_n^\circ(J)_k \|_2 \to 0. \tag{7.54}
\]

Note that
\[
M_{i,k} := \frac{1}{\sqrt n} \sum_{j=0}^{J-1} P_i f_k\big( \tilde Z_{i+j}(\tfrac in), \tfrac in \big), \qquad i = 1,\dots,n,
\]
is a martingale difference sequence with respect to $(\mathcal G_i)_i$, and
\[
\tilde S_n^\circ(J)_k = \sum_{i=1}^{n-J+1} M_{i,k}.
\]
We can therefore apply a central limit theorem for martingale difference sequences to $a' \tilde S_n^\circ(J) = \sum_{i=1}^{n-J+1} \big( \sum_{k=1}^m a_k M_{i,k} \big)$.

The Lindeberg condition: Let $\varsigma > 0$. Iterated application of Lemma 7.6(i) yields that there are constants $c_1, c_2 > 0$ only depending on $m, J$ such that
\[
\sum_{i=1}^{n-J+1} \mathbb E\Big[ \Big( \sum_{k=1}^m a_k M_{i,k} \Big)^2 \mathbb 1_{\{ | \sum_{k=1}^m a_k M_{i,k} | > \varsigma \}} \Big]
\le c_1 \sum_{l=0,1} \sum_{j=0}^{J-1} \sum_{k=1}^m a_k^2\, \frac 1n \sum_{i=1}^{n-J} \mathbb E\Big[ \mathbb E\big[ f_k( \tilde Z_{i+j}(\tfrac in), \tfrac in ) \mid \mathcal G_{i-l} \big]^2\, \mathbb 1_{\big\{ | \mathbb E[ f_k( \tilde Z_{i+j}(\frac in), \frac in ) \mid \mathcal G_{i-l} ] | > \sqrt n \frac{\varsigma}{c_2 |a|_\infty} \big\}} \Big].
\]

For each $l, j, k$, we have
\begin{align*}
&\frac 1n \sum_{i=1}^{n-J} \mathbb E\Big[ \mathbb E\big[ f_k( \tilde Z_{i+j}(\tfrac in), \tfrac in ) \mid \mathcal G_{i-l} \big]^2\, \mathbb 1_{\big\{ | \mathbb E[ f_k( \tilde Z_{i+j}(\frac in), \frac in ) \mid \mathcal G_{i-l} ] | > \sqrt n \frac{\varsigma}{c_2 |a|_\infty} \big\}} \Big] \\
&= \frac 1n \sum_{i=1}^{n-J} D_{f_k,n}(\tfrac in)^2\, \mathbb E\Big[ \mathbb E\big[ \bar f_k( \tilde Z_{i+j}(\tfrac in), \tfrac in ) \mid \mathcal G_{i-l} \big]^2\, \mathbb 1_{\big\{ | \mathbb E[ \bar f_k( \tilde Z_{i+j}(\frac in), \frac in ) \mid \mathcal G_{i-l} ] | > \frac{\sqrt n\, \varsigma}{\sup_i | D_{f,n}(\frac in) |\, c_2 |a|_\infty} \big\}} \Big] \\
&= \frac 1n \sum_{i=1}^{n-J} D_{f_k,n}(\tfrac in)^2\, \mathbb E\big[ \tilde W_i(\tfrac in)^2\, \mathbb 1_{\{ | \tilde W_i(\frac in) | > c_n \}} \big], \tag{7.55}
\end{align*}
where we have put
\[
\tilde W_i(u) := \mathbb E\big[ \bar f_k( \tilde Z_i(u), u ) \mid \mathcal G_{i-l} \big], \qquad
c_n := \frac{\sqrt n\, \varsigma}{\sup_{i=1,\dots,n} | D_{f,n}(\frac in) | \cdot c_2 |a|_\infty}.
\]

By Lemma 7.8(ii), $\tilde W_i(u)$ satisfies the assumptions (7.59) of Lemma 7.7. By assumption, $c_n \to \infty$. With $a_n(u) := D_{f_k,n}(u)^2$, we obtain from Lemma 7.7 that (7.55) converges to 0, which shows that the Lindeberg condition is satisfied.

Convergence of the variance: We have
\[
\sum_{i=1}^{n-J+1} \mathbb E\Big[ \Big( \sum_{k=1}^m a_k M_{i,k} \Big)^2 \,\Big|\, \mathcal G_{i-1} \Big]
= \sum_{j_1,j_2=0}^{J-1} \sum_{k,l=1}^m a_k a_l \cdot \frac 1n \sum_{i=1}^{n-J+1} D_{f_k,n}(\tfrac in) D_{f_l,n}(\tfrac in) \cdot \mathbb E\Big[ P_i \bar f_k\big( \tilde Z_{i+j_1}(\tfrac in), \tfrac in \big) \cdot P_i \bar f_l\big( \tilde Z_{i+j_2}(\tfrac in), \tfrac in \big) \,\Big|\, \mathcal G_{i-1} \Big].
\]
For each $j_1, j_2, k, l$, we define
\[
\tilde W_i(u) := \mathbb E\big[ P_i \bar f_k( \tilde Z_{i+j_1}(u), u ) \cdot P_i \bar f_l( \tilde Z_{i+j_2}(u), u ) \mid \mathcal G_{i-1} \big], \qquad a_n(u) := D_{f_k,n}(u) D_{f_l,n}(u).
\]
Then
\[
\frac 1n \sum_{i=1}^{n-J+1} D_{f_k,n}(\tfrac in) D_{f_l,n}(\tfrac in) \cdot \mathbb E\Big[ P_i \bar f_k\big( \tilde Z_{i+j_1}(\tfrac in), \tfrac in \big) \cdot P_i \bar f_l\big( \tilde Z_{i+j_2}(\tfrac in), \tfrac in \big) \,\Big|\, \mathcal G_{i-1} \Big]
= \frac 1n \sum_{i=1}^{n-J+1} a_n(\tfrac in)\, \tilde W_i(\tfrac in).
\]
By Lemma 7.8(i),(ii), we have
\begin{align*}
\| \tilde W_0(u) - \tilde W_0(v) \|_1
&\le \big\| \bar f_k( \tilde Z_0(u), u ) - \bar f_k( \tilde Z_0(v), v ) \big\|_2 \cdot \big\| \bar f_l( \tilde Z_0(u), u ) \big\|_2
+ \big\| \bar f_l( \tilde Z_0(u), u ) - \bar f_l( \tilde Z_0(v), v ) \big\|_2 \cdot \big\| \bar f_k( \tilde Z_0(v), v ) \big\|_2 \\
&\le 2 C_{cont} C_{\bar f} \cdot | u - v |^{\alpha s / 2}.
\end{align*}

Let $A_n := \sup_{i=1,\dots,n} | a_n(\frac in) |$. Since $\frac{D_{f,n}(\cdot)}{D_{f,n}^\infty}$ has bounded variation uniformly in $n$, it follows that $\frac{a_n(\cdot)}{A_n}$ has bounded variation uniformly in $n$. From $\frac{D_{f,n}^\infty}{\sqrt n} \to 0$ we conclude $\frac{A_n}{n} \to 0$.

By assumption and the Cauchy–Schwarz inequality,
\[
\sup_n \frac 1n \sum_{i=1}^n | a_n(\tfrac in) | \le \sup_n \Big( \frac 1n \sum_{i=1}^n D_{f_k,n}(\tfrac in)^2 \Big)^{1/2} \Big( \frac 1n \sum_{i=1}^n D_{f_l,n}(\tfrac in)^2 \Big)^{1/2} < \infty.
\]
It holds that $\sup_n (h_n \cdot A_n) \le \sup_n \big( h_n (D_{f_k,n}^\infty)^2 \big)^{1/2} \cdot \sup_n \big( h_n (D_{f_l,n}^\infty)^2 \big)^{1/2} < \infty$, and
\[
| v - u | > h_n \;\Rightarrow\; D_{f_k,n}(u) = 0,\ D_{f_l,n}(u) = 0 \;\Rightarrow\; a_n(u) = 0.
\]
Thus, Lemma 7.7(ii) is applicable.

Case K = 1: If $u \mapsto \mathbb E\big[ P_0 \bar f_k( \tilde Z_{j_1}(u), u ) \cdot P_0 \bar f_l( \tilde Z_{j_2}(u), u ) \big]$ has bounded variation, we have
\[
\frac 1n \sum_{i=1}^{n-J+1} D_{f_k,n}(\tfrac in) D_{f_l,n}(\tfrac in) \cdot \mathbb E\Big[ P_i \bar f_k\big( \tilde Z_{i+j_1}(\tfrac in), \tfrac in \big) \cdot P_i \bar f_l\big( \tilde Z_{i+j_2}(\tfrac in), \tfrac in \big) \,\Big|\, \mathcal G_{i-1} \Big]
\xrightarrow{p} \lim_{n\to\infty} \int_0^1 D_{f_k,n}(u) D_{f_l,n}(u) \cdot \mathbb E\big[ P_0 \bar f_k( \tilde Z_{j_1}(u), u ) \cdot P_0 \bar f_l( \tilde Z_{j_2}(u), u ) \big]\, du,
\]
and thus
\[
\sum_{i=1}^{n-J+1} \mathbb E\Big[ \Big( \sum_{k=1}^m a_k M_{i,k} \Big)^2 \,\Big|\, \mathcal G_{i-1} \Big]
\xrightarrow{p} \sum_{k,l=1}^m a_k a_l \sum_{j_1,j_2=0}^{J-1} \lim_{n\to\infty} \int_0^1 D_{f_k,n}(u) D_{f_l,n}(u) \cdot \mathbb E\big[ P_0 \bar f_k( \tilde Z_{j_1}(u), u ) \cdot P_0 \bar f_l( \tilde Z_{j_2}(u), u ) \big]\, du
= a' \Sigma^{(1)}(J) a.
\]

Here, for $f, g \in \mathcal F$, we have that $\mathbb E\big[ P_0 \bar f( \tilde Z_{j_1}(u), u ) \cdot P_0 \bar g( \tilde Z_{j_2}(u), u ) \big]$ can be written as
\[
\mathbb E\big[ P_0 \bar f( \tilde Z_{j_1}(u), u ) \cdot P_0 \bar g( \tilde Z_{j_2}(u), u ) \big]
= \mathbb E\big[ \mathbb E[ \bar f( \tilde Z_{j_1}(u), u ) \mid \mathcal G_0 ] \cdot \mathbb E[ \bar g( \tilde Z_{j_2}(u), u ) \mid \mathcal G_0 ] \big]
- \mathbb E\big[ \mathbb E[ \bar f( \tilde Z_{j_1}(u), u ) \mid \mathcal G_{-1} ] \cdot \mathbb E[ \bar g( \tilde Z_{j_2}(u), u ) \mid \mathcal G_{-1} ] \big],
\]
which shows that the condition stated in the assumption guarantees the bounded variation of $u \mapsto \mathbb E\big[ P_0 \bar f( \tilde Z_{j_1}(u), u ) \cdot P_0 \bar g( \tilde Z_{j_2}(u), u ) \big]$.

Case K = 2: If $h_n \to 0$, then we obtain similarly
\[
\sum_{i=1}^{n-J+1} \mathbb E\Big[ \Big( \sum_{k=1}^m a_k M_{i,k} \Big)^2 \,\Big|\, \mathcal G_{i-1} \Big]
\xrightarrow{p} \sum_{k,l=1}^m a_k a_l \sum_{j_1,j_2=0}^{J-1} \lim_{n\to\infty} \int_0^1 D_{f_k,n}(u) D_{f_l,n}(u)\, du \cdot \mathbb E\big[ P_0 \bar f_k( \tilde Z_{j_1}(v), v ) \cdot P_0 \bar f_l( \tilde Z_{j_2}(v), v ) \big]
= a' \Sigma^{(2)}(J) a.
\]
By the martingale central limit theorem and (7.53), (7.54), we obtain that
\[
a' S_n(J) \xrightarrow{d} N\big( 0,\ a' \Sigma^{(K)}(J) a \big). \tag{7.56}
\]

Conclusion: For $K \in \{1,2\}$, we have
\[
a' \Sigma^{(K)}(J) a \to a' \Sigma^{(K)}(\infty) a \qquad (J \to \infty) \tag{7.57}
\]
due to
\[
\Big\| \sum_{j_1,j_2:\, \max\{j_1,j_2\} \ge J} P_0 \bar f_k( \tilde Z_{j_1}(u), u ) \cdot P_0 \bar f_l( \tilde Z_{j_2}(u), u ) \Big\|_1
\le \sum_{j_1,j_2:\, \max\{j_1,j_2\} \ge J} \big\| P_0 \bar f_k( \tilde Z_{j_1}(u), u ) \big\|_2\, \big\| P_0 \bar f_l( \tilde Z_{j_2}(u), u ) \big\|_2 \to 0 \qquad (J \to \infty)
\]
uniformly in $n$, and
\[
\sup_n \int_0^1 | D_{f_k,n}(u) D_{f_l,n}(u) |\, du \le \sup_n \Big( \int_0^1 D_{f_k,n}(u)^2\, du \Big)^{1/2} \Big( \int_0^1 D_{f_l,n}(u)^2\, du \Big)^{1/2} < \infty.
\]
By (7.52), (7.56), (7.57),
\[
\sum_{j\in\mathbb Z} \mathrm{Cov}\big( \bar f_k( \tilde Z_0(u), u ),\ \bar f_l( \tilde Z_j(u), u ) \big) = \sum_{j_1,j_2=0}^\infty \mathbb E\big[ P_0 \bar f_k( \tilde Z_{j_1}(u), u ) \cdot P_0 \bar f_l( \tilde Z_{j_2}(u), u ) \big]
\]
and the Cramér–Wold device, the assertion of the theorem follows.

Lemma 7.6. Let $c \in \mathbb R$, $c > 0$.

(i) For $x, y \in \mathbb R$, it holds that
\[
(x + y)^2\, \mathbb 1_{\{ |x+y| > c \}} \le 8 x^2\, \mathbb 1_{\{ |x| > \frac c2 \}} + 8 y^2\, \mathbb 1_{\{ |y| > \frac c2 \}}.
\]

(ii) For random variables $W, \tilde W$, it holds that
\[
\mathbb E\big[ W^2\, \mathbb 1_{\{ |W| > c \}} \big] \le 4\, \mathbb E\big[ (W - \tilde W)^2 \big] + 4\, \mathbb E\big[ \tilde W^2\, \mathbb 1_{\{ |\tilde W| > \frac c2 \}} \big].
\]

Proof of Lemma 7.6. (i) It holds that
\begin{align*}
(x + y)^2\, \mathbb 1_{\{ |x+y| > c \}}
&\le 2 (x^2 + y^2)\, \mathbb 1_{\{ |x| > \frac c2 \text{ or } |y| > \frac c2 \}} \\
&\le 2 (x^2 + y^2)\, \Big( \mathbb 1_{\{ |x| > \frac c2,\, |y| > \frac c2 \}} + \mathbb 1_{\{ |x| > \frac c2,\, |y| \le \frac c2 \}} + \mathbb 1_{\{ |x| \le \frac c2,\, |y| > \frac c2 \}} \Big) \\
&\le \big( 4 x^2\, \mathbb 1_{\{ |x| > \frac c2 \}} + y^2\, \mathbb 1_{\{ |y| > \frac c2 \}} \big) + 4 x^2\, \mathbb 1_{\{ |x| > \frac c2 \}} + 4 y^2\, \mathbb 1_{\{ |y| > \frac c2 \}} \\
&\le 8 x^2\, \mathbb 1_{\{ |x| > \frac c2 \}} + 8 y^2\, \mathbb 1_{\{ |y| > \frac c2 \}}.
\end{align*}

(ii) We have
\[
\mathbb E\big[ W^2\, \mathbb 1_{\{ |W| > c \}} \big]
\le 2\, \mathbb E\big[ ( |W| - |\tilde W| )^2\, \mathbb 1_{\{ |W| > c \}} \big] + 2\, \mathbb E\big[ \tilde W^2\, \mathbb 1_{\{ |W| > c \}} \big]
\le 2\, \mathbb E\big[ (W - \tilde W)^2 \big] + 2\, \mathbb E\big[ \tilde W^2\, \mathbb 1_{\{ |W - \tilde W| + |\tilde W| > c \}} \big]. \tag{7.58}
\]

Furthermore, with Markov's inequality,
\begin{align*}
\mathbb E\big[ \tilde W^2\, \mathbb 1_{\{ |W - \tilde W| + |\tilde W| > c \}} \big]
&\le \mathbb E\big[ \tilde W^2\, \mathbb 1_{\{ |W - \tilde W| > \frac c2 \}} \big] + \mathbb E\big[ \tilde W^2\, \mathbb 1_{\{ |\tilde W| > \frac c2 \}} \big] \\
&\le \Big( \frac c2 \Big)^2\, \mathbb P\Big( |W - \tilde W| > \frac c2 \Big) + \mathbb E\big[ \tilde W^2\, \mathbb 1_{\{ |W - \tilde W| > \frac c2,\, |\tilde W| > \frac c2 \}} \big] + \mathbb E\big[ \tilde W^2\, \mathbb 1_{\{ |\tilde W| > \frac c2 \}} \big] \\
&\le \mathbb E\big[ (W - \tilde W)^2 \big] + 2\, \mathbb E\big[ \tilde W^2\, \mathbb 1_{\{ |\tilde W| > \frac c2 \}} \big].
\end{align*}
Inserting this inequality into (7.58), we obtain the assertion.
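The elementary inequality in Lemma 7.6(i) is easy to verify exhaustively on a grid of real pairs. The sketch below uses an illustrative grid and threshold (not quantities from the paper):

```python
# A quick grid check of Lemma 7.6(i) (a sketch; grid and c are illustrative):
#   (x+y)^2 1{|x+y|>c} <= 8 x^2 1{|x|>c/2} + 8 y^2 1{|y|>c/2}.

def lhs(x, y, c):
    return (x + y) ** 2 if abs(x + y) > c else 0.0

def rhs(x, y, c):
    out = 0.0
    if abs(x) > c / 2:
        out += 8 * x ** 2
    if abs(y) > c / 2:
        out += 8 * y ** 2
    return out

c = 1.0
grid = [i / 7.0 for i in range(-30, 31)]
assert all(lhs(x, y, c) <= rhs(x, y, c) for x in grid for y in grid)
```

The key point mirrored by the check: if $|x+y| > c$, then at least one of $|x| > c/2$, $|y| > c/2$ must hold, so the right-hand side is never zero when the left-hand side is positive.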

The following lemma generalizes some results from [8] using similar techniques as therein.

Lemma 7.7. Let $q \in \{1,2\}$. Let $\tilde W_i(u)$ be a stationary sequence with
\[
\sup_{u\in[0,1]} \| \tilde W_0(u) \|_q < \infty, \qquad \| \tilde W_0(u) - \tilde W_0(v) \|_q \le C_W\, | u - v |^\varsigma. \tag{7.59}
\]
Let $a_n : [0,1] \to \mathbb R$ be some sequence of functions with $\limsup_{n\to\infty} \frac 1n \sum_{i=1}^n | a_n(\frac in) | < \infty$.

(i) Let $q = 2$. Let $c_n$ be some sequence with $c_n \to \infty$. Then
\[
\frac 1n \sum_{i=1}^n | a_n(\tfrac in) | \cdot \mathbb E\big[ \tilde W_i(\tfrac in)^2\, \mathbb 1_{\{ | \tilde W_i(\frac in) | > c_n \}} \big] \to 0.
\]

(ii) Let $q = 1$. Suppose that there exists $h_n > 0$, $v \in [0,1]$ such that for all $u \in [0,1]$, $| v - u | > h_n$ implies $a_n(u) = 0$. Put $A_n = \sup_{i=1,\dots,n} | a_n(\frac in) |$ and suppose that
\[
\sup_{n\in\mathbb N} ( h_n \cdot A_n ) < \infty, \qquad \frac{A_n}{n} \to 0, \qquad \frac{a_n(\cdot)}{A_n}\ \text{has bounded variation uniformly in}\ n.
\]
Suppose that the limits on the following right-hand sides exist. If $u \mapsto \mathbb E \tilde W_0(u)$ has bounded variation, then
\[
\frac 1n \sum_{i=1}^n a_n(\tfrac in)\, \tilde W_i(\tfrac in) \xrightarrow{p} \lim_{n\to\infty} \int_0^1 a_n(u)\, \mathbb E \tilde W_0(u)\, du.
\]
If $h_n \to 0$, then
\[
\frac 1n \sum_{i=1}^n a_n(\tfrac in)\, \tilde W_i(\tfrac in) \xrightarrow{p} \lim_{n\to\infty} \int_0^1 a_n(u)\, du \cdot \mathbb E \tilde W_0(v).
\]

Proof of Lemma 7.7. Let $J\in\mathbb{N}$ be fixed and assume that $n\ge 2\cdot 2^J$. For $j\in\{1,\dots,2^J\}$, define $I_{j,J,n} := \{i\in\{1,\dots,n\} : \frac{i}{n}\in(\frac{j-1}{2^J},\frac{j}{2^J}]\}$. Then $(I_{j,J,n})_j$ forms a decomposition of $\{1,\dots,n\}$ in the sense that $\bigcup_{j=1}^{2^J} I_{j,J,n} = \{1,\dots,n\}$. Since $\frac{i}{n}\in(\frac{j-1}{2^J},\frac{j}{2^J}] \iff \frac{j-1}{2^J}\cdot n < i \le \frac{j}{2^J}\cdot n$, we conclude that $\frac{n}{2^J}-1 \le |I_{j,J,n}| \le \frac{n}{2^J}+1$. Thus, since $n\ge 2\cdot 2^J$,
\[
\Big|\frac{|I_{j,J,n}|}{n} - \frac{1}{2^J}\Big| \le \frac{1}{n}, \qquad |I_{j,J,n}| \ge \frac{1}{2}\cdot\frac{n}{2^J}. \tag{7.60}
\]

Let $w_i$, $i\in\mathbb{N}$, be an arbitrary sequence. Then it holds that
\[
\Big|\frac{1}{n}\sum_{i=1}^n w_i - \frac{1}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}} w_i\Big|
\le \sum_{j=1}^{2^J}\Big|\frac{|I_{j,J,n}|}{n} - \frac{1}{2^J}\Big|\cdot\frac{1}{|I_{j,J,n}|}\Big|\sum_{i\in I_{j,J,n}} w_i\Big|
\]
\[
\le \frac{1}{n}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}} |w_i|
\le \frac{2\cdot 2^J}{n^2}\sum_{i=1}^n |w_i|. \tag{7.61}
\]
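As an aside (not from the paper), the blocking inequality (7.61) can be checked numerically for a concrete sequence; the choices of $w_i$, $n$ and $J$ below are arbitrary:

```python
# Numerical check of the blocking inequality (7.61): split {1,...,n} into 2^J
# blocks I_j = { i : i/n in ((j-1)/2^J, j/2^J] } and compare the full average
# with the equally weighted average of the block means.
import math

n, J = 100, 3
w = [math.sin(i) + 0.01 * i for i in range(1, n + 1)]

blocks = [[] for _ in range(2 ** J)]
for i in range(1, n + 1):
    j = math.ceil(i * 2 ** J / n)  # i/n lies in ((j-1)/2^J, j/2^J]
    blocks[j - 1].append(w[i - 1])

full_avg = sum(w) / n
block_avg = 2 ** (-J) * sum(sum(b) / len(b) for b in blocks)
lhs = abs(full_avg - block_avg)
rhs = 2 * 2 ** J / n ** 2 * sum(abs(x) for x in w)
print(lhs <= rhs)  # expect True
```

The bound is coarse by design: it only uses that each block size deviates from $n/2^J$ by at most one.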

(i) Application of (7.61) with $w_i = |a_n(\frac{i}{n})|\,\mathbb{E}[\tilde W_i(\frac{i}{n})^2\,\mathbb{1}_{\{|\tilde W_i(\frac{i}{n})|>c_n\}}]$ yields
\[
\frac{1}{n}\sum_{i=1}^n \big|a_n(\tfrac{i}{n})\big|\,\mathbb{E}\big[\tilde W_i(\tfrac{i}{n})^2\,\mathbb{1}_{\{|\tilde W_i(\frac{i}{n})|>c_n\}}\big]
\le \frac{1}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}\big|a_n(\tfrac{i}{n})\big|\,\mathbb{E}\big[\tilde W_i(\tfrac{i}{n})^2\,\mathbb{1}_{\{|\tilde W_i(\frac{i}{n})|>c_n\}}\big]
+ \frac{2\cdot 2^J}{n}\cdot\frac{1}{n}\sum_{i=1}^n \big|a_n(\tfrac{i}{n})\big|\cdot\sup_u\|\tilde W_0(u)\|_2^2. \tag{7.62}
\]
By Lemma 7.6(ii) and stationarity,
\[
\frac{1}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}\big|a_n(\tfrac{i}{n})\big|\,\mathbb{E}\big[\tilde W_i(\tfrac{i}{n})^2\,\mathbb{1}_{\{|\tilde W_i(\frac{i}{n})|>c_n\}}\big]
\]
\[
\le \frac{1}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}\big|a_n(\tfrac{i}{n})\big|\cdot 4\,\mathbb{E}\big[\tilde W_0(\tfrac{j}{2^J})^2\,\mathbb{1}_{\{|\tilde W_0(\frac{j}{2^J})|>\frac{c_n}{2}\}}\big]
+ \frac{1}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}\big|a_n(\tfrac{i}{n})\big|\cdot 4\,\big\|\tilde W_0(\tfrac{i}{n}) - \tilde W_0(\tfrac{j}{2^J})\big\|_2^2
\]
\[
\le \Big[4\sup_{j=1,\dots,2^J}\mathbb{E}\big[\tilde W_0(\tfrac{j}{2^J})^2\,\mathbb{1}_{\{|\tilde W_0(\frac{j}{2^J})|>\frac{c_n}{2}\}}\big] + 4C_W^2(2^{-J})^{2\varsigma}\Big]\cdot\frac{1}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}\big|a_n(\tfrac{i}{n})\big|. \tag{7.63}
\]
By (7.60),
\[
\frac{1}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}\big|a_n(\tfrac{i}{n})\big| \le \frac{2}{n}\sum_{i=1}^n \big|a_n(\tfrac{i}{n})\big|.
\]
By the dominated convergence theorem,
\[
\limsup_{n\to\infty}\ \sup_{j=1,\dots,2^J}\mathbb{E}\big[\tilde W_0(\tfrac{j}{2^J})^2\,\mathbb{1}_{\{|\tilde W_0(\frac{j}{2^J})|>\frac{c_n}{2}\}}\big] = 0.
\]
Furthermore, $\limsup_{n\to\infty}\frac{2\cdot 2^J}{n}\cdot\sup_u\|\tilde W_0(u)\|_2^2 = 0$. Inserting (7.63) into (7.62) and applying $\limsup_{n\to\infty}$ and afterwards $\limsup_{J\to\infty}$ yields the assertion.

(ii) Since (7.59) also holds with $\tilde W_0(u)$ replaced by $\tilde W_0(u) - \mathbb{E}\tilde W_0(u)$, we may assume in the following w.l.o.g. that $\mathbb{E}\tilde W_0(u) = 0$.

By (7.61) applied to $w_i = a_n(\frac{i}{n})\tilde W_i(\frac{i}{n})$, we obtain
\[
\Big\|\frac{1}{n}\sum_{i=1}^n a_n(\tfrac{i}{n})\tilde W_i(\tfrac{i}{n}) - \frac{1}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}} a_n(\tfrac{i}{n})\tilde W_i(\tfrac{i}{n})\Big\|_1
\le \frac{2\cdot 2^J}{n}\cdot\frac{1}{n}\sum_{i=1}^n \big|a_n(\tfrac{i}{n})\big|\cdot\sup_u\|\tilde W_0(u)\|_1 \to 0 \quad (n\to\infty). \tag{7.64}
\]
We furthermore have

\[
\Big\|\frac{1}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}} a_n(\tfrac{i}{n})\tilde W_i(\tfrac{i}{n}) - \frac{1}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}} a_n(\tfrac{i}{n})\tilde W_i(\tfrac{j-1}{2^J})\Big\|_1
\]
\[
\le \frac{1}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}\big|a_n(\tfrac{i}{n})\big|\cdot\big\|\tilde W_0(\tfrac{i}{n}) - \tilde W_0(\tfrac{j-1}{2^J})\big\|_1
\le \frac{2}{n}\sum_{i=1}^n \big|a_n(\tfrac{i}{n})\big|\cdot C_W(2^{-J})^{\varsigma}. \tag{7.65}
\]
Fix $j\in\{1,\dots,2^J\}$. Put $u_j := \frac{j-1}{2^J}$ and, for a real-valued positive $x$, define $[x] := \min\{k\in\mathbb{N} : k > x\}$. By stationarity, the following equality holds in distribution:
\[
\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}} a_n(\tfrac{i}{n})\tilde W_i(u_j) \overset{d}{=} \frac{1}{|I_{j,J,n}|}\sum_{i=1}^{|I_{j,J,n}|} a_n\big(\tfrac{i+[u_jn]-1}{n}\big)\tilde W_i(u_j). \tag{7.66}
\]
Put $\tilde W_i(u)^{\circ} := \tilde W_i(u)\,\mathbb{1}_{\{\frac{i+[u_jn]-1}{n}\in[v-h_n,\,v+h_n]\}}$; since $a_n$ vanishes outside $[v-h_n,v+h_n]$, replacing $\tilde W_i(u_j)$ by $\tilde W_i(u_j)^{\circ}$ does not change the weighted sums below. By partial summation and since $\frac{a_n(\cdot)}{A_n}$ has bounded variation $B_a$ uniformly in $n$,
\[
\Big|\frac{1}{|I_{j,J,n}|}\sum_{i=1}^{|I_{j,J,n}|} a_n\big(\tfrac{i+[u_jn]-1}{n}\big)\tilde W_i(u_j)\Big|
= \Big|\frac{1}{|I_{j,J,n}|}\sum_{i=1}^{|I_{j,J,n}|-1}\Big(a_n\big(\tfrac{i+[u_jn]-1}{n}\big) - a_n\big(\tfrac{i+1+[u_jn]-1}{n}\big)\Big)\sum_{l=1}^{i}\tilde W_l(u_j)^{\circ}
+ \frac{1}{|I_{j,J,n}|}\,a_n\big(\tfrac{|I_{j,J,n}|+[u_jn]-1}{n}\big)\sum_{l=1}^{|I_{j,J,n}|}\tilde W_l(u_j)^{\circ}\Big|
\]
\[
\le \frac{B_a+1}{|I_{j,J,n}|}\cdot A_n\cdot\sup_{i=1,\dots,|I_{j,J,n}|}\Big|\sum_{l=1}^{i}\tilde W_l(u_j)^{\circ}\Big|. \tag{7.67}
\]

By stationarity, we have
\[
\sup_{i=1,\dots,|I_{j,J,n}|}\Big|\sum_{l=1}^{i}\tilde W_l(u_j)^{\circ}\Big|
= \sup_{i=1,\dots,|I_{j,J,n}|}\Big|\sum_{l=1\vee(\lceil n(v-h_n)\rceil-[u_jn]+1)}^{i\wedge(\lfloor n(v+h_n)\rfloor-[u_jn]+1)}\tilde W_l(u_j)\Big|
\overset{d}{=} \sup_{i=1,\dots,m_n}\Big|\sum_{l=1}^{i}\tilde W_l(u_j)\Big|,
\]
since $\big(|I_{j,J,n}|\wedge(\lfloor n(v+h_n)\rfloor-[u_jn]+1)\big) - \big(1\vee(\lceil n(v-h_n)\rceil-[u_jn]+1)\big) \le m_n := 2nh_n$. By assumption, $m_n = 2(h_nA_n)\cdot\frac{n}{A_n}\to\infty$.

By the ergodic theorem,
\[
\lim_{m\to\infty}\frac{1}{m}\sum_{l=1}^{m}\tilde W_l(u_j) = 0 \quad \text{a.s.},
\]
and especially $(\frac{1}{m}\sum_{l=1}^{m}\tilde W_l(u_j))_m$ is bounded a.s. We conclude that
\[
\frac{1}{m_n}\sup_{i=1,\dots,m_n}\Big|\sum_{l=1}^{i}\tilde W_l(u_j)\Big|
\le \frac{1}{\sqrt{m_n}}\sup_{i=1,\dots,\sqrt{m_n}}\frac{1}{i}\Big|\sum_{l=1}^{i}\tilde W_l(u_j)\Big|
+ \sup_{i=\sqrt{m_n}+1,\dots,m_n}\frac{1}{i}\Big|\sum_{l=1}^{i}\tilde W_l(u_j)\Big| \to 0.
\]
We conclude from (7.67) that
\[
\Big|\frac{1}{|I_{j,J,n}|}\sum_{i=1}^{|I_{j,J,n}|} a_n\big(\tfrac{i+[u_jn]-1}{n}\big)\tilde W_i(u_j)\Big|
\le 2\cdot 2^J\,(B_a+1)\cdot A_n\cdot\frac{m_n}{n}\cdot\frac{1}{m_n}\sup_{i=1,\dots,|I_{j,J,n}|}\Big|\sum_{l=1}^{i}\tilde W_l(u_j)^{\circ}\Big| \to 0. \tag{7.68}
\]
Combination of (7.64), (7.65), (7.66) and (7.68) and applying $\limsup_{n\to\infty}$ and afterwards $\limsup_{J\to\infty}$, we obtain
\[
\frac{1}{n}\sum_{i=1}^n a_n(\tfrac{i}{n})\big(\tilde W_i(\tfrac{i}{n}) - \mathbb{E}\tilde W_0(\tfrac{i}{n})\big) \overset{p}{\to} 0.
\]

If $u\mapsto\mathbb{E}\tilde W_0(u)$ has bounded variation, we have with some intermediate values $\xi_{i,n}\in[\frac{i-1}{n},\frac{i}{n}]$,
\[
\Big|\frac{1}{n}\sum_{i=1}^n a_n(\tfrac{i}{n})\,\mathbb{E}\tilde W_0(\tfrac{i}{n}) - \int_0^1 a_n(u)\,\mathbb{E}\tilde W_0(u)\,du\Big|
\le \frac{1}{n}\sum_{i=1}^n \big|a_n(\tfrac{i}{n})\,\mathbb{E}\tilde W_0(\tfrac{i}{n}) - a_n(\xi_{i,n})\,\mathbb{E}\tilde W_0(\xi_{i,n})\big|
\]
\[
\le \frac{A_n}{n}\sum_{i=1}^n\Big|\frac{a_n(\frac{i}{n})}{A_n} - \frac{a_n(\xi_{i,n})}{A_n}\Big|\cdot\sup_u\|\tilde W_0(u)\|_1
+ \frac{A_n}{n}\sum_{i=1}^n\big|\mathbb{E}\tilde W_0(\tfrac{i}{n}) - \mathbb{E}\tilde W_0(\xi_{i,n})\big| \to 0.
\]
If instead $h_n\to 0$, we have
\[
\Big|\frac{1}{n}\sum_{i=1}^n a_n(\tfrac{i}{n})\,\mathbb{E}\tilde W_0(\tfrac{i}{n}) - \frac{1}{n}\sum_{i=1}^n a_n(\tfrac{i}{n})\,\mathbb{E}\tilde W_0(v)\Big|
\le \frac{1}{n}\sum_{i=1}^n \big|a_n(\tfrac{i}{n})\big|\cdot\sup_{|u-v|\le h_n}\|\tilde W_0(u) - \tilde W_0(v)\|_1 \to 0.
\]
Since $\frac{a_n(\cdot)}{A_n}$ has bounded variation uniformly in $n$,
\[
\Big|\frac{1}{n}\sum_{i=1}^n a_n(\tfrac{i}{n}) - \int_0^1 a_n(u)\,du\Big|
\le \frac{A_n}{n}\sum_{i=1}^n\Big|\frac{a_n(\frac{i}{n})}{A_n} - \frac{a_n(\xi_{i,n})}{A_n}\Big| \to 0.
\]
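The final Riemann-sum step can be illustrated numerically. The sketch below (not from the paper; the triangular kernel and the parameters $n$, $v$, $h$ are illustrative choices) uses kernel weights $a_n(u) = K((u-v)/h)/h$, for which $A_n = 1/h$, and checks that the right-endpoint Riemann sum differs from the integral by at most $\mathrm{TV}(a_n)/n = A_n\,\mathrm{TV}(K)/n$:

```python
# Illustration of the Riemann-sum bound for bounded-variation weights:
# |(1/n) sum_i a_n(i/n) - int_0^1 a_n(u) du| <= TV(a_n)/n.
n, v, h = 1000, 0.5, 0.1

def K(x):  # triangular kernel: total variation 2, integrates to 1
    return max(0.0, 1.0 - abs(x))

def a_n(u):
    return K((u - v) / h) / h

riemann = sum(a_n(i / n) for i in range(1, n + 1)) / n
integral = 1.0  # support [v-h, v+h] lies inside [0,1], so the integral is exactly 1
A_n = 1.0 / h
tv_bound = 2 * A_n / n  # TV(a_n)/n = A_n * TV(K) / n
print(abs(riemann - integral) <= tv_bound)  # expect True
```

This is the quantitative content of the uniform bounded-variation assumption on $a_n(\cdot)/A_n$: the discretization error is of order $A_n/n$, which vanishes by assumption.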

Lemma 7.8. Let $\mathcal{F}$ satisfy Assumptions 3.1, 3.2 and 2.2. Then there exist constants $C_{\mathrm{cont}} > 0$, $C_{\bar f} > 0$ such that for any $f\in\mathcal{F}$,

(i) for any $j\ge 1$,
\[
\|P_{i-j}f(Z_i,u)\|_2 \le D_{f,n}(u)\Delta(j), \qquad \sup_{i=1,\dots,n}\|f(Z_i,u)\|_2 \le C_{\Delta}\cdot D_{f,n}(u),
\]
\[
\sup_{i,u}\|\bar f(Z_i,u)\|_2 \le C_{\bar f}, \qquad \sup_{v,u}\|\bar f(\tilde Z_0(v),u)\|_2 \le C_{\bar f}.
\]

(ii)
\[
\|\bar f(Z_i,u) - \bar f(\tilde Z_i(\tfrac{i}{n}),u)\|_2 \le C_{\mathrm{cont}}\cdot n^{-\varsigma s}, \tag{7.69}
\]
\[
\|\bar f(\tilde Z_i(v_1),u_1) - \bar f(\tilde Z_i(v_2),u_2)\|_2 \le C_{\mathrm{cont}}\cdot\big(|v_1-v_2|^{\varsigma s} + |u_1-u_2|^{\varsigma s}\big). \tag{7.70}
\]

Proof of Lemma 7.8. (i) If Assumption 2.2 is satisfied, we have by Lemma 7.3 that
\[
\|P_{i-j}f(Z_i,u)\|_2 \le \|f(Z_i,u) - f(Z_i^{*(i-j)},u)\|_2 = \delta_2^{f(Z,u)}(j) \le D_{f,n}(u)\Delta(j).
\]
The second assertion follows from Lemma 7.3.

(ii) Let $\bar C_R := \sup_{v,u_1,u_2}\big\|\frac{\bar f(\tilde Z_0(v),u_1) - \bar f(\tilde Z_0(v),u_2)}{|u_1-u_2|^{\varsigma}}\big\|_2 < \infty$ (by Assumption 3.2) and $C_R := \max\{\sup_{i,u}\|R(Z_i,u)\|_2, \sup_{u,v}\|R(\tilde Z_0(v),u)\|_2\}$. Then
\[
\|\bar f(\tilde Z_i(v),u_1) - \bar f(\tilde Z_i(v),u_2)\|_2 \le \bar C_R|u_1-u_2|^{\varsigma}. \tag{7.71}
\]
We then have
\[
\|\bar f(Z_i,u) - \bar f(\tilde Z_i(v),u)\|_2
\le \big\||Z_i - \tilde Z_i(v)|_{L_{\mathcal F},s}^{s}\,\big(R(Z_i,u) + R(\tilde Z_i(v),u)\big)\big\|_2
\]
\[
\le \big\||Z_i - \tilde Z_i(v)|_{L_{\mathcal F},s}^{s}\big\|_{\frac{2p}{p-1}}\big(\|R(Z_i,u)\|_{2p} + \|R(\tilde Z_i(v),u)\|_{2p}\big)
\le 2C_R\big\||Z_i - \tilde Z_i(v)|_{L_{\mathcal F},s}^{s}\big\|_{\frac{2p}{p-1}}.
\]
Furthermore,
\[
\big\||Z_i - \tilde Z_i(v)|_{L_{\mathcal F},s}^{s}\big\|_{\frac{2p}{p-1}}
\le \sum_{l=0}^{\infty} L_{\mathcal F,l}\,\big\||X_{i-l} - \tilde X_{i-l}(v)|^{s}\big\|_{\frac{2p}{p-1}}
= \sum_{l=0}^{i} L_{\mathcal F,l}\,\|X_{i-l} - \tilde X_{i-l}(v)\|_{\frac{2ps}{p-1}}^{s}
\]
\[
\le \sum_{l=0}^{i} L_{\mathcal F,l}\,C_X\Big(\big|v-\tfrac{i}{n}\big|^{\varsigma} + \big(\tfrac{l}{n}\big)^{\varsigma}\Big)^{s}
\le \big|v-\tfrac{i}{n}\big|^{\varsigma s}\cdot C_X|L_{\mathcal F}|_1 + n^{-\varsigma s}\cdot C_X\sum_{l=0}^{\infty} L_{\mathcal F,l}\,l^{\varsigma s}.
\]
We obtain with $C_{\mathrm{cont}} := 2\bar C_R + 2C_RC_X\big(|L_{\mathcal F}|_1 + \sum_{j=0}^{\infty} L_{\mathcal F,j}\,j^{\varsigma s}\big)$ that
\[
\|\bar f(Z_i,u) - \bar f(\tilde Z_i(v),u)\|_2 \le C_{\mathrm{cont}}\cdot\Big(\big|v-\tfrac{i}{n}\big|^{\varsigma s} + n^{-\varsigma s}\Big). \tag{7.72}
\]
Furthermore, as above,
\[
\|\bar f(\tilde Z_i(v_1),u) - \bar f(\tilde Z_i(v_2),u)\|_2
\le 2C_R\big\||\tilde Z_0(v_1) - \tilde Z_0(v_2)|_{L_{\mathcal F},s}^{s}\big\|_{\frac{2p}{p-1}}
\le 2C_R\sum_{l=0}^{\infty} L_{\mathcal F,l}\,\|\tilde X_0(v_1) - \tilde X_0(v_2)\|_{\frac{2ps}{p-1}}^{s}
\le 2C_RC_X|L_{\mathcal F}|_1\,|v_1-v_2|^{\varsigma s}. \tag{7.73}
\]
From (7.72), we obtain (7.69) with $v = \frac{i}{n}$. From (7.71) and (7.73), we conclude (7.70).

7.8 Proofs of Section 5

Proof of Lemma 5.2. Put $D_{v,n}(u) = \sqrt{h}\,K_h(u-v)$. From (A1) and Assumption 3.1 we obtain that
\[
\Delta(k) = O\big(\delta_{2M}^{X}(k)\big), \qquad C_R = 1 + \max\{C_X,1\}^{2M}.
\]
Since $K$ is Lipschitz continuous and (A2) holds, we have
\[
\sup_{|v-v'|\le n^{-3},\,|\theta-\theta'|_2\le n^{-3}}\Big|\big(\nabla_{\theta}^{j}L_{n,h}(v,\theta) - \mathbb{E}\nabla_{\theta}^{j}L_{n,h}(v,\theta)\big) - \big(\nabla_{\theta}^{j}L_{n,h}(v',\theta') - \mathbb{E}\nabla_{\theta}^{j}L_{n,h}(v',\theta')\big)\Big|_{\infty}
\]
\[
\le C_R\sup_{|v-v'|\le n^{-3},\,|\theta-\theta'|_2\le n^{-3}}\Big(\frac{L_K|v-v'|}{h^2} + C_{\Theta}|\theta-\theta'|_2\Big)\cdot\frac{1}{n}\sum_{i=1}^n\big(1 + |Z_i|_1^{M} + \mathbb{E}|Z_i|_1^{M}\big) = O_p(n^{-1}).
\]
Let $\Theta_n$ be a grid approximation of $\Theta$ such that for any $\theta\in\Theta$, there exists some $\theta'\in\Theta_n$ such that $|\theta-\theta'|_2 \le n^{-3}$. Since $\Theta\subset\mathbb{R}^{d_{\Theta}}$, it is possible to choose $\Theta_n$ such that $|\Theta_n| = O(n^{6d_{\Theta}})$. Furthermore, define $V_n := \{in^{-3} : i = 1,\dots,n^3\}$ as an approximation of $[0,1]$.

As in Example 5.1, Corollary 4.3 applied to
\[
\mathcal{F}_j' = \{f_{v,\theta} : \theta\in\Theta_n,\,v\in V_n\}
\]
yields for $j\in\{0,1,2\}$ that
\[
\sup_{v\in[\frac{h}{2},1-\frac{h}{2}]}\sup_{\theta\in\Theta}\big|\nabla_{\theta}^{j}L_{n,h}(v,\theta) - \mathbb{E}\nabla_{\theta}^{j}L_{n,h}(v,\theta)\big|_{\infty} = O_p(\tau_n). \tag{7.74}
\]
Put $\tilde L_{n,h}(v,\theta) = \frac{1}{n}\sum_{i=1}^n K_h(\frac{i}{n}-v)\,\ell_{\theta}(\tilde Z_i(v))$. With (A1) it is easy to see that
\[
\big|\mathbb{E}\nabla_{\theta}^{j}L_{n,h}(v,\theta) - \mathbb{E}\nabla_{\theta}^{j}\tilde L_{n,h}(v,\theta)\big|_{\infty}
\le \frac{d_{\Theta}^{j}C_R}{n}\sum_{i=1}^n\big|K_h(\tfrac{i}{n}-v)\big|\cdot\big\||Z_i-\tilde Z_i(v)|_1\big\|_M\cdot\Big(1 + \big\||Z_i|_1\big\|_M^{M-1} + \big\||\tilde Z_i(v)|_1\big\|_M^{M-1}\Big)
\]
\[
\le d_{\Theta}^{j}C_R|K|_{\infty}C_X\big(1 + 2C_X^{M-1}\big)\big(n^{-1} + h\big). \tag{7.75}
\]
Finally, since $K$ has bounded variation and $\int K(u)\,du = 1$, uniformly in $v\in[\frac{h}{2},1-\frac{h}{2}]$ it holds that
\[
\mathbb{E}\nabla_{\theta}^{j}\tilde L_{n,h}(v,\theta) = \frac{1}{n}\sum_{i=1}^n K_h(\tfrac{i}{n}-v)\,\mathbb{E}\nabla_{\theta}^{j}\ell_{\theta}(\tilde Z_1(v)) = \mathbb{E}\nabla_{\theta}^{j}\ell_{\theta}(\tilde Z_1(v)) + O((nh)^{-1}). \tag{7.76}
\]
From (7.74), (7.75) and (7.76) we obtain
\[
\sup_{v\in[\frac{h}{2},1-\frac{h}{2}]}\sup_{\theta\in\Theta}\big|\nabla_{\theta}^{j}L_{n,h}(v,\theta) - \mathbb{E}\nabla_{\theta}^{j}\ell_{\theta}(\tilde Z_1(v))\big|_{\infty} = O_p\big(\tau_n^{(j)}\big), \tag{7.77}
\]
where
\[
\tau_n^{(j)} := \tau_n + (nh)^{-1} + h,\ j\in\{0,2\}, \qquad \tau_n^{(1)} := \tau_n + (nh)^{-1} + B_h.
\]

By (A3) and (7.77) for $j=0$, we obtain with standard arguments that if $\tau_n^{(0)} = o(1)$,
\[
\sup_{v\in[\frac{h}{2},1-\frac{h}{2}]}\big|\hat\theta_{n,h}(v) - \theta_0(v)\big|_{\infty} = o_p(1).
\]

Since $\hat\theta_{n,h}(v)$ is a minimizer of $\theta\mapsto L_{n,h}(v,\theta)$ and $\ell_{\theta}$ is twice continuously differentiable, we have the representation
\[
\hat\theta_{n,h}(v) - \theta_0(v) = -\nabla_{\theta}^{2}L_{n,h}(v,\bar\theta_v)^{-1}\,\nabla_{\theta}L_{n,h}(v,\theta_0(v)), \tag{7.78}
\]
where $\bar\theta_v\in\Theta$ fulfills $|\bar\theta_v - \theta_0(v)|_{\infty} \le |\hat\theta_{n,h}(v) - \theta_0(v)|_{\infty} = o_p(1)$.
By (A2), we have
\[
\Big|\mathbb{E}\big[\nabla_{\theta}^{2}\ell_{\theta}(\tilde Z_0(v))\big]\big|_{\theta=\theta_0(v)} - \mathbb{E}\big[\nabla_{\theta}^{2}\ell_{\theta}(\tilde Z_0(v))\big]\big|_{\theta=\bar\theta_v}\Big|_{\infty} = O\big(|\theta_0(v) - \bar\theta_v|_2\big) = o_p(1),
\]

and thus with (7.77),
\[
\sup_{v\in[\frac{h}{2},1-\frac{h}{2}]}\Big|\nabla_{\theta}^{2}L_{n,h}(v,\bar\theta_v) - \mathbb{E}\big[\nabla_{\theta}^{2}\ell_{\theta}(\tilde Z_1(v))\big]\big|_{\theta=\theta_0(v)}\Big|_{\infty} = O_p\big(\tau_n^{(2)}\big) + o_p(1). \tag{7.79}
\]
By (A3) and the dominated convergence theorem, $\mathbb{E}\nabla_{\theta}\ell(\tilde Z_0(v)) = \nabla_{\theta}\mathbb{E}\ell(\tilde Z_0(v)) = 0$. By (7.77),
\[
\sup_{v\in[\frac{h}{2},1-\frac{h}{2}]}\big|\nabla_{\theta}L_{n,h}(v,\theta_0(v))\big|_{\infty}
= \sup_{v\in[\frac{h}{2},1-\frac{h}{2}]}\big|\nabla_{\theta}L_{n,h}(v,\theta_0(v)) - \mathbb{E}\nabla_{\theta}\ell(\tilde Z_0(v))\big|_{\infty} = O_p\big(\tau_n^{(1)}\big). \tag{7.80}
\]
Inserting (7.79) and (7.80) into (7.78), we obtain
\[
\sup_{v\in[\frac{h}{2},1-\frac{h}{2}]}\big|\hat\theta_{n,h}(v) - \theta_0(v)\big|_{\infty} = O_p\big(\tau_n^{(1)}\big).
\]

This yields an improved version of (7.79):

\[
\sup_{v\in[\frac{h}{2},1-\frac{h}{2}]}\Big|\nabla_{\theta}^{2}L_{n,h}(v,\bar\theta_v) - \mathbb{E}\big[\nabla_{\theta}^{2}\ell_{\theta}(\tilde Z_1(v))\big]\big|_{\theta=\theta_0(v)}\Big|_{\infty} = O_p\big(\tau_n^{(2)}\big). \tag{7.81}
\]

Inserting (7.80) and (7.81) into (7.78), we obtain the assertion.

7.9 Form of the Vn-norm and connected quantities

Lemma 7.9 (Summation of polynomial and geometric decay). Let $\alpha > 1$ and $q\in\mathbb{N}$. Then it holds that

(i)
\[
\frac{1}{\alpha-1}\,q^{-\alpha+1} \le \sum_{j=q}^{\infty} j^{-\alpha} \le \frac{\max\{\alpha,\,2^{\alpha-1}\}}{\alpha-1}\,q^{-\alpha+1}.
\]

(ii) For $\sigma > 0$, $\kappa_2 \ge 1$,
\[
b_{\rho,\kappa_2,l}\cdot\sigma\log(\sigma^{-1}) \le \sum_{j=1}^{\infty}\min\{\sigma,\,\kappa_2\rho^{j}\} \le b_{\rho,\kappa_2}\cdot\sigma\log(\sigma^{-1}\vee e),
\]
\[
b_{\alpha,\kappa_2,l}\cdot\sigma\cdot\sigma^{-1/\alpha} \le \sum_{j=1}^{\infty}\min\{\sigma,\,\kappa_2 j^{-\alpha}\} \le b_{\alpha,\kappa_2}\cdot\sigma\max\{\sigma^{-1/\alpha},1\},
\]
where $b_{\rho,\kappa_2}$, $b_{\rho,\kappa_2,l}$, $b_{\alpha,\kappa_2}$, $b_{\alpha,\kappa_2,l}$ are constants only depending on $\rho$, $\kappa_2$, $\alpha$.

Proof of Lemma 7.9. (i) Upper bound: If $q\ge 2$, then
\[
\sum_{j=q}^{\infty} j^{-\alpha} = \sum_{j=q}^{\infty}\int_{j-1}^{j} j^{-\alpha}\,dx \le \sum_{j=q}^{\infty}\int_{j-1}^{j} x^{-\alpha}\,dx = \int_{q-1}^{\infty} x^{-\alpha}\,dx = \frac{(q-1)^{-\alpha+1}}{\alpha-1}
= \frac{q^{-\alpha+1}}{\alpha-1}\Big(\frac{q-1}{q}\Big)^{-\alpha+1} \le \frac{2^{\alpha-1}}{\alpha-1}\,q^{-\alpha+1}.
\]
If $q=1$, then $\sum_{j=1}^{\infty} j^{-\alpha} = 1 + \sum_{j=2}^{\infty} j^{-\alpha} \le 1 + \frac{1}{\alpha-1} = \frac{\alpha}{\alpha-1}$.

Lower bound: Using similar decomposition arguments as above, we have
\[
\sum_{j=q}^{\infty} j^{-\alpha} \ge \sum_{j=q}^{\infty}\int_{j}^{j+1} x^{-\alpha}\,dx = \int_{q}^{\infty} x^{-\alpha}\,dx = \frac{q^{-\alpha+1}}{\alpha-1}.
\]

(ii) $\bullet$ Exponential decay. Upper bound: First let $a := \max\{\lfloor\frac{\log(\sigma/\kappa_2)}{\log(\rho)}\rfloor,0\} + 1$, so that $\kappa_2\rho^{a} \le \sigma$ and $a \le \frac{\max\{\log(\kappa_2/\sigma),0\}}{\log(\rho^{-1})} + 1$. Then we have
\[
\sum_{j=1}^{\infty}\min\{\sigma,\kappa_2\rho^{j}\} \le a\sigma + \kappa_2\sum_{j=a}^{\infty}\rho^{j} = a\sigma + \frac{\kappa_2\rho^{a}}{1-\rho} \le a\sigma + \frac{\sigma}{1-\rho}
\le \sigma\log(\sigma^{-1}\vee e)\Big[\frac{1+\log(\kappa_2)}{\log(\rho^{-1})} + 1 + \frac{1}{1-\rho}\Big]
= b_{\rho,\kappa_2}\cdot\sigma\log(\sigma^{-1}\vee e),
\]
using $\log(\sigma^{-1}\vee e)\ge 1$ in the last inequality, where $b_{\rho,\kappa_2} := \frac{1+\log(\kappa_2)}{\log(\rho^{-1})} + 1 + \frac{1}{1-\rho}$.

Lower bound: Put $\beta(q) = \kappa_2\sum_{j=q}^{\infty}\rho^{j} = \frac{\kappa_2}{1-\rho}\rho^{q}$. Then
\[
\sum_{j=1}^{\infty}\min\{\sigma,\kappa_2\rho^{j}\} \ge \sigma(\hat q - 1) + \beta(\hat q),
\]
where $\hat q = \min\{q\in\mathbb{N} : \frac{\sigma}{\kappa_2}\ge\rho^{q}\}$. We have $\hat q \ge \frac{\log(\sigma/\kappa_2)}{\log(\rho)} =: \bar q$ and $\hat q \le \bar q + 1$. Thus
\[
\sum_{j=1}^{\infty}\min\{\sigma,\kappa_2\rho^{j}\} \ge \sigma(\bar q - 1) + \beta(\bar q + 1).
\]
Now consider the case $\frac{\sigma}{\kappa_2} < \rho^{2}$, that is, $\bar q \ge 2$. Then $\bar q - 1 \ge \frac{1}{2}\bar q$, and we obtain
\[
\sum_{j=1}^{\infty}\min\{\sigma,\kappa_2\rho^{j}\} \ge \frac{\sigma}{2}\cdot\frac{\log(\kappa_2/\sigma)}{\log(\rho^{-1})} + \frac{\kappa_2\rho}{1-\rho}\,\rho^{\log(\sigma/\kappa_2)/\log(\rho)}
= \frac{\sigma}{2}\cdot\frac{\log(\sigma^{-1}\kappa_2)}{\log(\rho^{-1})} + \frac{\rho}{1-\rho}\,\sigma
\ge \frac{1}{2}\Big(\frac{\rho}{1-\rho} + \frac{1}{\log(\rho^{-1})}\Big)\,\sigma\log(\sigma^{-1}\kappa_2),
\]
that is, the assertion holds with $b_{\rho,\kappa_2,l} := \frac{1}{2}\big(\frac{\rho}{1-\rho} + \frac{1}{\log(\rho^{-1})}\big)$.

$\bullet$ Polynomial decay. Upper bound: Let $a := \lfloor(\frac{\sigma}{\kappa_2})^{-1/\alpha}\rfloor + 1 \ge (\frac{\sigma}{\kappa_2})^{-1/\alpha}$. Then we have by (i):
\[
\sum_{j=1}^{\infty}\min\{\sigma,\kappa_2 j^{-\alpha}\} \le \sum_{j=1}^{a}\sigma + \kappa_2\sum_{j=a+1}^{\infty} j^{-\alpha} = a\sigma + \frac{\kappa_2}{\alpha-1}\,a^{-\alpha+1}
\le a\sigma + \frac{\kappa_2^{1/\alpha}}{\alpha-1}\,\sigma^{\frac{\alpha-1}{\alpha}}
\]
\[
\le \sigma\Big[\kappa_2^{1/\alpha}\sigma^{-1/\alpha} + 1 + \frac{\kappa_2^{1/\alpha}}{\alpha-1}\,\sigma^{-1/\alpha}\Big]
\le \frac{\alpha}{\alpha-1}\,\sigma\Big[\kappa_2^{1/\alpha}\sigma^{-1/\alpha} + 1\Big]
\le b_{\alpha,\kappa_2}\cdot\sigma\max\{\sigma^{-1/\alpha},1\},
\]
where $b_{\alpha,\kappa_2} := \frac{2\alpha}{\alpha-1}(\kappa_2\vee 1)^{1/\alpha}$.

Lower bound: Put $\beta(q) = \kappa_2\sum_{j=q}^{\infty} j^{-\alpha}$. By (i), $\beta(q) \ge \frac{\kappa_2}{\alpha-1}\,q^{-\alpha+1}$. Then
\[
\sum_{j=1}^{\infty}\min\{\sigma,\kappa_2 j^{-\alpha}\} \ge \min_{q\in\mathbb{N}}\{\sigma q + \beta(q)\} \ge \min_{q}\Big\{\sigma q + \frac{\kappa_2}{\alpha-1}\,q^{-\alpha+1}\Big\}.
\]
Elementary analysis yields that the minimum of the right-hand side (over real $q > 0$) is attained at $q = (\frac{\kappa_2}{\sigma})^{1/\alpha}$, that is,
\[
\sum_{j=1}^{\infty}\min\{\sigma,\kappa_2 j^{-\alpha}\} \ge \frac{\alpha}{\alpha-1}\,\kappa_2^{1/\alpha}\,\sigma^{\frac{\alpha-1}{\alpha}},
\]
and the assertion holds with $b_{\alpha,\kappa_2,l} := \frac{\alpha}{\alpha-1}\,\kappa_2^{1/\alpha}$.

Lemma 7.10 (Values of $q^*$, $r(\delta)$). $\bullet$ Polynomial decay $\Delta(j) = \kappa j^{-\alpha}$ ($\alpha > 1$): Then there exist constants $c^{(i)}_{\alpha,\kappa}, C^{(i)}_{\alpha,\kappa} > 0$, $i=1,2$, only depending on $\kappa$, $\alpha$ such that
\[
c^{(1)}_{\alpha,\kappa}\max\{x^{-1/\alpha},1\} \le q^*(x) \le C^{(1)}_{\alpha,\kappa}\max\{x^{-1/\alpha},1\},
\]
and
\[
c^{(2)}_{\alpha,\kappa}\min\big\{\delta^{\frac{\alpha}{\alpha-1}},\,\delta\big\} \le r(\delta) \le C^{(2)}_{\alpha,\kappa}\min\big\{\delta^{\frac{\alpha}{\alpha-1}},\,\delta\big\}.
\]
$\bullet$ Geometric decay $\Delta(j) = \kappa\rho^{j}$ ($\rho\in(0,1)$): Then there exist constants $c^{(i)}_{\rho,\kappa}, C^{(i)}_{\rho,\kappa} > 0$, $i=1,2$, only depending on $\kappa$, $\rho$ such that
\[
c^{(1)}_{\rho,\kappa}\max\{\log(x^{-1}),1\} \le q^*(x) \le C^{(1)}_{\rho,\kappa}\max\{\log(x^{-1}),1\},
\]
and
\[
c^{(2)}_{\rho,\kappa}\,\frac{\delta}{\log(\delta^{-1}\vee e)} \le r(\delta) \le C^{(2)}_{\rho,\kappa}\,\frac{\delta}{\log(\delta^{-1}\vee e)}.
\]

Proof of Lemma 7.10. (i) By Lemma 7.9(i), $\beta_{\mathrm{norm}}(q) = \frac{\beta(q)}{q} \in [c_{\alpha,\kappa}q^{-\alpha},\,C_{\alpha,\kappa}q^{-\alpha}]$ with $c_{\alpha,\kappa} = \frac{\kappa}{\alpha-1}$, $C_{\alpha,\kappa} = \frac{\kappa\max\{\alpha,2^{\alpha-1}\}}{\alpha-1}$. In the following we assume w.l.o.g. that $C_{\alpha,\kappa} > 1$ and $c_{\alpha,\kappa} < 1$.

$\bullet$ $q^*(x)$, upper bound: For any $x > 0$,
\[
q^*(x) = \min\{q\in\mathbb{N} : \beta_{\mathrm{norm}}(q)\le x\}
\le \min\Big\{q\in\mathbb{N} : q\ge\Big(\frac{x}{C_{\alpha,\kappa}}\Big)^{-1/\alpha}\Big\}
= \Big\lceil\Big(\frac{x}{C_{\alpha,\kappa}}\Big)^{-1/\alpha}\Big\rceil.
\]
Especially we obtain $q^*(x) \le (\frac{x}{C_{\alpha,\kappa}})^{-1/\alpha} + 1 \le 2C_{\alpha,\kappa}^{1/\alpha}\max\{x^{-1/\alpha},1\}$. The assertion holds with $C^{(1)}_{\alpha,\kappa} := 2\max\{C_{\alpha,\kappa},1\}^{1/\alpha}$.

$\bullet$ $q^*(x)$, lower bound: Similarly to above,
\[
q^*(x) \ge \Big\lceil\Big(\frac{x}{c_{\alpha,\kappa}}\Big)^{-1/\alpha}\Big\rceil \ge \Big(\frac{x}{c_{\alpha,\kappa}}\Big)^{-1/\alpha} = c_{\alpha,\kappa}^{1/\alpha}\,x^{-1/\alpha}.
\]
On the other hand, $q^*(x) \ge 1 \ge c_{\alpha,\kappa}$, which yields the assertion with $c^{(1)}_{\alpha,\kappa} = \min\{c_{\alpha,\kappa},1\}^{1/\alpha}$.

$\bullet$ $r(\delta)$, upper bound: Put $\bar r = 2^{\frac{\alpha}{\alpha-1}}c_{\alpha,\kappa}^{-\frac{1}{\alpha-1}}\,\delta^{\frac{\alpha}{\alpha-1}}$. Then we have
\[
q^*(\bar r)\bar r \ge \Big(\frac{\bar r}{c_{\alpha,\kappa}}\Big)^{-1/\alpha}\bar r = c_{\alpha,\kappa}^{1/\alpha}\,\bar r^{\frac{\alpha-1}{\alpha}} = 2\delta > \delta.
\]
By definition of $r(\cdot)$, $r(\delta) \le \bar r$. It was already shown in Lemma 7.5 that $r(\delta)\le\delta$ holds for all $\delta > 0$. We obtain the assertion with $C^{(2)}_{\alpha,\kappa} = 2^{\frac{\alpha}{\alpha-1}}c_{\alpha,\kappa}^{-\frac{1}{\alpha-1}}$.

$\bullet$ $r(\delta)$, lower bound: First consider the case $\delta \le C_{\alpha,\kappa}$. Put $\underline r = 2^{-\frac{\alpha}{\alpha-1}}C_{\alpha,\kappa}^{-\frac{1}{\alpha-1}}\,\delta^{\frac{\alpha}{\alpha-1}}$. Since $\lceil x\rceil \le 2x$ for $x\ge 1$, we have
\[
q^*(\underline r)\underline r \le \Big\lceil\Big(\frac{\underline r}{C_{\alpha,\kappa}}\Big)^{-1/\alpha}\Big\rceil\,\underline r
\le 2C_{\alpha,\kappa}^{1/\alpha}\,\underline r^{\frac{\alpha-1}{\alpha}} = 2\cdot 2^{-1}\delta \le \delta.
\]
By definition of $r(\cdot)$, $r(\delta) \ge \underline r = 2^{-\frac{\alpha}{\alpha-1}}\min\{(\frac{\delta}{C_{\alpha,\kappa}})^{\frac{1}{\alpha-1}},1\}\,\delta$.
In the case $\delta > C_{\alpha,\kappa}$, we have
\[
q^*(\delta)\,\delta = \Big\lceil\Big(\frac{\delta}{C_{\alpha,\kappa}}\Big)^{-1/\alpha}\Big\rceil\,\delta \le 1\cdot\delta \le \delta,
\]
thus $r(\delta) \ge \delta = \min\{(\frac{\delta}{C_{\alpha,\kappa}})^{\frac{1}{\alpha-1}},1\}\,\delta \ge 2^{-\frac{\alpha}{\alpha-1}}\min\{(\frac{\delta}{C_{\alpha,\kappa}})^{\frac{1}{\alpha-1}},1\}\,\delta$. We conclude that the assertion holds with $c^{(2)}_{\alpha,\kappa} = 2^{-\frac{\alpha}{\alpha-1}}C_{\alpha,\kappa}^{-\frac{1}{\alpha-1}}$.

(ii) We have $\beta_{\mathrm{norm}}(q) = \frac{\beta(q)}{q} = C_{\rho,\kappa}\frac{\rho^{q}}{q}$, where $C_{\rho,\kappa} = \frac{\kappa\rho}{1-\rho}$. In the following we assume w.l.o.g. that $C_{\rho,\kappa} > 8$.

$\bullet$ $q^*(x)$, upper bound: Put $\psi(x) = \max\{\log(x^{-1}),1\}$. Define $\tilde q = \Big\lceil\frac{\psi(x/(C_{\rho,\kappa}\log(\rho^{-1})))}{\log(\rho^{-1})}\Big\rceil$. Then we have
\[
\beta_{\mathrm{norm}}(\tilde q) \le \frac{C_{\rho,\kappa}}{\tilde q}\,\rho^{\tilde q}
\le \frac{C_{\rho,\kappa}}{\tilde q}\cdot\frac{x}{C_{\rho,\kappa}\log(\rho^{-1})}
= \frac{x}{\tilde q\,\log(\rho^{-1})}
\le \frac{x}{\psi(x/(C_{\rho,\kappa}\log(\rho^{-1})))} \le x,
\]
thus
\[
q^*(x) = \min\{q\in\mathbb{N} : \beta_{\mathrm{norm}}(q)\le x\} \le \tilde q = \Big\lceil\frac{\psi(x/(C_{\rho,\kappa}\log(\rho^{-1})))}{\log(\rho^{-1})}\Big\rceil.
\]
Especially we obtain
\[
q^*(x) \le \frac{1}{\log(\rho^{-1})}\Big(\psi(x) + \log\big(C_{\rho,\kappa}\log(\rho^{-1})\big)\Big) + 1
\le \frac{2\big(1+\log(C_{\rho,\kappa}\log(\rho^{-1}))\big)}{\log(\rho^{-1})}\,\psi(x),
\]
that is, the assertion holds with $C^{(1)}_{\rho,\kappa} = \frac{2(1+\log(C_{\rho,\kappa}\log(\rho^{-1})))}{\log(\rho^{-1})}$.

$\bullet$ $q^*(x)$, lower bound: Case 1: Assume that $x < C_{\rho,\kappa}\log(\rho^{-1})\rho^{4}$. Put
\[
\tilde q := \frac{1}{4\log(\rho^{-1})}\,\log\Big(\Big(\frac{x}{C_{\rho,\kappa}\log(\rho^{-1})}\Big)^{-1}\Big) \ \ge 1.
\]
For all $q \le \tilde q$,
\[
\beta_{\mathrm{norm}}(q) \ge C_{\rho,\kappa}\,\frac{\rho^{\tilde q}}{\tilde q}
= \frac{C_{\rho,\kappa}}{\tilde q}\Big(\frac{x}{C_{\rho,\kappa}\log(\rho^{-1})}\Big)^{1/4} > x,
\]
since $\big(\frac{x}{C_{\rho,\kappa}\log(\rho^{-1})}\big)^{-1/2} > \log\big(\big(\frac{x}{C_{\rho,\kappa}\log(\rho^{-1})}\big)^{-1}\big)$. We have therefore shown that for $x < C_{\rho,\kappa}\log(\rho^{-1})\rho^{4}$,
\[
q^*(x) \ge \tilde q = \max\{1,\tilde q\}. \tag{7.82}
\]
Case 2: If $x \ge C_{\rho,\kappa}\log(\rho^{-1})\rho^{4}$, then $\tilde q \le 1$, that is,
\[
q^*(x) \ge 1 = \max\{1,\tilde q\}.
\]
We have shown that for all $x > 0$, $q^*(x) \ge \max\{1,\tilde q\}$. Since
\[
\tilde q \ge \frac{1}{4\log(\rho^{-1})}\Big(\log(x^{-1}) + \log\big(C_{\rho,\kappa}\log(\rho^{-1})\big)\Big) \ge \frac{1}{4\log(\rho^{-1})}\,\log(x^{-1}),
\]
the assertion follows with $c^{(1)}_{\rho,\kappa} = \frac{1}{4\log(\rho^{-1})}$.

$\bullet$ $r(\delta)$, upper bound: Put $\tilde r = \frac{2(c^{(1)}_{\rho,\kappa})^{-1}\delta}{\log((2^{-1}c^{(1)}_{\rho,\kappa}\delta^{-1})\vee e)}$. Then
\[
q^*(\tilde r)\tilde r \ge c^{(1)}_{\rho,\kappa}\log(\tilde r^{-1}\vee e)\cdot\tilde r
= \frac{2\delta}{\log((2^{-1}c^{(1)}_{\rho,\kappa}\delta^{-1})\vee e)}\cdot\log\Big(\big[2^{-1}c^{(1)}_{\rho,\kappa}\delta^{-1}\log\big((2^{-1}c^{(1)}_{\rho,\kappa}\delta^{-1})\vee e\big)\big]\vee e\Big)
\ge \frac{2\delta}{\log((2^{-1}c^{(1)}_{\rho,\kappa}\delta^{-1})\vee e)}\cdot\log\big((2^{-1}c^{(1)}_{\rho,\kappa}\delta^{-1})\vee e\big) = 2\delta > \delta.
\]
By definition of $r(\cdot)$, we obtain $r(\delta) \le \tilde r$. For $a\in(0,1)$, the function $(0,\infty)\to(0,\infty)$, $x\mapsto\frac{\log(x^{-1}\vee e)}{\log((ax^{-1})\vee e)}$, attains its maximum at $x = ae^{-1}$ with maximum value $1+\log(a^{-1})$. Thus
\[
\tilde r \le 2(c^{(1)}_{\rho,\kappa})^{-1}\big(1+\log(2(c^{(1)}_{\rho,\kappa})^{-1})\big)\cdot\frac{\delta}{\log(\delta^{-1}\vee e)},
\]
that is, the assertion holds with $C^{(2)}_{\rho,\kappa} = 2(c^{(1)}_{\rho,\kappa})^{-1}\big(1+\log(2(c^{(1)}_{\rho,\kappa})^{-1})\big)$.

$\bullet$ $r(\delta)$, lower bound: Put $\tilde r = \frac{2^{-1}(C^{(1)}_{\rho,\kappa})^{-1}\delta}{\log((2C^{(1)}_{\rho,\kappa}\delta^{-1})\vee e)}$. Then
\[
q^*(\tilde r)\tilde r \le C^{(1)}_{\rho,\kappa}\log(\tilde r^{-1}\vee e)\cdot\tilde r
= \frac{2^{-1}\delta}{\log((2C^{(1)}_{\rho,\kappa}\delta^{-1})\vee e)}\cdot\log\Big(\big[2C^{(1)}_{\rho,\kappa}\delta^{-1}\log\big((2C^{(1)}_{\rho,\kappa}\delta^{-1})\vee e\big)\big]\vee e\Big)
\]
\[
\le \frac{2^{-1}\delta}{\log((2C^{(1)}_{\rho,\kappa}\delta^{-1})\vee e)}\cdot\Big(\log\big((2C^{(1)}_{\rho,\kappa}\delta^{-1})\vee e\big) + \log\log\big((2C^{(1)}_{\rho,\kappa}\delta^{-1})\vee e\big)\Big) \le \delta,
\]
where the last step is due to $\log(x) + \log\log(x) \le 2\log(x)$ for $x\ge e$. By definition of $r(\cdot)$, we obtain $r(\delta) \ge \tilde r$. For $a > 1$, the function $(0,\infty)\to(0,\infty)$, $x\mapsto\frac{\log(x^{-1}\vee e)}{\log((ax^{-1})\vee e)}$, attains its minimum at $x = e^{-1}$ with minimum value $\frac{1}{1+\log(a)}$. We therefore obtain
\[
\tilde r \ge \frac{(C^{(1)}_{\rho,\kappa})^{-1}}{2\big(1+\log(2C^{(1)}_{\rho,\kappa})\big)}\cdot\frac{\delta}{\log(\delta^{-1}\vee e)},
\]
that is, the assertion holds with $c^{(2)}_{\rho,\kappa} = \frac{(C^{(1)}_{\rho,\kappa})^{-1}}{2(1+\log(2C^{(1)}_{\rho,\kappa}))}$.
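The claimed growth rate of $q^*$ in the polynomial case can be confirmed by brute force. The sketch below (not from the paper; $\kappa$, $\alpha$, the truncation level and the sampled $x$-values are arbitrary choices) computes $\beta_{\mathrm{norm}}(q) = \frac{1}{q}\sum_{j\ge q}\Delta(j)$ directly and checks that $q^*(x)$ stays within constant multiples of $\max\{x^{-1/\alpha},1\}$:

```python
# Brute-force check of the growth rate in Lemma 7.10: with Delta(j) = kappa * j^(-alpha),
# beta_norm(q) = (1/q) * sum_{j>=q} Delta(j) and q*(x) = min{ q : beta_norm(q) <= x },
# the ratio q*(x) / max(x^(-1/alpha), 1) stays bounded above and below.
kappa, alpha = 1.0, 2.0
N = 10 ** 6
# tails[q] approximates beta(q) = sum_{j=q}^infty Delta(j); the part beyond N
# is neglected (it is of order N^(1-alpha), negligible at the x-scales below).
tails = [0.0] * (N + 2)
for j in range(N, 0, -1):
    tails[j] = tails[j + 1] + kappa * j ** (-alpha)

def q_star(x):
    q = 1
    while tails[q] / q > x:
        q += 1
    return q

ratios = [q_star(x) / max(x ** (-1 / alpha), 1.0) for x in (1e-4, 1e-3, 1e-2, 1e-1, 1.0)]
print(all(0.1 <= r <= 10 for r in ratios))  # expect True
```

Empirically the ratios are close to 1 for small $x$, matching the constants $c^{(1)}_{\alpha,\kappa}$, $C^{(1)}_{\alpha,\kappa}$ of the lemma being of moderate size for $\kappa = 1$, $\alpha = 2$.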

Lemma 7.11 (Form of $V_n$). (i) Polynomial decay $\Delta(j) = \kappa j^{-\alpha}$ (where $\alpha > 1$): Then there exist some constants $c^{(3)}_{\alpha,\kappa}, C^{(3)}_{\alpha,\kappa}$ only depending on $\kappa$, $\alpha$, $D_n$ such that
\[
c^{(3)}_{\alpha,\kappa}\,\|f\|_{2,n}\max\{\|f\|_{2,n}^{-1/\alpha},1\} \le V_n(f) \le C^{(3)}_{\alpha,\kappa}\,\|f\|_{2,n}\max\{\|f\|_{2,n}^{-1/\alpha},1\}.
\]

(ii) Geometric decay $\Delta(j) = \kappa\rho^{j}$ (where $\rho\in(0,1)$): Then there exist some constants $c^{(3)}_{\rho,\kappa}, C^{(3)}_{\rho,\kappa}$ only depending on $\kappa$, $\rho$, $D_n$ such that
\[
c^{(3)}_{\rho,\kappa}\,\|f\|_{2,n}\max\{\log(\|f\|_{2,n}^{-1}),1\} \le V_n(f) \le C^{(3)}_{\rho,\kappa}\,\|f\|_{2,n}\max\{\log(\|f\|_{2,n}^{-1}),1\}.
\]

Proof of Lemma 7.11. The assertions follow from Lemma 7.9(ii) by taking $\kappa_2 = \kappa D_n$. The maximum in the lower bounds is obtained due to the additional summand $\|f\|_{2,n}$ in $V_n(f)$.

The following lemma formulates the entropy integral with respect to $V_n$ in terms of the well-known bracketing numbers with respect to the $\|\cdot\|_{2,n}$-norm in the case that $\sup_{n\in\mathbb{N}} D_n < \infty$. For this, we use the upper bounds of $V_n$ given in Lemma 7.11.

Lemma 7.12. (i) Polynomial decay $\Delta(j) = \kappa j^{-\alpha}$ (where $\alpha > 1$). Then for any $\sigma\in(0,\,C^{(3)}_{\alpha,\kappa})$,
\[
\int_0^{\sigma}\sqrt{\mathbb{H}(\varepsilon,\mathcal{F},V_n)}\,d\varepsilon
\le C^{(3)}_{\alpha,\kappa}\,\frac{\alpha-1}{\alpha}\int_0^{(\sigma/C^{(3)}_{\alpha,\kappa})^{\frac{\alpha}{\alpha-1}}} u^{-1/\alpha}\sqrt{\mathbb{H}(u,\mathcal{F},\|\cdot\|_{2,n})}\,du,
\]
where $C^{(3)}_{\alpha,\kappa}$ is from Lemma 7.11.

(ii) Exponential decay $\Delta(j) = \kappa\rho^{j}$ (where $\rho\in(0,1)$). Then for any $\sigma\in(0,\,e^{-1}C^{(3)}_{\rho,\kappa})$,
\[
\int_0^{\sigma}\sqrt{\mathbb{H}(\varepsilon,\mathcal{F},V_n)}\,d\varepsilon
\le C^{(3)}_{\rho,\kappa}\int_0^{E^{-}(\sigma/C^{(3)}_{\rho,\kappa})}\log(u^{-1})\sqrt{\mathbb{H}(u,\mathcal{F},\|\cdot\|_{2,n})}\,du,
\]
where $E^{-}(x) = \frac{x}{\log(x^{-1}\vee e)}$ and $C^{(3)}_{\rho,\kappa}$ is from Lemma 7.11.

Proof of Lemma 7.12. (i) By Lemma 7.11, $V_n(f) \le C^{(3)}_{\alpha,\kappa}\,\|f\|_{2,n}\max\{\|f\|_{2,n}^{-1/\alpha},1\}$. We abbreviate $c = C^{(3)}_{\alpha,\kappa}$ in the following.
Let $\varepsilon\in(0,c)$ and let $(l_j,u_j)$, $j=1,\dots,N$, be brackets such that $\|u_j - l_j\|_{2,n} \le (\frac{\varepsilon}{c})^{\frac{\alpha}{\alpha-1}}$. Then
\[
V_n(u_j - l_j) \le c\max\big\{\|u_j-l_j\|_{2,n}^{\frac{\alpha-1}{\alpha}},\,\|u_j-l_j\|_{2,n}\big\}
\le c\max\Big\{\frac{\varepsilon}{c},\,\Big(\frac{\varepsilon}{c}\Big)^{\frac{\alpha}{\alpha-1}}\Big\} \le c\cdot\frac{\varepsilon}{c} = \varepsilon.
\]
Therefore, the bracketing numbers fulfill the relation
\[
N(\varepsilon,\mathcal{F},V_n) \le N\Big(\Big(\frac{\varepsilon}{c}\Big)^{\frac{\alpha}{\alpha-1}},\mathcal{F},\|\cdot\|_{2,n}\Big).
\]
We conclude that for $\sigma\in(0,c)$,
\[
\int_0^{\sigma}\sqrt{\mathbb{H}(\varepsilon,\mathcal{F},V_n)}\,d\varepsilon
\le \int_0^{\sigma}\sqrt{\mathbb{H}\Big(\Big(\frac{\varepsilon}{c}\Big)^{\frac{\alpha}{\alpha-1}},\mathcal{F},\|\cdot\|_{2,n}\Big)}\,d\varepsilon
= c\,\frac{\alpha-1}{\alpha}\int_0^{(\sigma/c)^{\frac{\alpha}{\alpha-1}}} u^{-1/\alpha}\sqrt{\mathbb{H}(u,\mathcal{F},\|\cdot\|_{2,n})}\,du.
\]
In the last step, we used the substitution $u = (\frac{\varepsilon}{c})^{\frac{\alpha}{\alpha-1}}$, which leads to $\frac{du}{d\varepsilon} = \frac{\alpha}{\alpha-1}\cdot\frac{1}{c}\cdot u^{1/\alpha}$, that is, $d\varepsilon = \frac{\alpha-1}{\alpha}\,c\,u^{-1/\alpha}\,du$.

(ii) By Lemma 7.11, $V_n(f) \le C^{(3)}_{\rho,\kappa}\,E(\|f\|_{2,n})$ with $E(x) = x\max\{\log(x^{-1}),1\}$. We abbreviate $c = C^{(3)}_{\rho,\kappa}$ in the following.
We first collect some properties of $E$. Put $E^{-}(x) = \frac{x}{\log(x^{-1}\vee e)}$. In the case $x > e^{-1}$, we have $E(E^{-}(x)) = x$. In the case $x \le e^{-1}$, we have
\[
E(E^{-}(x)) = \frac{x}{\log(x^{-1})}\,\log\big(x^{-1}\log(x^{-1})^{-1}\big) \le \frac{x}{\log(x^{-1})}\,\log(x^{-1}) = x.
\]
This shows that for all $x > 0$,
\[
E(E^{-}(x)) \le x. \tag{7.83}
\]
Furthermore, for $x < e^{-1}$,
\[
\log\big(E^{-}(x)^{-1}\big) = \log\big(x^{-1}\log(x^{-1})\big) \ge \log(x^{-1}). \tag{7.84}
\]
Now let $\varepsilon\in(0,1)$ and let $(l_j,u_j)$, $j=1,\dots,N$, be brackets such that $\|u_j - l_j\|_{2,n} \le E^{-}(\frac{\varepsilon}{c})$. Then by (7.83),
\[
V_n(u_j - l_j) \le c\,E\big(E^{-}(\tfrac{\varepsilon}{c})\big) \le c\cdot\frac{\varepsilon}{c} = \varepsilon.
\]
Therefore, we have the following relation between the bracketing numbers:
\[
N(\varepsilon,\mathcal{F},V_n) \le N\big(E^{-}(\tfrac{\varepsilon}{c}),\mathcal{F},\|\cdot\|_{2,n}\big).
\]
We conclude that for $\sigma\in(0,ce^{-1})$,
\[
\int_0^{\sigma}\sqrt{\mathbb{H}(\varepsilon,\mathcal{F},V_n)}\,d\varepsilon
\le \int_0^{\sigma}\sqrt{\mathbb{H}\big(E^{-}(\tfrac{\varepsilon}{c}),\mathcal{F},\|\cdot\|_{2,n}\big)}\,d\varepsilon
\le c\int_0^{E^{-}(\sigma/c)}\log(u^{-1})\sqrt{\mathbb{H}(u,\mathcal{F},\|\cdot\|_{2,n})}\,du.
\]
In the last step, we used the substitution $u = E^{-}(\frac{\varepsilon}{c})$, which leads to $\frac{du}{d\varepsilon} = \frac{1}{c}\cdot\frac{1+\log((\varepsilon/c)^{-1})}{\log((\varepsilon/c)^{-1})^2}$, and with (7.84) we obtain
\[
d\varepsilon = c\,\frac{\log((\varepsilon/c)^{-1})^2}{1+\log((\varepsilon/c)^{-1})}\,du \le c\log\big((\varepsilon/c)^{-1}\big)\,du \le c\log\big(E^{-}(\tfrac{\varepsilon}{c})^{-1}\big)\,du = c\log(u^{-1})\,du.
\]
