NONPARAMETRIC DENSITY ESTIMATION IN COMPOUND POISSON PROCESS USING CONVOLUTION POWER ESTIMATORS

Published in Metrika, 2014, 77 (1), pp. 163-183. doi:10.1007/s00184-013-0475-3. HAL Id: hal-00780300 (submitted 23 Jan 2013).

F. COMTE¹, C. DUVAL², AND V. GENON-CATALOT¹

Abstract. Consider a compound Poisson process which is discretely observed with sampling interval $\Delta$ until exactly $n$ nonzero increments are obtained. The jump density and the intensity of the Poisson process are unknown. In this paper, we build and study parametric estimators of appropriate functions of the intensity, and an adaptive nonparametric estimator of the jump size density. The latter estimation method relies on nonparametric estimators of the $m$-th convolution powers of a density. The $L^2$-risk of the adaptive estimator achieves the optimal rate in the minimax sense over Sobolev balls. Numerical simulation results on various jump densities highlight the good performance of the proposed estimator.

Keywords. Convolution. Compound Poisson process. Inverse problem. Nonparametric estimation. Parameter estimation.
AMS Classification. 62G07 - 60G51 - 62F12.

1. Introduction

Compound Poisson processes are widely used in practice, especially in queuing and insurance theory (see e.g. Embrechts et al. (1997) and references therein, Katz (2002) or Scalas (2006)). Let $(X_t, t \ge 0)$ be a compound Poisson process, given by

(1) $X_t = \sum_{j=1}^{N_t} \xi_j,$

where $(\xi_j, j \ge 1)$ is a sequence of i.i.d. real-valued random variables with density $f$, and $(N_t)$ is a Poisson process with intensity $c > 0$, independent of the sequence $(\xi_j, j \ge 1)$. The density $f$ and the intensity $c$ are unknown. In this paper, we are interested in adaptive nonparametric estimation of $f$ from discrete observations $(X_{j\Delta}, j \ge 0)$ of the sample path with sampling interval $\Delta$.

Compound Poisson processes have independent and stationary increments. They are a special case of Lévy processes, with integrable Lévy density equal to $cf(\cdot)$. It is therefore natural to base the estimation procedure for $f$ on the i.i.d. increments $(X_{j\Delta} - X_{(j-1)\Delta}, j \ge 1)$. If $c$ is known, the nonparametric estimation of $f$ is equivalent to the nonparametric estimation of the Lévy density of a pure jump Lévy process with integrable Lévy measure. Several papers on the subject are available, see Basawa and Brockwell (1982), Buchmann

¹ Université Paris Descartes, MAP5, UMR CNRS 8145. Emails: [email protected] and [email protected].
² CREST-ENSAE and CEREMADE, Université Paris Dauphine. Email: [email protected].

(2009), Chen et al. (2010), Comte and Genon-Catalot (2009, 2010, 2011), Figueroa-López and Houdré (2006), Figueroa-López (2009), Gugushvili (2009, 2012), Jongbloed et al. (2005), Kim (1999), Neumann and Reiss (2009), Ueltzhöfer and Klüppelberg (2011), Zhao and Wu (2009). However, specific methods for compound Poisson processes have been investigated; for instance, Buchmann and Grübel (2003) first introduced decompounding methods to estimate discrete compound densities. Indeed, the common distribution of the increments is equal to

(2) $P_{X_\Delta}(dx) = e^{-c\Delta}\,\delta_0(dx) + (1 - e^{-c\Delta})\,g_\Delta(x)\,dx,$

where $\delta_0$ is the Dirac mass at 0 and $g_\Delta$ is the conditional density of $X_\Delta$ given that $X_\Delta \neq 0$:

(3) $g_\Delta = \frac{e^{-c\Delta}}{1 - e^{-c\Delta}} \sum_{m \ge 1} \frac{(c\Delta)^m}{m!}\, f^{\star m},$

and $f^{\star m}$ denotes the $m$-th convolution power of $f$. Thus, null increments provide no information on the density $f$. Relying on this fact, van Es et al. (2007) assume that the sample path $(X_t)$ is discretely observed until exactly $n$ increments are nonzero. Such observations can be described as follows. Let

(4) $S_1 = \inf\{j \ge 1 : X_{j\Delta} - X_{(j-1)\Delta} \neq 0\}, \qquad S_i = \inf\{j > S_{i-1} : X_{j\Delta} - X_{(j-1)\Delta} \neq 0\},\ i \ge 2,$

and let

(5) $Z_i = X_{S_i\Delta} - X_{(S_i - 1)\Delta}.$

Assume that the $X_{j\Delta}$'s are observed for $j \le S_n$. Thus, $(S_i, Z_i),\ i = 1, \dots, n$, are observed, and $Z_1, \dots, Z_n$ is an $n$-sample of the conditional distribution of $X_\Delta$ given that $X_\Delta \neq 0$, which has density $g_\Delta$ (see Proposition 2.1). The problem then is to deduce an estimator of $f$ from an i.i.d. sample of $g_\Delta$. Under the assumption that the intensity $c$ is known and for $\Delta = 1$ (low frequency data), van Es et al. (2007) build a nonparametric kernel estimator of $f$, exploiting the relationship between the characteristic function of $f$ and the characteristic function of $g_\Delta$. In Duval (2012a), a different estimation method is considered. Duval (2012a) remarks that the operator $f \mapsto g_\Delta := P_\Delta f$ can be explicitly inverted, actually using the relationship pointed out by van Es et al. (2007). So $f = P_\Delta^{-1} g_\Delta$. Provided that $c\Delta < \log 2$, the inverse operator $P_\Delta^{-1}$ admits a series development given by (see Duval, 2012a, chap. 3, Lemma 1):

(6) $g \mapsto P_\Delta^{-1}(g) = \sum_{m \ge 1} \frac{(-1)^{m+1}}{m}\, \frac{(e^{c\Delta} - 1)^m}{c\Delta}\, g^{\star m}.$

Consequently, truncating the above development and keeping $K+1$ terms, an approximation of the inverse operator is obtained, which suggests to approximate $f$ by:

(7) $f \simeq \sum_{m=1}^{K+1} \frac{(-1)^{m+1}}{m}\, \frac{(e^{c\Delta} - 1)^m}{c\Delta}\, g_\Delta^{\star m}.$
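The identity behind (6)-(7) can be checked numerically on the Fourier side: from (3), $g_\Delta^*(t) = e^{-c\Delta}\left(e^{c\Delta f^*(t)} - 1\right)/(1 - e^{-c\Delta})$, so the truncated series applied to $g_\Delta^*$ should recover $f^*$, with an error shrinking geometrically in $K$ when $\Delta$ is small. The following minimal sketch (ours, in Python with NumPy; all numerical values are illustrative) verifies this for Gaussian jumps.

```python
import numpy as np

# Sanity check of the series (6)-(7) on the Fourier side, using
# g*(t) = e^{-c*Delta} (e^{c*Delta*f*(t)} - 1) / (1 - e^{-c*Delta}),
# which follows from (3).
c, Delta, t = 0.5, 0.2, 1.3                  # arbitrary illustration values
f_star = np.exp(-t**2 / 2)                   # characteristic function of N(0,1) jumps
g_star = np.exp(-c * Delta) * np.expm1(c * Delta * f_star) / (1 - np.exp(-c * Delta))
for K in range(4):
    approx = sum((-1) ** (m + 1) / m * (np.exp(c * Delta) - 1) ** m / (c * Delta)
                 * g_star ** m for m in range(1, K + 2))
    print(K, abs(approx - f_star))           # error decreases geometrically in K
```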

The approximation is valid for small $\Delta$. Afterwards, an estimator of $f$ is built by replacing, for $m = 1, \dots, K+1$, $(e^{c\Delta} - 1)^m/(c\Delta)$ by a consistent estimator and $g_\Delta^{\star m}$ by a nonparametric estimator based on the observations $(Z_j, j = 1, \dots, n)$. This is not quite simple, as $g_\Delta^{\star m}$ is the density of the sum $Z_1 + \dots + Z_m$. The estimator proposed by Duval for $g_\Delta^{\star m}$ is a wavelet threshold estimator using data composed of independent sums of $m$ observations, assuming that a deterministic number $n_T$ of increments is observed with sampling interval $\Delta_T$ and total length time of observation $T = n_T \Delta_T$. The rate of the $L^p$-risk of the resulting estimator of $f$ is measured in terms of $T$, for $\Delta_T$ tending to 0 while $T$ tends to infinity. The usual optimal rate on Besov balls is obtained, up to logarithmic factors, provided that $T \Delta_T^{2K+2} = O(1)$. In Comte and Genon-Catalot (2009), the adaptive estimator of the Lévy density reaches the same rate provided that $T \Delta_T^2 = O(1)$ (without logarithmic loss, and for the $L^2$-risk only). As soon as $K \ge 1$, Duval's estimator of $f$ improves the result of Comte and Genon-Catalot (2009) in the case of compound Poisson processes. In Kessler (1997), a similar strategy of adding correction terms to improve parametric estimators for diffusion models is adopted.

Nevertheless, estimating $g_\Delta^{\star m}$ by building sums of $m$ variables from the sample $(Z_1, \dots, Z_n)$ is heavy and numerically costly. In this paper, we build a nonparametric estimator of $f$ relying on the approximation (7). In our approach, the difference lies in the estimation method of $g_\Delta^{\star m}$. To simplify notations, we omit the dependence on $\Delta$ for $g_\Delta$ and set

(8) $g := g_\Delta, \qquad g^{\star m} := g_\Delta^{\star m}.$

It is well known that, from an $n$-sample of a density $g$, $\sqrt{n}$-consistent nonparametric estimators of the convolution power $g^{\star m}$, for $m \ge 2$, can be built (see e.g. Schick and Wefelmeyer, 2004). In a recent paper, Chesneau et al. (2013) propose a very simple $\sqrt{n}$-consistent estimator of the $m$-th convolution power $g^{\star m}$ of a density $g$ from $n$ i.i.d. random variables with density $g$. Of course, $m \ge 2$ is fixed and should not be too large. This is the point of view adopted here.

Let $g^*$ denote the Fourier transform of the density $g$. As $(g^*)^m$ is the Fourier transform of $g^{\star m}$, Chesneau et al. (2013) propose to estimate $(g^*)^m$, for all $m \ge 1$, by the empirical counterpart $(\tilde{g}^*(t))^m$, with:

(9) $\tilde{g}^*(t) = \frac{1}{n} \sum_{j=1}^{n} e^{itZ_j},$

leading, by Fourier inversion, to the estimator with cutoff $\ell$:

(10) $\widehat{g^{\star m}_\ell}(x) = \frac{1}{2\pi} \int_{-\pi\ell}^{\pi\ell} e^{-itx}\, (\tilde{g}^*(t))^m\, dt.$

Afterwards, we define:

(11) $\tilde{f}_{K,\ell}(x) = \sum_{m=1}^{K+1} \frac{(-1)^{m+1}}{m}\, c_m(\Delta)\, \widehat{g^{\star m}_\ell}(x), \qquad \text{with } c_m(\Delta) = \frac{(e^{c\Delta} - 1)^m}{c\Delta}.$

As $c$ is unknown, this is not an estimator of $f$. To get an estimator $\hat{f}_{K,\ell}(x)$ of $f$, we replace, for all $m$ and $\Delta$, $c_m(\Delta)$ by an estimator $\hat{c}_m(\Delta)$ defined below. We study, for fixed

$\ell$, the $L^2$-risk of $\hat{f}_{K,\ell}$ and propose an adaptive (data-driven) choice $\hat{\ell}$ of $\ell$. We prove that the $L^2$-risk of the adaptive estimator $\hat{f}_{K,\hat{\ell}}$ attains the usual optimal rate on Sobolev balls. Moreover, the risk bounds are non-asymptotic, and the contribution of the terms coming from the estimation of $g^{\star m}$ for $m \ge 2$ is of order $O(1/n)$. Note that the total length time of observation is, in our framework, equal to $S_n\Delta$. As $n$ tends to infinity and $\Delta$ tends to 0, this random value is asymptotically equivalent to $n$. Hence, the benchmark for evaluating rates is in terms of (negative) powers of $n$. Note that, compared to Duval (2012a), we have no logarithmic loss in our rate, which is optimal. Indeed, the lower bound is available and our adaptive estimator is thus minimax from an asymptotic point of view.

In Section 2, we define the estimators of $c_m(\Delta)$, $m \ge 1$, and give a bound for their $L^2$-risk in terms of $n$ and $\Delta$. In Section 3, results from Chesneau et al. (2013) on nonparametric estimation of $m$-th convolution powers of a density are recalled. Section 4 concerns the estimation of $f$. Our main result (Theorem 4.1) gives the $L^2$-risk of the adaptive estimator of $f$. In Section 5, the estimation method is illustrated on simulated data for various jump densities. It shows that the adaptive estimator performs well for small values of $K$. Section 6 gives some concluding remarks. Proofs are gathered in Section 7 and in the Appendix.
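As a concrete illustration of (9)-(10), here is a minimal numerical sketch of the convolution power estimator (ours, not the authors' code; it assumes NumPy, and the grid sizes are arbitrary).

```python
import numpy as np

def g_star_m_hat(Z, m, ell, x, n_t=512):
    """Estimator (10): the empirical characteristic function (9) raised to
    the power m, Fourier-inverted over [-pi*ell, pi*ell] (Riemann sum)."""
    t = np.linspace(-np.pi * ell, np.pi * ell, n_t)
    dt = t[1] - t[0]
    g_tilde = np.exp(1j * np.outer(t, Z)).mean(axis=1)          # (9)
    vals = np.exp(-1j * np.outer(x, t)) @ (g_tilde ** m) * dt   # (10)
    return vals.real / (2 * np.pi)
```

Note that a single pass over the data yields $\tilde{g}^*$, and all powers $m = 1, \dots, K+1$ then come essentially for free; this is what makes the approach much cheaper than forming sums of $m$ observations.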

2. Preliminary results.

Consider a compound Poisson process given by (1) and $\Delta$ a sampling interval. Then we can prove the following result.

Proposition 2.1. Let $S_0 = 0$ and let $(S_i, Z_i),\ i \ge 1$, be given by (4)-(5). We have, for all $i \ge 1$, $P(S_i < +\infty) = 1$, and the $(S_i - S_{i-1}, Z_i),\ i \ge 1$, are independent and identically distributed random couples. For $k \ge 1$,

$P(S_1 = k, Z_1 \le x) = e^{-c(k-1)\Delta}(1 - e^{-c\Delta})\, P(X_\Delta \le x \mid X_\Delta \neq 0).$

Consequently, $S_1$ and $Z_1$ are independent, the distribution of $Z_1$ is equal to the conditional distribution of $X_\Delta$ given $X_\Delta \neq 0$, and $S_1$ has geometric distribution with parameter $1 - e^{-c\Delta}$. Moreover, the random variables $(S_1, Z_1, S_2 - S_1, Z_2, \dots, S_n - S_{n-1}, Z_n)$ are independent.
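The observation scheme (4)-(5) and Proposition 2.1 are easy to simulate; the sketch below (ours, with illustrative parameter values) draws the nonzero increments and checks that $S_n/n$ is close to $1/p(\Delta)$ with $p(\Delta) = 1 - e^{-c\Delta}$, as the geometric distribution of the $S_i - S_{i-1}$ predicts.

```python
import numpy as np

rng = np.random.default_rng(0)

def observe_nonzero(c, delta, n, jump_sampler):
    """Sample increments of the compound Poisson process on a grid of step
    delta until exactly n are nonzero (scheme (4)-(5)); return the nonzero
    increments Z_1, ..., Z_n and the index S_n of the last one."""
    Z, j = [], 0
    while len(Z) < n:
        j += 1
        k = rng.poisson(c * delta)   # number of jumps in the j-th interval
        if k > 0:                    # with continuous f, the increment is then nonzero a.s.
            Z.append(jump_sampler(k).sum())
    return np.array(Z), j

Z, S_n = observe_nonzero(0.5, 0.2, 5000, lambda k: rng.normal(size=k))
print(S_n / 5000, 1 / (1 - np.exp(-0.5 * 0.2)))   # both close to 10.5
```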

Let us now study the estimation of $c_m(\Delta)$. For this, we use $(S_1, \dots, S_n)$, which are independent of the sample $(Z_1, \dots, Z_n)$.

Proposition 2.2. Assume that $c \in [c_0, c_1]$ with $c_0 > 0$ and $c_1\Delta \le \log(2)/2$. For $m \ge 1$, let

(12) $H_m(\xi) = \frac{1}{(\xi - 1)^m \log\left(\frac{\xi}{\xi - 1}\right)}$

and define

(13) $\hat{c}_m(\Delta) = H_m(S_n/n)\, \mathbf{1}_{\left\{1 + \frac{1}{e^{2c_1\Delta} - 1}\, \le\, \frac{S_n}{n}\, \le\, 1 + \frac{1}{e^{c_0\Delta/2} - 1}\right\}}.$

Then,

(14) $E\left[\left(\hat{c}_m(\Delta) - c_m(\Delta)\right)^2\right] \le C_m\, \frac{\Delta^{2(m-1)}}{n},$

where $C_m$ has an explicit expression as a function of $c_0$, $c_1$ and $m$.

We remark that the indicator in the definition of $\hat{c}_m(\Delta)$ implies that the estimator is set to zero on the complement of the set $\{1 + \frac{1}{e^{2c_1\Delta}-1} \le \frac{S_n}{n} \le 1 + \frac{1}{e^{c_0\Delta/2}-1}\}$, but it is shown in the proof of Proposition 2.2 that this complement has small probability. Note that the bound in (14) is non-asymptotic, and the exact value of $C_m$ can be deduced from the proof of Proposition 2.2.
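In code, the estimator (13) is a one-liner around $H_m$; the sketch below (ours) follows the reconstruction of (12)-(13) above, with hypothetical bounds c0, c1 supplied by the user.

```python
import numpy as np

def c_m_hat(S_n, n, delta, m, c0, c1):
    """Estimator (13) of c_m(Delta): plug S_n/n into H_m of (12) and set the
    estimator to zero outside the truncation interval; c0 and c1 are the
    assumed lower and upper bounds on the unknown intensity c."""
    xi = S_n / n
    lo = 1 + 1 / np.expm1(2 * c1 * delta)    # lower end of the indicator set
    hi = 1 + 1 / np.expm1(c0 * delta / 2)    # upper end of the indicator set
    if not (lo <= xi <= hi):
        return 0.0
    return 1 / ((xi - 1) ** m * np.log(xi / (xi - 1)))   # H_m(S_n/n)

# With the sample drawn above (c = 0.5, delta = 0.2),
# c_m_hat(S_n, 5000, 0.2, 1, c0=0.1, c1=1.0) is close to
# c_1(Delta) = (e^{c*delta} - 1)/(c*delta) ~ 1.052.
```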

3. Estimation of the $m$-th convolution power $g^{\star m}$ of a density $g$ from an $n$-sample of $g$.

We now recall results proved in Chesneau et al. (2013). An important point is that estimators of the $m$-th convolution power $g^{\star m}$ with $L^2$-risk of order $1/n$ can be built. Consider an i.i.d. sample of variables $Z_1, \dots, Z_n$ with density $g$ and characteristic function $g^*$, the Fourier transform of $g$. Using the standard estimator $\tilde{g}^*$ of $g^*$ defined by (9), Chesneau et al. (2013) propose the estimator of $g^{\star m}$ given by (10). The following bounds for this estimator are proved in Chesneau et al. (2013):

Proposition 3.1. For $m \ge 2$ and all $t$,

(15) $E\left( \left|\widehat{(g^*)^m}(t) - (g^*)^m(t)\right|^2 \right) \le \mathcal{E}_m \left( \frac{1}{n^m} + \frac{|g^*(t)|^2}{n} \right),$

where $\mathcal{E}_m$ is a constant which depends neither on $n$ nor on $g$, is increasing with $m$, and $\widehat{(g^*)^m}(t) = (\tilde{g}^*(t))^m$ (see (9)). Consequently,

$E\left( \|\widehat{g^{\star m}_\ell} - g^{\star m}\|^2 \right) \le \frac{1}{2\pi} \int_{|t| \ge \pi\ell} |(g^{\star m})^*(t)|^2\, dt + \mathcal{E}_m \left( \frac{\ell}{n^m} + \frac{\|g\|^2}{n} \right).$

Let us introduce the Sobolev ball

$\mathcal{S}(\alpha, R) = \left\{ f \in L^1(\mathbb{R}) \cap L^2(\mathbb{R}),\ \int (1 + x^2)^{\alpha}\, |f^*(x)|^2\, dx \le R \right\}.$

If $g^{\star m}$ belongs to $\mathcal{S}(\alpha m, R_m)$, the $L^2$-risk bound becomes

$E\left( \|\widehat{g^{\star m}_\ell} - g^{\star m}\|^2 \right) \le R_m\, \ell^{-2\alpha m} + \mathcal{E}_m \left( \frac{\ell}{n^m} + \frac{\|g\|^2}{n} \right).$

Choosing a trade-off bandwidth $\ell_{\mathrm{opt}} = C n^{m/(2\alpha m + 1)}$ (see the remark before Proposition 4.1 below), we get a risk bound on $E(\|\widehat{g^{\star m}_{\ell_{\mathrm{opt}}}} - g^{\star m}\|^2)$ of order $\max(n^{-2m\alpha m/(2\alpha m + 1)}, n^{-1})$. This allows to obtain a rate of order $1/n$ whenever $2m\alpha m/(2\alpha m + 1) \ge 1$, i.e. $2\alpha m(m-1) \ge 1$. This occurs for instance if $m \ge 2$ and $\alpha \ge 1/2$.

4. Estimation of $f$.

Let us first give the links between the Sobolev regularities of $f$, $g$ and $g^{\star m}$, with $g = g_\Delta$. Below, for any function $h \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$, we denote by $h_\ell$ the function defined by $h_\ell^* = h^*\, \mathbf{1}_{[-\pi\ell, \pi\ell]}$.
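Remark (our one-line computation, added for completeness; not part of the original argument). The trade-off cutoff $\ell_{\mathrm{opt}}$ of Section 3 balances the two $\ell$-dependent terms of the risk bound: setting the derivative of $R_m\,\ell^{-2\alpha m} + \mathcal{E}_m\,\ell/n^m$ with respect to $\ell$ to zero gives

$2\alpha m R_m\, \ell^{-2\alpha m - 1} = \frac{\mathcal{E}_m}{n^m} \iff \ell_{\mathrm{opt}} = \left( \frac{2\alpha m R_m\, n^m}{\mathcal{E}_m} \right)^{1/(2\alpha m + 1)} \propto n^{m/(2\alpha m + 1)},$

and, at $\ell_{\mathrm{opt}}$, both terms are of order $n^{-2\alpha m^2/(2\alpha m + 1)}$, which is $O(1/n)$ exactly when $2\alpha m(m-1) \ge 1$.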

Proposition 4.1. Let the density $f$ belong to $\mathcal{S}(\alpha, R)$. Then $g$ defined by (3) and (8) belongs to $\mathcal{S}(\alpha, R)$ and $g^{\star m} \in \mathcal{S}(m\alpha, R_m)$ for some constant $R_m$. In particular, $\|g\| \le \|f\|$.

We assume now that $c \in [c_0, c_1]$ with $c_1\Delta \le \log 2/2$, and consider the estimator $\hat{f}_{K,\ell}$ given by

(16) $\hat{f}_{K,\ell}(x) = \sum_{m=1}^{K+1} \frac{(-1)^{m+1}}{m}\, \hat{c}_m(\Delta)\, \widehat{g^{\star m}_\ell}(x),$

where $c_m(\Delta)$ is defined in (11) and $\hat{c}_m(\Delta)$ is the estimator of $c_m(\Delta)$ given in (13).

Proposition 4.2. Assume that $c \in [c_0, c_1]$ with $c_0 > 0$ and $c_1\Delta \le \log 2/2$. Then the estimator $\hat{f}_{K,\ell}$ is such that

(17) $E\left( \|\hat{f}_{K,\ell} - f\|^2 \right) \le \frac{5}{2\pi} \int_{|t| \ge \pi\ell} |f^*(t)|^2\, dt + \frac{10\ell}{n} + 5A_K \Delta^{2K+2} + \frac{5B_K}{n},$

with

(18) $A_K = 6\, \frac{\|f\|^2\, (\sqrt{2}\,c)^{2K+2}}{(K+2)^2},$

(19) $B_K = 2(K+1)^2\left[ C_1(1 + \|f\|^2) + \sum_{m=2}^{K+1} \frac{\left(C_m + \mathcal{E}_m\, 2^m c^{2(m-1)}\right)\Delta^{2(m-1)}}{m^2}\, \left(1 + 2^m \|f\|^2\right) \right],$

where $C_m$ and $\mathcal{E}_m$ are the constants appearing respectively in (14) and in (15).

If $f \in \mathcal{S}(\alpha, R)$, choosing $\ell = \ell^* \propto n^{1/(2\alpha+1)}$, inequality (17) yields

(20) $E\left( \|\hat{f}_{K,\ell^*} - f\|^2 \right) \le C n^{-2\alpha/(2\alpha+1)} + 5A_K \Delta^{2K+2}.$

Usually, in high frequency data for continuous time models, rates are measured in terms of the total length time of observation, which is, in our framework, equal to $S_n\Delta$. Evaluating this random value as $n$ tends to infinity and $\Delta$ tends to 0, we get that

$S_n\Delta = \frac{S_n}{n}\, n\Delta \sim \frac{n\Delta}{p(\Delta)} \sim \frac{n}{c}.$

The total length time of observation is asymptotically equivalent to $n$. Hence, the rate in (20) is exactly the one obtained by Duval (2012a), with no logarithmic loss.

Now, we aim at obtaining the choice of $\ell$ in an automatic and nonasymptotic way. For this, we propose an adaptive selection procedure. More precisely, let

$\hat{\ell} = \arg\min_{\ell \in \{1, 2, \dots, L_n\}} \left( -\|\hat{f}_{K,\ell}\|^2 + \mathrm{pen}(\ell) \right), \qquad \text{with } \mathrm{pen}(\ell) = \kappa\, \frac{\ell}{n}.$

We can prove the following result.

Theorem 4.1. Assume that $f$ is bounded and $L_n \le n$. There exists a value $\kappa_0$ such that, for any $\kappa$ larger than $\kappa_0$, we get

(21) $E\left( \|\hat{f}_{K,\hat\ell} - f\|^2 \right) \le 4 \min_{1 \le \ell \le L_n} \left( \|f - f_\ell\|^2 + \mathrm{pen}(\ell) \right) + 32 A_K \Delta^{2K+2} + 32\frac{B_K}{n} + \frac{C'}{n},$

where $C'$ is a constant.

Comparing the above inequality with (17), we see that the adaptive estimator automatically realizes the best compromise between the squared bias term (first one, inside the min) and the variance term (second one, inside the min). The last two terms are standardly negligible. For the term $32A_K\Delta^{2K+2}$, either the sampling interval $\Delta$, for given $K$, is tuned to make it negligible ($O(1/n)$), or $n$ and $\Delta$ are given and $K$ is chosen so that $n\Delta^{2K+2} \simeq 1$.

We now recall a lower bound derived in Duval (2012a). This lower bound is obtained in the super experiment where the compound Poisson process $(X_t, t \ge 0)$ is continuously observed over $[0, S_n\Delta]$. In that super experiment we observe (at least) $n$ independent realizations of $f$, and we obtain a lower bound by applying classical results (see e.g. Tsybakov, 2009).

Proposition 4.3. We have

(22) $\liminf_{n \to \infty}\; \inf_{\hat{f}}\; \sup_{f \in \mathcal{S}(\alpha, R)} n^{2\alpha/(2\alpha+1)}\, E\left( \|\hat{f} - f\|^2 \right) > 0,$

where the infimum is taken over all estimators $\hat{f}$ based on the observations $(X_t, t \le S_n\Delta)$.

The above inequality shows that the estimator is minimax whenever $n^{-2\alpha/(2\alpha+1)}$ is larger than $\Delta^{2K+2}$. Since $2\alpha/(2\alpha+1) \le 1$, we take $K$ such that $n\Delta^{2K+2} \simeq 1$.

5. Simulations

In this section, we illustrate the method on simulated data. We have implemented the adaptive estimator on different examples of jump densities $f$, namely:
(1) A Gaussian $\mathcal{N}(0, 1)$.
(2) A Laplace $L(0, 1)$, with density $\exp(-|x|)/2$.
(3) A Gamma $\Gamma(5, 1)$.
(4) A mixture of a Gaussian and a Gamma, $\frac{2}{3}\mathcal{N}(-4, 1) + \frac{1}{3}\Gamma(3, 1)$.
After preliminary experiments, the constant $\kappa$ is taken equal to 1.76, and the cutoff $\hat\ell$ is selected among 100 equispaced values between 0 and 10. We consider different values of $\Delta$: 0.2, 0.5, 0.8. For each $\Delta$, we choose $K$ such that $n\Delta^{2K+2} \le 1$; more precisely, the corresponding values of $K$ are 2, 5, 17, respectively. This ensures that the estimator is minimax (see Theorem 4.1 and Proposition 4.3). Results are given in Table 1, where 50 estimated curves are plotted on the same figure to show the small variability of the estimator. We take a sample size $n = 5000$ and an intensity $c = 0.5$; the first column gives the result for $\Delta = 0.2$ ($K = 2$), the second for $\Delta = 0.5$ ($K = 5$) and the last for $\Delta = 0.8$ ($K = 17$). On top of each graph, we give the mean of the selected values of $\hat\ell$ and the associated standard deviation in parentheses, evaluated over the fifty plots given.

[Table 1 figure: for each jump density (rows: Gaussian, Laplace, Gamma, mixture) and each $\Delta$ (columns: 0.2, 0.5, 0.8), the true density and 50 estimated curves. Mean selected cutoff $\hat\ell$ (standard deviation): Gaussian: 1.00 (0.25), 1.09 (0.61), 1.26 (0.99); Laplace: 2.52 (0.79), 2.43 (0.65), 2.57 (0.87); Gamma: 0.65 (0.10), 0.66 (0.18), 0.76 (0.31); mixture: 0.92 (0.21), 0.91 (0.13), 0.98 (0.29).]

Table 1. Estimation of $f$ for a Gaussian $\mathcal{N}(0,1)$ (first line), Laplace $L(0,1)$ (second line), Gamma $\Gamma(5,1)$ (third line) and the mixture $\frac{2}{3}\mathcal{N}(-4,1) + \frac{1}{3}\Gamma(3,1)$ (fourth line), with $c = 0.5$ and $n = 5000$. True density (bold black line) and 50 estimated curves (red lines); left: $\Delta = 0.2$ and $K = 2$; middle: $\Delta = 0.5$ and $K = 5$; right: $\Delta = 0.8$ and $K = 17$. The value $\hat\ell$ is the mean over the 50 selected $\hat\ell$'s (with standard deviation in parentheses).

It appears that, for each $\Delta$, the estimator reproduces the estimated density well, with little variability. Increasing $\Delta$, and therefore $K$, affects neither the accuracy nor the variability of the estimator.
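To make the experiment reproducible in spirit, here is a hypothetical end-to-end driver (ours, not the authors' code) for the first cell of Table 1. It assumes the helper functions observe_nonzero, c_m_hat and g_star_m_hat sketched in the previous sections are in scope, and uses illustrative intensity bounds c0, c1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c_true, delta, K = 5000, 0.5, 0.2, 2        # first cell of Table 1
Z, S_n = observe_nonzero(c_true, delta, n, lambda k: rng.normal(size=k))
coefs = [(-1) ** (m + 1) / m * c_m_hat(S_n, n, delta, m, c0=0.1, c1=1.0)
         for m in range(1, K + 2)]             # the K + 1 = 3 series coefficients

def crit(ell):
    """Selection criterion of Section 4: -||f_hat_{K,ell}||^2 + kappa*ell/n,
    the squared norm being computed on the Fourier side (Plancherel)."""
    t = np.linspace(-np.pi * ell, np.pi * ell, 512)
    g_tilde = np.exp(1j * np.outer(t, Z)).mean(axis=1)
    F = sum(a * g_tilde ** (m + 1) for m, a in enumerate(coefs))
    norm2 = (np.abs(F) ** 2).sum() * (t[1] - t[0]) / (2 * np.pi)
    return -norm2 + 1.76 * ell / n             # kappa = 1.76 as in Section 5

ells = np.linspace(0.1, 10, 100)               # 100 equispaced values in (0, 10]
ell_hat = ells[np.argmin([crit(ell) for ell in ells])]
x = np.linspace(-4, 4, 200)
f_hat = sum(a * g_star_m_hat(Z, m + 1, ell_hat, x) for m, a in enumerate(coefs))
```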

6. Concluding remarks

In this paper, we propose a nonparametric estimator of the jump density $f$ of a compound Poisson process. The process $(X_t)$ is discretely observed with sampling interval $\Delta$ until exactly $n$ nonzero increments are obtained. This provides an $n$-sample of the conditional distribution $g_\Delta$ of $X_\Delta$ given $X_\Delta \neq 0$. The setting is more general than in van Es et al. (2007), as the intensity of the Poisson process is unknown and is estimated. By inverting the operator $P_\Delta : f \mapsto P_\Delta f = g_\Delta$, we define a class of nonparametric estimators of $f$ depending on a cutoff parameter $\ell$ and a truncation parameter $K$. For given $K$ and small $\Delta$, we define an adaptive choice of $\ell$ and prove that the resulting adaptive estimator is minimax over Sobolev balls. The estimator is easy to implement and performs well even for small $K$.

An interesting development would be to look for an adaptive choice of both $\ell$ and $K$, by including the term $A_K\Delta^{2K+2}$ in the penalty, $K$ being searched for in a finite set of integers. Another direction, investigated by Duval (2012b) with wavelet estimators, would be an extension to renewal processes; but the lack of independence between increments makes the theoretical study much more tedious.

7. Appendix: Proofs

7.1. Proof of Proposition 2.1. The joint distribution of $(S_1, Z_1)$ is elementary, using that the increments $X_{j\Delta} - X_{(j-1)\Delta}$ are i.i.d. The process $(X^x_{j\Delta} = x + X_{j\Delta},\ j \ge 1)$ is strong Markov. We denote by $P_x$ its distribution on the canonical space $\mathbb{R}^{\mathbb{N}}$, by $(X_j,\ j \ge 0)$ the canonical process of $\mathbb{R}^{\mathbb{N}}$, and by $\mathcal{F}_j = \sigma(X_k,\ k \le j)$ the canonical filtration. Let $\theta : \mathbb{R}^{\mathbb{N}} \to \mathbb{R}^{\mathbb{N}}$ denote the shift operator. Consider the stopping times built on the canonical process, $S_0 = 0$,

(23) $S_1 = \inf\{j \ge 1 : X_j - X_{j-1} \neq 0\}, \qquad S_i = \inf\{j > S_{i-1} : X_j - X_{j-1} \neq 0\},\ i \ge 2,$

and let

(24) $Z_i = X_{S_i} - X_{S_i - 1}.$

Because the $S_i$'s are built using the increments $(X_j - X_{j-1},\ j \ge 1)$, their distribution under $P_x$ is independent of the initial condition $x$. We have $S_i = S_{i-1} + S_1 \circ \theta_{S_{i-1}}$. The process $(X_{S_{i-1}+j} - X_{S_{i-1}} = (X_j - X_0) \circ \theta_{S_{i-1}},\ j \ge 0)$ is independent of $\mathcal{F}_{S_{i-1}}$ and has distribution $P_0$, and $Z_i = Z_1 \circ \theta_{S_{i-1}}$. Consequently,

$E_x\left(\varphi(S_i - S_{i-1})\,\psi(Z_i) \mid \mathcal{F}_{S_{i-1}}\right) = E_0\left(\varphi(S_1)\,\psi(Z_1)\right).$

By iterated conditioning, we get the result. $\square$

7.2. Proof of Proposition 2.2. Let us set

$p(\Delta) = 1 - e^{-c\Delta} = \frac{e^{c\Delta} - 1}{e^{c\Delta}}.$

An elementary computation yields:

$c\Delta = \log\left(\frac{x}{x-1}\right), \qquad \text{with } x := x(\Delta) = \frac{1}{p(\Delta)} = 1 + \frac{1}{e^{c\Delta} - 1} > 1,$

and

$\frac{(e^{c\Delta} - 1)^m}{c\Delta} = H_m(x).$

As the standard maximum likelihood (and unbiased) estimator of $1/p(\Delta)$, computed from the sample $(S_i - S_{i-1},\ i = 1, \dots, n)$, is $S_n/n \ge 1$, we are tempted to estimate $H_m(x)$ by $H_m(S_n/n)$. This is not possible, as $S_n/n$ may be equal to 1. This is why we introduce a truncation. Set

$u_0 = \frac{\Delta}{e^{c_0\Delta/2} - 1}, \qquad u_1 = \frac{\Delta}{e^{2c_1\Delta} - 1}, \qquad u = \frac{\Delta}{e^{c\Delta} - 1}.$

Note that

$1 + \frac{u_1}{\Delta} < x = 1 + \frac{u}{\Delta} < 1 + \frac{u_0}{\Delta}.$

We have

$\hat{c}_m(\Delta) - c_m(\Delta) = H_m(S_n/n)\,\mathbf{1}_{\left(1 + \frac{u_1}{\Delta} \le \frac{S_n}{n} \le 1 + \frac{u_0}{\Delta}\right)} - H_m(x) = A_1 + A_2$

with

$A_1 = \left( H_m(S_n/n) - H_m(x) \right) \mathbf{1}_{\left(1 + \frac{u_1}{\Delta} \le \frac{S_n}{n} \le 1 + \frac{u_0}{\Delta}\right)}, \qquad A_2 = -H_m(x)\left( \mathbf{1}_{\left(\frac{S_n}{n} < 1 + \frac{u_1}{\Delta}\right)} + \mathbf{1}_{\left(\frac{S_n}{n} > 1 + \frac{u_0}{\Delta}\right)} \right).$

Thus, on the set $\left(1 + \frac{u_1}{\Delta} \le \frac{S_n}{n} \le 1 + \frac{u_0}{\Delta}\right)$,

$\left( H_m(S_n/n) - H_m(x) \right)^2 \le \left( \frac{S_n}{n} - x \right)^2 \sup_{\xi \in [1 + \frac{u_1}{\Delta},\, 1 + \frac{u_0}{\Delta}]} \left(H_m'(\xi)\right)^2.$

As

$H_m'(\xi) = -\frac{m}{(\xi - 1)^{m+1}\log\frac{\xi}{\xi-1}} + \frac{1}{\xi(\xi - 1)^{m+1}\left(\log\frac{\xi}{\xi-1}\right)^2},$

we have, for $\xi \in [1 + \frac{u_1}{\Delta},\, 1 + \frac{u_0}{\Delta}]$,

$|H_m'(\xi)| \le \frac{2\Delta^m}{c_0\, u_1^{m+1}}\left( m + \frac{2}{u_1 c_0} \right).$

Writing that $e^{2c_1\Delta} - 1 = 2c_1\Delta\, e^{2sc_1\Delta}$ for some $s \in (0,1)$, and using that $2c_1\Delta \le \log(2)$, we get $1/u_1 \le 4c_1$. As

$E\left( \frac{S_n}{n} - x \right)^2 = \frac{1 - p(\Delta)}{np^2(\Delta)} = \frac{e^{c\Delta}}{n(e^{c\Delta} - 1)^2},$

we get, using $e^{c\Delta} - 1 \ge c\Delta \ge c_0\Delta$:

$E A_1^2 \le C_m'\, \frac{\Delta^{2(m-1)}}{n}, \qquad \text{with } C_m' = \frac{4\sqrt{2}\,(4c_1)^{2(m+1)}}{c_0^4}\left( m + \frac{8c_1}{c_0} \right)^2.$

Then we have, setting $a_0 = u_0 - u > 0$ and $a_1 = u - u_1 > 0$,

$P\left( \frac{S_n}{n} < 1 + \frac{u_1}{\Delta} \right) + P\left( \frac{S_n}{n} > 1 + \frac{u_0}{\Delta} \right) = P\left( \frac{\Delta}{p(\Delta)} - \Delta\frac{S_n}{n} > a_1 \right) + P\left( \Delta\frac{S_n}{n} - \frac{\Delta}{p(\Delta)} > a_0 \right)$

$\le \left( \frac{1}{a_1^2} + \frac{1}{a_0^2} \right) \frac{\Delta^2\, e^{c\Delta}}{n(e^{c\Delta} - 1)^2}.$

Thus, noting that $u_0 - u \ge 1/(2c_1)$ and $u - u_1 \ge 1/(4\sqrt{2}\,c_0)$,

(25) $E A_2^2 \le \left( \frac{1}{a_1^2} + \frac{1}{a_0^2} \right) \frac{(e^{c\Delta} - 1)^{2(m-1)}\, e^{c\Delta}}{nc^2} \le C_m''\, \frac{\Delta^{2(m-1)}}{n},$

where

$C_m'' = 4\sqrt{2}\left( 8c_0^2 + c_1^2 \right) \frac{(4c_1)^{2(m-1)}}{c_0^2}.$

The proof is complete with $C_m = 2(C_m' + C_m'')$. $\square$

7.3. Proof of Proposition 4.1. Consider $f$ integrable, with $\|f\|_1 = \int |f|$, and square integrable, such that $\int(1+x^2)^\alpha |f^*(x)|^2\, dx \le R$. Then

$\int (1+x^2)^\alpha |g^*(x)|^2\, dx = \left(\frac{e^{-c\Delta}}{1 - e^{-c\Delta}}\right)^2 \sum_{m,k \ge 1} \frac{(c\Delta)^m}{m!}\frac{(c\Delta)^k}{k!} \int (1+x^2)^\alpha\, [f^*(x)]^m\, [f^*(-x)]^k\, dx$

$\le \left(\frac{e^{-c\Delta}}{1 - e^{-c\Delta}}\right)^2 \sum_{m,k \ge 1} \frac{(c\Delta)^m}{m!}\frac{(c\Delta)^k}{k!}\, \|f\|_1^{m+k-2} \int (1+x^2)^\alpha |f^*(x)|^2\, dx$

$\le R \left(\frac{e^{-c\Delta}}{1 - e^{-c\Delta}}\right)^2 \frac{1}{\|f\|_1^2}\left( \sum_{m \ge 1} \frac{(c\Delta)^m \|f\|_1^m}{m!} \right)^2 = R \left( \frac{e^{-c\Delta}\left(\exp(c\Delta\|f\|_1) - 1\right)}{(1 - e^{-c\Delta})\,\|f\|_1} \right)^2 := R(\Delta) < +\infty.$

As $f$ is a density, $\|f\|_1 = 1$ and $R(\Delta) = R$. This implies the announced result for $g$. If the density $g$ belongs to $\mathcal{S}(\alpha, R)$, then $(1+x^2)^\alpha |g^*(x)|^2$ is continuous and integrable, thus bounded by some $B$. Therefore $g^{\star m} \in \mathcal{S}(m\alpha, R_m)$ with $R_m = B^{m-1}R$. $\square$

7.4. Proof of Proposition 4.2. Recall that $f^* = \sum_{m \ge 1} \frac{(-1)^{m+1}}{m}\, c_m(\Delta)\, (g^*)^m$ (see (6)-(7)). Let $f_\ell$ be such that $f_\ell^* = f^*\, \mathbf{1}_{[-\pi\ell,\pi\ell]}$, and let $f_{K,\ell}$ be such that

$f_{K,\ell}^* = \mathbf{1}_{[-\pi\ell,\pi\ell]} \sum_{m=1}^{K+1} \frac{(-1)^{m+1}}{m}\, c_m(\Delta)\, (g^*)^m.$

Recall that $\tilde{f}_{K,\ell}$ (see (11)) is such that

$(\tilde{f}_{K,\ell})^* = \mathbf{1}_{[-\pi\ell,\pi\ell]} \sum_{m=1}^{K+1} \frac{(-1)^{m+1}}{m}\, c_m(\Delta)\, (\tilde{g}^*)^m.$

We distinguish the first term of this development from the other ones, and set

(26) $\tilde{f}_{K,\ell} = \tilde{f}^{(1)}_{K,\ell} + \tilde{f}^R_{K,\ell}, \qquad \text{with } \tilde{f}^{(1)}_{K,\ell} = c_1(\Delta)\, \widehat{g^{\star 1}_\ell} = c_1(\Delta)\, \hat{g}_\ell.$

Analogously, with $g_\ell$ such that $g_\ell^* = g^*\, \mathbf{1}_{[-\pi\ell,\pi\ell]}$,

(27) $f_{K,\ell} = f^{(1)}_{K,\ell} + f^R_{K,\ell}, \qquad \text{with } f^{(1)}_{K,\ell} = c_1(\Delta)\, g_\ell.$

The following decomposition of the $L^2$-norm holds:

$\|f - \hat{f}_{K,\ell}\| \le \|f - f_\ell\| + \|f_\ell - f_{K,\ell}\| + \|f^{(1)}_{K,\ell} - \tilde{f}^{(1)}_{K,\ell}\| + \|f^R_{K,\ell} - \tilde{f}^R_{K,\ell}\| + \|\tilde{f}_{K,\ell} - \hat{f}_{K,\ell}\|,$

which involves two bias terms, two stochastic error terms, and a term due to the estimation of the $c_m(\Delta)$'s. The first bias term is the usual deconvolution bias term:

$\|f - f_\ell\|^2 = \frac{1}{2\pi} \int_{|t| \ge \pi\ell} |f^*(t)|^2\, dt.$

Noting that

$f_\ell^* - f_{K,\ell}^* = \mathbf{1}_{[-\pi\ell,\pi\ell]} \sum_{m=K+2}^{\infty} \frac{(-1)^{m+1}}{m}\, c_m(\Delta)\, (g^*)^m,$

we get, using that $|g^*(t)| \le 1$ and $\|g\| \le \|f\|$ (see Proposition 4.1):

$2\pi\|f_\ell - f_{K,\ell}\|^2 = \|f_\ell^* - f_{K,\ell}^*\|^2 = \int_{-\pi\ell}^{\pi\ell} \left| \sum_{m=K+2}^{\infty} \frac{(-1)^{m+1}}{m}\, c_m(\Delta)\, (g^*)^m(t) \right|^2 dt$

$\le \left( \sum_{m \ge K+2} \frac{c_m(\Delta)}{m} \right)^2 \int_{-\pi\ell}^{\pi\ell} |g^*(t)|^2\, dt \le 2\pi \|g\|^2 \left( \sum_{m \ge K+2} \frac{c_m(\Delta)}{m} \right)^2$

$\le \frac{2\pi \|f\|^2\, (e^{c\Delta} - 1)^{2(K+2)}}{(c\Delta)^2 (K+2)^2 (2 - e^{c\Delta})^2} \le \frac{4\pi \|f\|^2 (\sqrt{2}\,c\Delta)^{2K+2}}{(K+2)^2 (2 - e^{c\Delta})^2}$

(28) $\le 2\pi A_K\, \Delta^{2K+2},$

where, in the last line, we have used $1/(2 - e^{c\Delta})^2 \le 1/(2 - \sqrt{2})^2 \le 3$ and $e^{c\Delta} - 1 \le \sqrt{2}\,c\Delta$, and where $A_K$ is given in (18). To study the next term, we recall that $E\left( |\tilde{g}^*(t) - g^*(t)|^2 \right) \le 1/n$. Then we get

(29) $2\pi E\|\tilde{f}^{(1)}_{K,\ell} - f^{(1)}_{K,\ell}\|^2 = E \int_{-\pi\ell}^{\pi\ell} \left| c_1(\Delta)\left[\tilde{g}^*(t) - g^*(t)\right] \right|^2 dt \le \frac{2\pi\ell\, [c_1(\Delta)]^2}{n} \le \frac{4\pi\ell}{n},$

since $c_1(\Delta) \le \sqrt{2}$.

Hereafter, we use inequality (15) of Proposition 3.1.

$2\pi E\|\tilde{f}^R_{K,\ell} - f^R_{K,\ell}\|^2 = E \int_{-\pi\ell}^{\pi\ell} \left| \sum_{m=2}^{K+1} \frac{(-1)^{m+1}}{m}\, c_m(\Delta)\left[ (\tilde{g}^*)^m(t) - (g^*)^m(t) \right] \right|^2 dt$

$\le K \sum_{m=2}^{K+1} \frac{[c_m(\Delta)]^2}{m^2} \int_{-\pi\ell}^{\pi\ell} E\left| (\tilde{g}^*)^m(t) - (g^*)^m(t) \right|^2 dt \le 2\pi K \sum_{m=2}^{K+1} \frac{[c_m(\Delta)]^2\, \mathcal{E}_m}{m^2} \left( \frac{\ell}{n^m} + \frac{\|g\|^2}{n} \right).$

This yields, since $c_m(\Delta) \le (\sqrt{2})^m (c\Delta)^{m-1}$ and $\ell/n \le 1$,

(30) $E\|\tilde{f}^R_{K,\ell} - f^R_{K,\ell}\|^2 \le \frac{D_K}{n}$

with

$D_K = K \sum_{m=2}^{K+1} \frac{2^m c^{2(m-1)}}{m^2}\, \mathcal{E}_m\, \Delta^{2(m-1)} \left( \frac{1}{n^{m-2}} + \|g\|^2 \right).$

For the last term, we use Proposition 2.2, together with the fact that the estimators $\hat{c}_m(\Delta)$ and $(\tilde{g}^*)^m(t)$ are independent, and write

$2\pi E\|\hat{f}_{K,\ell} - \tilde{f}_{K,\ell}\|^2 = E \int_{-\pi\ell}^{\pi\ell} \left| \sum_{m=1}^{K+1} \frac{(-1)^{m+1}}{m} \left( \hat{c}_m(\Delta) - c_m(\Delta) \right) (\tilde{g}^*)^m(t) \right|^2 dt$

$\le 2 E \int_{-\pi\ell}^{\pi\ell} \left| \sum_{m=1}^{K+1} \frac{(-1)^{m+1}}{m} \left( \hat{c}_m(\Delta) - c_m(\Delta) \right) \left[ (\tilde{g}^*)^m(t) - (g^*)^m(t) \right] \right|^2 dt + 2 E \int_{-\pi\ell}^{\pi\ell} \left| \sum_{m=1}^{K+1} \frac{(-1)^{m+1}}{m} \left( \hat{c}_m(\Delta) - c_m(\Delta) \right) (g^*)^m(t) \right|^2 dt$

$\le 2(K+1) \sum_{m=1}^{K+1} \frac{1}{m^2} \left[ E\left| \hat{c}_m(\Delta) - c_m(\Delta) \right|^2 \int_{-\pi\ell}^{\pi\ell} E\left| (\tilde{g}^*)^m(t) - (g^*)^m(t) \right|^2 dt + E\left| \hat{c}_m(\Delta) - c_m(\Delta) \right|^2 \int_{-\pi\ell}^{\pi\ell} |g^*(t)|^{2m}\, dt \right]$

$\le 2(K+1) \left[ \frac{C_1}{n}\left( \frac{2\pi\ell}{n} + 2\pi\|g\|^2 \right) + \sum_{m=2}^{K+1} \frac{C_m \Delta^{2(m-1)}}{m^2\, n} \left( \mathcal{E}_m\left( \frac{2\pi\ell}{n^m} + \frac{1}{n}\int_{-\pi\ell}^{\pi\ell} |g^*(t)|^2\, dt \right) + \|g^*\|^2 \right) \right]$

(31) $\le \frac{2\pi E_K}{n},$

using that $\ell/n \le 1$ and

$E_K = 2(K+1)^2 \left[ C_1(1 + \|g\|^2) + \sum_{m=2}^{K+1} \frac{C_m}{m^2}\, \Delta^{2(m-1)} \left( \frac{\mathcal{E}_m}{n^{m-1}} + 2\|g\|^2 \right) \right].$

This ends the proof of the result, with $D_K + E_K \le B_K$ and $\|g\| \le \|f\|$. $\square$

7.5. Proof of Theorem 4.1. Consider the contrast

$\gamma_n(t) = \|t\|^2 - 2\langle t, \hat{f}_{K,L_n} \rangle$

and, for $\ell = 1, \dots, L_n$, the increasing sequence of spaces

$S_\ell = \left\{ t \in L^1 \cap L^2(\mathbb{R}),\ \mathrm{supp}(t^*) \subset [-\pi\ell, \pi\ell] \right\}.$

Note that, for $\ell \le L_n$ and $t \in S_\ell$, $\gamma_n(t) = \|t\|^2 - 2\langle t, \hat{f}_{K,\ell} \rangle$ and

$\arg\min_{t \in S_\ell} \gamma_n(t) = \hat{f}_{K,\ell}, \qquad \text{with } \gamma_n(\hat{f}_{K,\ell}) = -\|\hat{f}_{K,\ell}\|^2.$

For $\ell \le \ell^* \le L_n$, $s \in S_\ell$ and $t \in S_{\ell^*}$, the following decomposition holds:

$\gamma_n(t) - \gamma_n(s) = \|t - f\|^2 - \|s - f\|^2 - 2\langle t - s,\, \hat{f}_{K,L_n} - f \rangle,$

and $\langle t - s,\, \hat{f}_{K,L_n} - f \rangle = \langle t - s,\, \hat{f}_{K,L_n} - f_{L_n} \rangle$. By definition of the estimator,

$\gamma_n(\hat{f}_{K,\hat\ell}) + \mathrm{pen}(\hat\ell) \le \gamma_n(\hat{f}_{K,\ell}) + \mathrm{pen}(\ell) \le \gamma_n(f_\ell) + \mathrm{pen}(\ell).$

Thus we obtain, $\forall \ell \in \{1, \dots, L_n\}$,

$\|\hat{f}_{K,\hat\ell} - f\|^2 \le \|f_\ell - f\|^2 + \mathrm{pen}(\ell) + 2\langle \hat{f}_{K,\hat\ell} - f_\ell,\, \hat{f}_{K,L_n} - f_{L_n} \rangle - \mathrm{pen}(\hat\ell)$

(32) $\le \|f_\ell - f\|^2 + \mathrm{pen}(\ell) + \frac{1}{4}\|\hat{f}_{K,\hat\ell} - f_\ell\|^2 + 4 \sup_{t \in S_\ell + S_{\hat\ell},\, \|t\| = 1} \langle t,\, \hat{f}_{K,L_n} - f_{L_n} \rangle^2 - \mathrm{pen}(\hat\ell).$

Then, using

(33) $\frac{1}{4}\|\hat{f}_{K,\hat\ell} - f_\ell\|^2 \le \frac{1}{2}\|\hat{f}_{K,\hat\ell} - f\|^2 + \frac{1}{2}\|f - f_\ell\|^2$

and the decompositions (26) and (27), we get

$\langle t,\, \hat{f}_{K,L_n} - f_{L_n} \rangle = \langle t,\, \hat{f}_{K,L_n} - \tilde{f}_{K,L_n} \rangle + \langle t,\, \tilde{f}^{(1)}_{K,L_n} - f^{(1)}_{K,L_n} \rangle + \langle t,\, \tilde{f}^R_{K,L_n} - f^R_{K,L_n} \rangle + \langle t,\, f_{K,L_n} - f_{L_n} \rangle.$

By the Cauchy-Schwarz inequality, and for $\|t\| = 1$, we have

$\langle t,\, \hat{f}_{K,L_n} - f_{L_n} \rangle^2 \le 4\|\hat{f}_{K,L_n} - \tilde{f}_{K,L_n}\|^2 + 4\|\tilde{f}^R_{K,L_n} - f^R_{K,L_n}\|^2$

(34) $\quad + 4\|f_{K,L_n} - f_{L_n}\|^2 + 4\langle t,\, \tilde{f}^{(1)}_{K,L_n} - f^{(1)}_{K,L_n} \rangle^2.$

Thus, inserting (33) and (34) in (32) yields

$\frac{1}{2}\|\hat{f}_{K,\hat\ell} - f\|^2 \le \frac{3}{2}\|f_\ell - f\|^2 + 16\|f_{K,L_n} - f_{L_n}\|^2 + 16\|\hat{f}_{K,L_n} - \tilde{f}_{K,L_n}\|^2 + 16\|\tilde{f}^R_{K,L_n} - f^R_{K,L_n}\|^2 + \mathrm{pen}(\ell)$

$\quad + 16 \sup_{t \in S_{\ell \vee \hat\ell},\, \|t\| = 1} \langle t,\, \tilde{f}^{(1)}_{K,L_n} - f^{(1)}_{K,L_n} \rangle^2 - \mathrm{pen}(\hat\ell).$

Here, the bounds of Proposition 4.2 can be applied. Indeed, (28), (30) and (31) are uniform with respect to $\ell$ and imply

$\|f_{K,L_n} - f_{L_n}\|^2 \le A_K \Delta^{2K+2}, \qquad E\left(\|\tilde{f}^R_{K,L_n} - f^R_{K,L_n}\|^2\right) \le \frac{D_K}{n}, \qquad E\left(\|\hat{f}_{K,L_n} - \tilde{f}_{K,L_n}\|^2\right) \le \frac{E_K}{n}.$

Below, we prove, using the Talagrand inequality, that

(35) $E\left( \sup_{t \in S_{\ell \vee \hat\ell},\, \|t\| = 1} \langle t,\, \tilde{f}^{(1)}_{K,L_n} - f^{(1)}_{K,L_n} \rangle^2 - p(\ell, \hat\ell) \right)_+ \le \frac{C'}{n},$

where $p(\ell, \ell') = 8\,\frac{\ell \vee \ell'}{n}$ and $16\,p(\ell, \ell') \le \mathrm{pen}(\ell) + \mathrm{pen}(\ell')$ as soon as $\kappa \ge \kappa_0 = 16 \times 8$. Thus, we get $E\left(16\,p(\ell, \hat\ell) - \mathrm{pen}(\hat\ell)\right) \le \mathrm{pen}(\ell)$ and

$E\left(\|\hat{f}_{K,\hat\ell} - f\|^2\right) \le 4\|f - f_\ell\|^2 + 4\,\mathrm{pen}(\ell) + 32A_K\Delta^{2K+2} + 32\frac{B_K}{n} + \frac{32C'}{n}.$

Proof of (35). We consider $t \in S_{\ell^*}$ for $\ell^* = \ell \vee \ell'$ with $\ell, \ell' \le L_n$, and (see (26) and (27))

$\nu_n(t) = c_1(\Delta)\, \langle t,\, \hat{g}_{L_n} - g_{L_n} \rangle = \frac{1}{n} \sum_{k=1}^{n} \left( \psi_t(Z_k) - E(\psi_t(Z_k)) \right),$

where

$\psi_t(z) = \frac{c_1(\Delta)}{2\pi} \int t^*(u)\, e^{iuz}\, du = c_1(\Delta)\, t(z).$

We apply the Talagrand inequality recalled in Section 8, and to this aim, we compute the quantities $M$, $H$, $v$. First,

$\sup_{t \in S_{\ell^*},\, \|t\| = 1}\; \sup_z\, |\psi_t(z)| \le \frac{c_1(\Delta)}{2\pi}\, \sqrt{2\pi\ell^*} \sup_{t \in S_{\ell^*},\, \|t\| = 1} \|t^*\| = c_1(\Delta)\sqrt{\ell^*} := M.$

The density of $Z_1$ is $g$, which satisfies

$\|g\|_\infty \le \frac{1}{e^{c\Delta} - 1} \sum_{m \ge 1} \frac{(c\Delta)^m}{m!}\, \|f^{\star m}\|_\infty \le \|f\|_\infty.$

Therefore,

$\sup_{t \in S_{\ell^*},\, \|t\| = 1} \mathrm{Var}(\psi_t(Z_1)) \le c_1^2(\Delta) \sup_{t \in S_{\ell^*},\, \|t\| = 1} E\left(t^2(Z_1)\right) \le c_1^2(\Delta)\, \|f\|_\infty := v.$

Lastly, using the bound in (29) and the fact that, for $t \in S_{\ell^*}$,

$\langle t,\, \tilde{f}^{(1)}_{K,L_n} - f^{(1)}_{K,L_n} \rangle = \langle t,\, \tilde{f}^{(1)}_{K,\ell^*} - f^{(1)}_{K,\ell^*} \rangle,$

we get

$E\left( \sup_{t \in S_{\ell^*},\, \|t\| = 1} \nu_n^2(t) \right) = E\left( \sup_{t \in S_{\ell^*},\, \|t\| = 1} \langle t,\, \tilde{f}^{(1)}_{K,\ell^*} - f^{(1)}_{K,\ell^*} \rangle^2 \right) \le E\left( \|\tilde{f}^{(1)}_{K,\ell^*} - f^{(1)}_{K,\ell^*}\|^2 \right) \le \frac{2\ell^*}{n} := H^2.$

Therefore, Lemma 8.1 yields, with $\epsilon^2 = 1/2$,

$E\left( \sup_{t \in S_{\ell^*},\, \|t\| = 1} \nu_n^2(t) - 4H^2 \right)_+ \le \frac{A_1}{n}\left( e^{-A_2\ell^*} + e^{-A_3\sqrt{n}} \right)$

for constants $A_1$, $A_2$, $A_3$ depending on $c_1(\Delta)$ and $\|f\|_\infty$. Now, since $\sum_{\ell'=1}^{L_n} e^{-A_2(\ell \vee \ell')} = \ell\, e^{-A_2\ell} + \sum_{\ell < \ell' \le L_n} e^{-A_2\ell'}$ is bounded by, say, $B_2$, and $L_n e^{-A_3\sqrt{n}}$ is bounded by $B_3$, we get

$E\left( \sup_{t \in S_{\ell \vee \hat\ell},\, \|t\| = 1} \nu_n^2(t) - 8\,\frac{\ell \vee \hat\ell}{n} \right)_+ \le \sum_{\ell'=1}^{L_n} E\left( \sup_{t \in S_{\ell \vee \ell'},\, \|t\| = 1} \nu_n^2(t) - 4H^2 \right)_+ \le \frac{B_4}{n}.$

This ends the proof of (35), and thus of Theorem 4.1. $\square$

8. Appendix.

The result below follows from the Talagrand concentration inequality given in Klein and Rio (2005) and arguments in Birgé and Massart (1998) (see the proof of their Corollary 2, page 354).

Lemma 8.1 (Talagrand Inequality). Let $Y_1, \dots, Y_n$ be independent random variables, let $\nu_{n,Y}(f) = (1/n)\sum_{i=1}^{n} [f(Y_i) - E(f(Y_i))]$, and let $\mathcal{F}$ be a countable class of uniformly bounded measurable functions. Then, for $\epsilon^2 > 0$,

$E\left( \sup_{f \in \mathcal{F}} |\nu_{n,Y}(f)|^2 - 2(1 + 2\epsilon^2)H^2 \right)_+ \le \frac{4}{K_1}\left( \frac{v}{n}\, e^{-K_1\epsilon^2 \frac{nH^2}{v}} + \frac{98M^2}{K_1 n^2 C^2(\epsilon^2)}\, e^{-\frac{2K_1 C(\epsilon^2)\epsilon}{7\sqrt{2}}\, \frac{nH}{M}} \right)$

with $C(\epsilon^2) = \sqrt{1 + \epsilon^2} - 1$, $K_1 = 1/6$, and

$\sup_{f \in \mathcal{F}} \|f\|_\infty \le M, \qquad E\left( \sup_{f \in \mathcal{F}} |\nu_{n,Y}(f)| \right) \le H, \qquad \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{k=1}^{n} \mathrm{Var}(f(Y_k)) \le v.$

By standard density arguments, this result can be extended to the case where $\mathcal{F}$ is a unit ball of a linear normed space, after checking that $f \mapsto \nu_n(f)$ is continuous and that $\mathcal{F}$ contains a countable dense family.

References

[1] Basawa, I. V. and Brockwell, P. J. (1982). Nonparametric estimation for nondecreasing Lévy processes. J. Roy. Statist. Soc. Ser. B 44, 262-269.
[2] Buchmann, B. and Grübel, R. (2003). Decompounding: an estimation problem for Poisson random sums. Ann. Statist. 31, 1054-1074.

[3] Buchmann, B. (2009). Weighted empirical processes in the nonparametric inference for Lévy processes. Math. Methods Statist. 18, 281-309.
[4] Chen, S. X., Delaigle, A. and Hall, P. (2010). Nonparametric estimation for a class of Lévy processes. J. Econometrics 157, 257-271.
[5] Chesneau, C., Comte, F. and Navarro, F. (2013). Fast nonparametric estimation for convolutions of densities. Preprint MAP5.
[6] Comte, F. and Genon-Catalot, V. (2009). Nonparametric estimation for pure jump Lévy processes based on high frequency data. Stochastic Processes and their Applications 119, 4088-4123.
[7] Comte, F. and Genon-Catalot, V. (2010). Nonparametric adaptive estimation for pure jump Lévy processes. Ann. Inst. Henri Poincaré Probab. Stat. 46, 595-617.
[8] Comte, F. and Genon-Catalot, V. (2011). Estimation for Lévy processes from high frequency data within a long time interval. Ann. Statist. 39, 803-837.
[9] Duval, C. (2012a). Adaptive wavelet estimation of a compound Poisson process. Preprint arXiv:1203.3135.
[10] Duval, C. (2012b). Nonparametric estimation of a renewal reward process from discrete data. Preprint arXiv:1207.1611.
[11] Embrechts, P., Klüppelberg, C. and Mikosch, T. (1997). Modelling Extremal Events. Berlin: Springer. MR1458613.
[12] Figueroa-López, J. E. and Houdré, C. (2006). Risk bounds for the non-parametric estimation of Lévy processes. High Dimensional Probability, 96-116, IMS Lecture Notes Monogr. Ser., 51, Inst. Math. Statist., Beachwood, OH.
[13] Figueroa-López, J. E. (2009). Nonparametric estimation of Lévy models based on discrete-sampling. Optimality, 117-146, IMS Lecture Notes Monogr. Ser., 57, Inst. Math. Statist., Beachwood, OH.
[14] Gugushvili, S. (2009). Nonparametric estimation of the characteristic triplet of a discretely observed Lévy process. J. Nonparametr. Stat. 21, 321-343.
[15] Gugushvili, S. (2012). Nonparametric inference for discretely sampled Lévy processes. Ann. Inst. Henri Poincaré Probab. Stat. 48, 282-307.
[16] Jongbloed, G., van der Meulen, F. H. and van der Vaart, A. W. (2005). Nonparametric inference for Lévy-driven Ornstein-Uhlenbeck processes. Bernoulli 11, 759-791.
[17] Katz, R. W. (2002). Stochastic modeling of hurricane damage. J. Appl. Meteorol. 41, 754-762.
[18] Kessler, M. (1997). Estimation of an ergodic diffusion from discrete observations. Scand. J. Stat. 24, 211-229.
[19] Kim, Y. (1999). Nonparametric Bayesian estimators for counting processes. Ann. Statist. 27, 562-588.
[20] Klein, T. and Rio, E. (2005). Concentration around the mean for maxima of empirical processes. Ann. Probab. 33, 1060-1077.
[21] Neumann, M. H. and Reiss, M. (2009). Nonparametric estimation for Lévy processes from low-frequency observations. Bernoulli 15, 223-248.
[22] Scalas, E. (2006). The application of continuous-time random walks in finance and economics. Physica A 362, 225-239.
[23] Schick, A. and Wefelmeyer, W. (2004). Root n consistent density estimators for sums of independent random variables. J. Nonparametr. Statist. 16, 925-935.
[24] Tsybakov, A. (2009). Introduction to Nonparametric Estimation. Springer.
[25] Ueltzhöfer, F. A. J. and Klüppelberg, C. (2011). An oracle inequality for penalised projection estimation of Lévy densities from high-frequency observations. J. Nonparametr. Stat. 23, 967-989.
[26] van Es, B., Gugushvili, S. and Spreij, P. (2007). A kernel type nonparametric density estimator for decompounding. Bernoulli 13, 672-694.
[27] Zhao, Z. and Wu, W. B. (2009). Nonparametric inference of discretely sampled stable Lévy processes. J. Econometrics 153, 83-92.