<<


CROSS VALIDATION METHOD ON WEIGHTED ISOTONIC REGRESSION FOR NONPARAMETRIC REGRESSION FITTING

ZHANG YIWEN

NATIONAL UNIVERSITY OF SINGAPORE

2017

CROSS VALIDATION METHOD ON WEIGHTED ISOTONIC REGRESSION FOR NONPARAMETRIC REGRESSION FITTING

ZHANG YIWEN (A0123849X)

SUPERVISOR: ASSOCIATE PROFESSOR YU TAO

EXAMINERS: ASSOCIATE PROFESSOR LI JIALIANG, DR YAO ZHIGANG

A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE, DEPARTMENT OF STATISTICS AND APPLIED PROBABILITY, NATIONAL UNIVERSITY OF SINGAPORE

2017

Declaration

I hereby declare that this thesis is my original work and it has been written by me in its entirety. I have duly acknowledged all the sources of information which have been used in the thesis.

This thesis has also not been submitted for any degree in any university previously.

Zhang Yiwen
June 2017

Acknowledgement

I would like to express my deepest appreciation to my supervisor, Associate Professor Yu Tao. He is knowledgeable, kind and patient. Whenever I had trouble with the project, he always provided constructive advice and professional guidance. Without his support and guidance, I could not have finished the project and this thesis.

Besides, I am grateful to my friends who have accompanied and helped me a great deal during my study at NUS. I want to thank Yue Mu for her time and patience, Yaxian for her recommendation, Wen Jun for sharing his notes, my housemates Xueqi and Li Yun for their support and care, Chin Hau for his advice, and Huang Yi, Zhang Liang, Gaoshuang, Jiejie and others for their encouragement and help.

Moreover, I want to express my greatest gratitude to my parents for their long-lasting support and understanding.

Lastly, I thank the examiners, Associate Professor Li Jialiang and Assistant Professor Yao Zhigang. Their professional and detailed comments helped me learn more and improve further.

Contents

1 Introduction
  1.1 Motivation
  1.2 Organization

2 Literature Review
  2.1 Local Polynomial Regression
  2.2 Classical and Generalized Continuous Isotonic Regression
  2.3 Pool Adjacent Violators Algorithm (PAVA)
  2.4 Bandwidth Selection for Nonparametric Regression

3 Methodology
  3.1 Local Polynomial Method Estimation
  3.2 Continuous Isotonic Method Estimation
  3.3 Discussion on the Weight Function
  3.4 Computing Isotonic Regression Estimation
  3.5 Cross-Validation Bandwidth Selection

4 Simulation Study
  4.1 Fitting the Model
  4.2 Numerical Performance Comparisons
    4.2.1 L1 Distance Comparison
    4.2.2 L1 Percentage Improvement
    4.2.3 Confidence Interval
  4.3 Numerical performance for other simulations
    4.3.1 Simulated Data from Uniform Distribution
    4.3.2 Numerical Performance for Mixture Uniform Distribution
  4.4 Numerical performance for other functions
  4.5 Numerical performance for increasing functions

5 Application to Real World Data Set

6 Conclusions

Summary

This thesis discusses the local polynomial regression method and the isotonic regression method for solving the nonparametric regression problem $Y = g(X) + \varepsilon$, subject to the condition that $g(\cdot)$ is a smooth and non-decreasing function. To solve the continuous isotonic regression problem, the Pool Adjacent Violators Algorithm (PAVA) is used to update the local polynomial regression estimates at a large number of quantiles of $X$. Considering the importance of bandwidth selection for nonparametric fitting, we propose a new cross validation method that incorporates the monotone constraint that $g(\cdot)$ is non-decreasing. Furthermore, simulations are conducted to evaluate the performance of this cross validation method in comparison with the plug-in method for bandwidth selection, in both the local polynomial regression and the weighted isotonic regression estimation. Based on the outcomes of the L1 distance, the L1 distance percentage improvement, and the confidence bands, we conclude that the proposed cross validation method for bandwidth selection improves the performance when sufficient data is provided for the nonparametric regression estimation. The proposed cross-validation bandwidth selection combined with the weighted isotonic regression estimation outperforms the other settings in this nonparametric estimation problem, because both components factor in the monotone constraint.

List of Figures

4.1 True function g(x)
4.2 Scatter Plot of Simulated Data (Normal Distribution)
4.3 Local Polynomial Estimates $\tilde{g}(x)$
4.4 Isotonic Regression Estimates $\hat{g}(x)$
4.5 Boxplot of L1 Distance 1 (Normal Distribution)
4.6 Scatter Plot of Simulated Data (Truncated Normal Distribution)
4.7 Boxplot of L1 Distance 2 (Truncated Normal Distribution)
4.8 Partial plot of confidence bands 1
4.9 Scatter Plot of Simulated Data from Uniform Distribution
4.10 Boxplot of L1 Distance 3 (Uniform Distribution)
4.11 Partial plot of confidence bands 2
4.12 Scatter Plot of Simulated Data (Mixture Uniform Distribution)
4.13 Boxplot of L1 Distance 4 (Mixture Uniform Distribution)
4.14 Partial plot of confidence bands 3
4.15 True function f(x)
4.16 True function h(x)
4.17 Boxplot of L1 Distance 5
4.18 Boxplot of L1 Distance 6
4.19 Partial plot of confidence bands 4
4.20 Partial plot of confidence bands 5
4.21 True function l(x)
4.22 True function e(x)
4.23 Boxplot of L1 Distance 7
4.24 Boxplot of L1 Distance 8
4.25 Partial plot of confidence bands 6
4.26 Partial plot of confidence bands 7
5.1 Two Regressions Comparison Case 1
5.2 Two Regressions Comparison Case 2

List of Tables

4.1 Improvement Percentage (Truncated Normal Distribution)
4.2 Improvement Percentage (Uniform Distribution)
4.3 Improvement Percentage (Mixture Uniform Distribution)
4.4 Improvement Percentage for f(x)
4.5 Improvement Percentage for h(x)
4.6 Improvement Percentage for l(x)
4.7 Improvement Percentage for e(x)

Chapter 1

Introduction

1.1 Motivation

Regression analysis, as one of the most popular techniques in statistics, is of great importance for exploring the relationship between dependent and independent variables. A large number of regression models have been defined and proposed in research studies so far.

From the perspective of parameters, regression can be divided into three categories: parametric regression, non-parametric regression, and semi-parametric regression. Parametric regression depends on assumptions about the shape of the distribution; that is, the functional form of a parametric model is specified beforehand, and the purpose is to estimate the parameters of that function. In contrast, non-parametric regression relaxes these assumptions: few assumptions are made about the distribution of the observed data, and the predictive model requires a large sample size to estimate the regression function directly. Semi-parametric regression, as the name shows, combines both parametric and non-parametric regression methods.

In various statistical contexts, local modelling techniques are data-analytic: the data determine the regression function. Local polynomial regression is one of the most popular methods for solving non-parametric problems. The non-parametric problem can simply be considered as

$$Y = g(X) + \varepsilon, \qquad (1.1)$$
where the $\varepsilon_i$'s are independent and identically distributed (i.i.d.) random variables from a certain distribution with $E(\varepsilon) = 0$, and $g(\cdot)$ is a smooth function.

Based on the bivariate data $(X_i, Y_i)_{i=1}^{n}$ observed with sample size $n$, the local polynomial regression builds a model to fit the data. Using the strip of data around $X$, the method fits a low-order polynomial by weighted least squares. In this way, the explanatory variable $X$ and the response variable $Y$ are connected by the local polynomial regression estimate, denoted $\tilde{g}(X)$ (Fan and Gijbels, 1996).

One motivation of this thesis is that, although the local polynomial regression is widely used, it may not be the best solution when the observed data is subject to some condition. For instance, if the condition that $g(\cdot)$ is non-decreasing is added, the estimate $\tilde{g}(X)$ would probably not be preferable, because the local polynomial regression does not fulfill the requirement of monotonicity. Considering the monotone constraint, the isotonic regression can be introduced because it achieves a non-decreasing fit. With different weights assigned to the observed data, a weighted continuous isotonic method is proposed, based on the local polynomial method, to solve the non-parametric problem $Y = g(X) + \varepsilon$ subject to the condition that $g(\cdot)$ is a non-decreasing function.

Since the fitted non-parametric regression is based on the data in a local neighborhood, it is essential to choose an appropriate size for that local neighborhood, which is called the bandwidth. As the most critical smoothing parameter, the bandwidth is usually chosen either from the data or by experienced data analysts. The kernel function $K\!\left(\frac{X_i - x}{h}\right)$ then assigns a weight to the point $(X_i, Y_i)$; it decays fast so as to eliminate the impact of remote data points. For each given $x$, a polynomial is fitted to the data points contained in the strip $x \pm h$, using the kernel weights. The whole regression curve is obtained by estimating the regression function over a moving strip of data points (Fan and Gijbels, 1996).

The Cross-Validation (CV) technique is quite useful for solving a number of statistical problems. It is also a good way to choose the bandwidth because it is data-dependent. Therefore, the cross validation method is adopted to obtain the bandwidth by locating the minimizer of an objective function built from the weighted isotonic regression estimates. More details of the strategy will be explained and discussed in the following chapters.

Therefore, the motivation of this project is to propose a cross validation method for bandwidth selection so as to improve the performance of the non-parametric regression estimates under the condition that the regression function is non-decreasing.

1.2 Organization

The thesis is organized as follows. In chapter 2, the concepts and methods related to the project are reviewed, including the local polynomial regression, isotonic regression, Pool Adjacent Violators Algorithm (PAVA), plug-in method, and cross-validation method for bandwidth selection. The methodology of the project is then discussed in detail in chapter 3. After that, simulations are conducted in chapter 4. In chapter 5, the proposed cross validation method is applied to a real data set, and the real data analysis, discussion and interpretation are presented. Finally, in chapter 6, conclusions are drawn and suggestions are made for future study.

Chapter 2

Literature Review

2.1 Local Polynomial Regression

Considering the non-parametric model (1.1), the local polynomial method is often used to approximate the unknown regression function $g(x)$ using data from a neighborhood; denote the order of the local polynomial by $p$. A Taylor expansion shows that, for $x$ in a neighborhood of $x_0$,
$$g(x) \approx g(x_0) + g'(x_0)(x - x_0) + \cdots + \frac{g^{(p)}(x_0)}{p!}(x - x_0)^p.$$
This polynomial can be fitted by a weighted least squares regression. Denote by $\hat{\beta}_j$, $j = 0, 1, \cdots, p$, the solution to the least squares regression problem; then $\hat{\beta}_p$ estimates $\frac{g^{(p)}(x_0)}{p!}$, and $g(x)$ can be estimated by finding
$$\hat{\beta} = \operatorname{argmin}_{\beta_0, \beta_1, \cdots, \beta_p} \sum_{i=1}^{n} \Big\{Y_i - \sum_{j=0}^{p} \beta_j (X_i - x_0)^j\Big\}^2 K_h(X_i - x_0),$$
where $h$ is the bandwidth, i.e. the size of the local neighborhood, and $K_h(x) = K(x/h)/h$ with $K(\cdot)$ a kernel function assigning weights to the data points.

It is more convenient to write the weighted least squares problem in matrix form (Fan and Gijbels, 1996). Here we denote by $X$ the design matrix centered at $x_0$:
$$X = \begin{pmatrix} 1 & X_1 - x_0 & \cdots & (X_1 - x_0)^p \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_n - x_0 & \cdots & (X_n - x_0)^p \end{pmatrix},$$
and
$$y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \qquad \hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \vdots \\ \hat{\beta}_p \end{pmatrix},$$
and let $W$ be the $n \times n$ diagonal matrix of weights:
$$W = \operatorname{diag}\{K_h(X_i - x_0)\}.$$
The minimization problem can be written as
$$\min_{\beta}\, (y - X\beta)^T W (y - X\beta),$$
whose solution is
$$\hat{\beta} = (X^T W X)^{-1} X^T W y.$$
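To make the closed-form solution above concrete, here is a minimal R sketch that computes the local polynomial estimate at a single point by weighted least squares. It is an illustration rather than the thesis code; the Gaussian kernel, the default order p = 1 and the name local_poly_fit are assumptions.

    # Minimal sketch: local polynomial estimate of g(x0) by weighted least squares.
    # Gaussian kernel and all names are illustrative assumptions.
    local_poly_fit <- function(x, y, x0, h, p = 1) {
      k <- dnorm((x - x0) / h) / h          # kernel weights K_h(X_i - x0)
      X <- outer(x - x0, 0:p, `^`)          # design matrix, columns (X_i - x0)^j
      W <- diag(k)
      beta <- solve(t(X) %*% W %*% X, t(X) %*% W %*% y)
      beta[1]                               # first component estimates g(x0)
    }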

2.2 Classical and Generalized Continuous Isotonic Regression

In the classical isotonic regression, a vector in $\mathbb{R}^n$ provides the solution to a least squares fit under the non-decreasing constraint. It solves the following problem: for a vector $y = (y_1, y_2, \cdots, y_n) \in \mathbb{R}^n$, the isotonic regression of $y$ with weights $w = (w_1, w_2, \cdots, w_n) \in (0, +\infty)^n$ minimizes the objective quadratic function
$$\phi(x) = \frac{1}{2} \sum_{i=1}^{n} w_i (y_i - x_i)^2$$
subject to $x_1 \le x_2 \le \cdots \le x_n$ (Barlow, 1972). The Pool Adjacent Violators Algorithm (PAVA) offers a solution to this classical isotonic regression problem (Ayer et al., 1955): it repeatedly modifies the $y_i$ which violate the ordering constraint to obtain a new vector $x$ satisfying the non-decreasing constraint $x_1 \le x_2 \le \cdots \le x_n$. The details of PAVA will be illustrated in the following section.

This discrete isotonic regression problem can be constructed graphically in two dimensions. Defining the set of points $P_i$, $i = 0, 1, \cdots, n$, by $P_0 = (0, 0)$ and $P_i = \left(\sum_{j=1}^{i} w_j,\; \sum_{j=1}^{i} w_j y_j\right)$, the $k$-th component of the isotonic regression of $y$ is derived from the left derivative of the greatest convex minorant of the set of points $\{P_i : 0 \le i \le n\}$, evaluated at $P_k$ (Robertson et al., 1988).

Groeneboom and Jongbloed (2010) proposed the generalized continuous version of isotonic regression, which is also concerned with minimizing an objective quadratic function, namely with $L_2$ projections on an interval $[a, b]$. Let $w > 0$ be an integrable weight function on the interval $[a, b]$ and $g \in L_2(w)$, and define $\mathcal{K}$ as the subset of $L_2(w)$ consisting of the non-decreasing functions on $[a, b]$. The continuous isotonic regression of $g$ on $\mathcal{K}$ minimizes
$$\varphi(f) = \frac{1}{2} \int (f(x) - g(x))^2 w(x)\,dx. \qquad (2.1)$$

Similarly, the solution can also be constructed graphically. In the case $w = 1$, the solution is the right continuous slope of the greatest convex minorant of the parameterized graph in $\mathbb{R}^2$ given by (Mammen, 1991)
$$\left\{ \left( \int_a^t w(x)\,dx,\; \int_a^t g(x)w(x)\,dx \right) : t \in [a, b] \right\}.$$

2.3 Pool Adjacent Violators Algorithm (PAVA)

PAVA is an iterative solution to monotonic regression problems (Ayer et al., 1955). Suppose a data set of sample size $n$, $(X_i, Y_i)$, $i = 1, 2, \cdots, n$. After sorting this data set by $X$, we obtain $(X_{(i)}, Y_{(i)})$, in which $X_{(i)}$ is non-decreasing. By PAVA, $\hat{g}(X_{(i)})$, $i = 1, \cdots, n$, can be obtained as
$$\{\hat{g}(X_{(i)})\}_{i=1}^{n} = \operatorname{argmin}_{g} \frac{\sum_{i=1}^{n} (Y_{(i)} - g(X_{(i)}))^2}{n},$$
subject to $\hat{g}(X_{(1)}) \le \hat{g}(X_{(2)}) \le \cdots \le \hat{g}(X_{(n)})$. The algorithmic description of the PAVA implementation is given below (Barlow, 1972).

Firstly, starting from the first pair $Y_{(1)}, Y_{(2)}$, check the non-decreasing constraint for every adjacent pair of data, moving forward and stopping at the first pair where $Y_{(i)} > Y_{(i+1)}$. This pair is pooled and replaced by its average:
$$Y^*_{(i)} = Y^*_{(i+1)} = \frac{Y_{(i)} + Y_{(i+1)}}{2}.$$

Secondly, check backward over every adjacent pair of data. Once $Y_{(i-1)} > Y^*_{(i)}$, the values $Y_{(i-1)}, Y_{(i)}, Y_{(i+1)}$ are pooled and replaced by their average,
$$Y^*_{(i-1)} = Y^*_{(i)} = Y^*_{(i+1)} = \frac{Y_{(i-1)} + Y_{(i)} + Y_{(i+1)}}{3}.$$
The backward checking continues until the monotonic requirement is satisfied.

Iterating these two steps, we obtain the outcome $\hat{g}(X_{(i)})$, $i = 1, 2, \cdots, n$.
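The two pooling steps above amount to a single backward-pooling pass over the sorted responses. The following R code is a minimal sketch under the assumption that y is already sorted by X; R's built-in isoreg() provides an equivalent unweighted fit, and the name pava is an assumption.

    # Minimal weighted PAVA sketch; y must be sorted by x beforehand.
    pava <- function(y, w = rep(1, length(y))) {
      n <- length(y)
      if (n <= 1) return(y)
      vals <- y[1]; wts <- w[1]; sizes <- 1          # one entry per pooled block
      for (i in 2:n) {
        vals <- c(vals, y[i]); wts <- c(wts, w[i]); sizes <- c(sizes, 1)
        k <- length(vals)
        while (k > 1 && vals[k - 1] > vals[k]) {     # pool adjacent violators
          wsum <- wts[k - 1] + wts[k]
          vals[k - 1] <- (wts[k - 1] * vals[k - 1] + wts[k] * vals[k]) / wsum
          wts[k - 1] <- wsum
          sizes[k - 1] <- sizes[k - 1] + sizes[k]
          vals <- vals[-k]; wts <- wts[-k]; sizes <- sizes[-k]
          k <- k - 1
        }
      }
      rep(vals, times = sizes)                       # expand blocks to n fitted values
    }

For example, pava(c(1, 3, 2, 4)) returns 1.0, 2.5, 2.5, 4.0: the violating pair (3, 2) is pooled into its average.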

16 2.4 Bandwidth Selection for Nonparametric Regression

There is no doubt that the bandwidth parameter h plays an essential role in nonparametric regression. To some extent, the bandwidth dominates the complexity of the model. If a very small h is taken, the model bias would be small but the variance would be large. On the contrary, if the bandwidth is very large, the number of data points falling in the local neighborhood is large, which produces a small variance; however, the modelling bias then tends to be large. Therefore, there is a tradeoff between bias and variance in bandwidth selection. In other words, it is crucial to find a practical way of selecting the bandwidth, and it is very useful and necessary to find a data-driven bandwidth which makes the regression estimate smooth. A well known bandwidth choice for nonparametric regression is the cross validation method (Rice, 1984).

Review the nonparametric regression model (1.1), and consider estimating the regression function $g(x) = E[Y \mid X = x]$ using the independent identically distributed data $(X_1, Y_1), (X_2, Y_2), \cdots, (X_n, Y_n)$.

The locally weighted average of the $Y_i$ is known as the kernel estimator (Watson, 1964). Pioneering work on effective nonparametric estimators has been proposed and summarized by Wand and Jones (1995). For simplicity of presentation, $h$ is treated as a scalar here:
$$g_h(x) = n^{-1} \sum_{i=1}^{n} h^{-1} K\!\left(\frac{x - X_i}{h}\right) Y_i \Big/ f_h(x),$$
where $K(\cdot)$ is a kernel function, $h$ is the bandwidth, and $f_h(x)$ is the Rosenblatt-Parzen kernel density estimator of the marginal density $f(x)$ of $X$,
$$f_h(x) = (nh)^{-1} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right).$$
The bandwidth selection rule is to minimize an estimate of the Integrated Squared Error (ISE):
$$\mathrm{ISE}(h) = \int [g_h(x) - g(x)]^2 w(x) f(x)\,dx = \int g_h^2(x) w(x) f(x)\,dx - 2\int g_h(x) g(x) w(x) f(x)\,dx + \int g^2(x) w(x) f(x)\,dx,$$

which is frequently proposed for estimating the optimal smoothing parameter (Hardle and Marron, 1985); here $w(x)$ is a nonnegative weight function. The last term does not contain $h$, so the minimization of $\mathrm{ISE}(h)$ is equivalent to the minimization of the cross-validation criterion $CV(h)$ (Rice, 1984),
$$CV(h) = \int g_h^2(x) w(x) f(x)\,dx - 2 \int g_h(x) g(x) w(x) f(x)\,dx.$$

This cannot be computed in practice because it depends on the unknown $g(x)$ and $f(x)$. However, the second term can be estimated, since
$$\int g_h(x) g(x) w(x) f(x)\,dx = E_{(X,Y)}[g_h(X)\, Y\, w(X)] \approx n^{-1} \sum_{j=1}^{n} g_{h,j}(X_j)\, Y_j\, w(X_j).$$

Here $g_{h,j}(\cdot)$ is the leave-one-out estimator based on the observations $X_1, \cdots, X_n$ except $X_j$; this leave-one-out method predicts $g(X_j)$ from the subsample $X_1, \cdots, X_{j-1}, X_{j+1}, \cdots, X_n$:
$$g_{h,j}(x) = (n-1)^{-1} \sum_{i \ne j} h^{-1} K\!\left(\frac{x - X_i}{h}\right) Y_i \Big/ f_{h,j}(x),$$
$$f_{h,j}(x) = (n-1)^{-1} \sum_{i \ne j} h^{-1} K\!\left(\frac{x - X_i}{h}\right).$$
In a similar way, the first term can be approximated by
$$\int g_h^2(x) w(x) f(x)\,dx \approx n^{-1} \sum_{j=1}^{n} g_{h,j}^2(X_j)\, w(X_j).$$
Therefore, $CV(h)$ can be estimated by
$$CV(h) = n^{-1} \sum_{j=1}^{n} g_{h,j}^2(X_j)\, w(X_j) - \frac{2}{n} \sum_{j=1}^{n} g_{h,j}(X_j)\, Y_j\, w(X_j).$$
Since adding a term independent of $h$ does not change the bandwidth selection rule, $CV(h)$ can equivalently be given as
$$CV(h) = n^{-1} \sum_{j=1}^{n} [Y_j - g_{h,j}(X_j)]^2\, w(X_j).$$
Choosing the bandwidth $h$ then amounts to finding the minimizer of the cross-validation function: $h_{CV} = \operatorname{argmin}_h CV(h)$.

In the proposed cross validation method for bandwidth selection in this project, $\hat{g}$ denotes the isotonic regression estimate. The empirical cross validation function is
$$\widehat{CV}(h) = n^{-1} \sum_{j=1}^{n} [Y_j - \hat{g}_{h,j}(X_j)]^2\, w(X_j).$$
In practice, the estimates near the boundary may have a large bias, which leads to overfitting; therefore the marginal data are ignored when calculating the cross validation score. The empirical cross-validation bandwidth is denoted $\hat{h}_{\widehat{CV}} = \operatorname{argmin}_h \widehat{CV}(h)$. In this way, the proposed cross validation method incorporates the non-decreasing constraint into the bandwidth selection.
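As a concrete illustration, the R sketch below scores one bandwidth by the leave-one-out criterion for the kernel estimator above, with an interior-only indicator standing in for the weight function w(x). It assumes data vectors x and y; the trimming fraction, the search interval and all names are assumptions.

    # Minimal leave-one-out CV sketch for the kernel regression estimator g_h.
    cv_score <- function(x, y, h, trim = 0.05) {
      lo <- quantile(x, trim); hi <- quantile(x, 1 - trim)  # ignore marginal data
      sq_err <- vapply(seq_along(x), function(j) {
        k <- dnorm((x[j] - x[-j]) / h)    # Gaussian kernel weights, j-th point left out
        g_j <- sum(k * y[-j]) / sum(k)    # leave-one-out estimate at X_j
        (y[j] - g_j)^2 * (x[j] >= lo && x[j] <= hi)
      }, numeric(1))
      mean(sq_err)
    }
    # h_cv <- optimize(function(h) cv_score(x, y, h), interval = c(0.1, 5))$minimum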

Another reliable method for bandwidth selection is the direct plug-in method. Plug-in bandwidth selection rules perform well both in theory and in practice (Ruppert et al., 1995). The direct plug-in method was first proposed by Barlow (1972). The plug-in method is fast, flexible and applicable to a variety of situations, from estimating bandwidths for derivatives, for probability and spectral densities, for smooth and nonsmooth regression problems, and even for higher dimensional problems (Gasser et al., 1991).

Chapter 3

Methodology

In this chapter, we focus on the strategies for using the cross validation bandwidth selection method to solve the non-parametric regression problem subject to the constraint that the nonparametric estimate of $g(\cdot)$ is non-decreasing. To fulfill the monotone requirement, we adopt the popular Pool Adjacent Violators Algorithm (PAVA), which is usually used for solving classical isotonic regression problems. Using PAVA, the local polynomial regression estimates at a large number of quantiles of $X$ are updated to non-decreasing isotonic regression estimates. In a similar way, PAVA can be used within the cross validation method for bandwidth selection to optimize the regression estimates.

3.1 Local Polynomial Method Estimation

As shown in the previous chapter, the local polynomial regression model involves the local neighborhood bandwidth $h$, a kernel function $K(\cdot)$, and the fitting of a polynomial of degree $p$. Choosing the order of the local polynomial is an issue in local polynomial fitting: a high order reduces the bias but increases the variability. In this thesis, we assume $X$ is univariate and the order of the local polynomial is equal to one, i.e. $p = 1$. Therefore, the objective function simplifies to finding the minimizer (Fan and Gijbels, 1992)
$$\{\tilde{\beta}_0, \tilde{\beta}_1\} = \operatorname{argmin}_{\{\beta_0, \beta_1\}} \sum_{i=1}^{n} \{Y_i - [\beta_0 + \beta_1(X_i - x_0)]\}^2 K_h(X_i - x_0).$$
It then follows that $\tilde{g}(x_0) = \tilde{\beta}_0$ and $\tilde{g}'(x_0) = \tilde{\beta}_1$. Undoubtedly, it is necessary to choose an appropriate bandwidth so that the fitted regression function is neither under-parametrized nor over-parametrized; we focus on this in later sections. Besides, the kernel function also needs to be confirmed. It is acknowledged that the choice of kernel function is not so important for estimating the regression function, so we use the popular Gaussian kernel function $K(x) = \exp(-x^2/2)/\sqrt{2\pi}$ in the following simulation studies.

3.2 Continuous Isotonic Method Estimation

As discussed before, $\tilde{g}(\cdot)$ obtained by the local polynomial regression is probably not non-decreasing. To update $\tilde{g}(\cdot)$ into a non-decreasing function, we proceed as follows. Following the least squares method, the goal is to minimize the objective function
$$M_0(g) = \frac{1}{n} \sum_{i=1}^{n} \{g(X_i) - \tilde{g}(X_i)\}^2 = \int \{g(x) - \tilde{g}(x)\}^2\,dF_n(x),$$
subject to the condition that $g(x)$ is a non-decreasing function, where $F_n(x)$ is defined as
$$F_n(x) = \frac{\sum_{i=1}^{n} I\{X_i \le x\}}{n}.$$

Here $F_n(x)$ is the empirical cumulative distribution function (c.d.f.) of $X$: if $F(x)$ denotes the c.d.f. of $X$, then $F_n(x)$ is its nonparametric estimate. Besides, since $X$ follows a continuous distribution, the kernel density estimate $\tilde{f}(x)$ estimates the probability density function (p.d.f.) of $X$:
$$\tilde{f}(x) = \sum_{i=1}^{n} K_h(X_i - x)/n.$$

In the continuous version, $M_0$ can be expressed as
$$M_0(g) = \int \{g(x) - \tilde{g}(x)\}^2 \tilde{f}(x)\,dx. \qquad (3.1)$$

The objective is now to minimize $M_0$ subject to the condition that $g(x)$ is a non-decreasing function; the updated estimate $\hat{g}(x)$ can be expressed as
$$\hat{g}(x) = \operatorname{argmin}_g M_0(g).$$

The formula for $M_0$ has the form of the weighted continuous isotonic regression objective in (2.1). Hence, a weighted continuous isotonic regression using the kernel density $\tilde{f}(x)$ as the weight solves the problem of updating $\tilde{g}(x)$ to the non-decreasing $\hat{g}(x)$. In this way, the objective weighted continuous isotonic estimate $\hat{g}(x)$ is associated with the local polynomial estimate $\tilde{g}(x)$.

3.3 Discussion on the Weight Function

Before computing $\hat{g}(x)$, we discuss the weight function. As mentioned before, to fit a classical weighted isotonic regression, the weight vector $w$ should be confirmed before using PAVA. In the later simulations, $X$ follows a mixture normal distribution:
$$X \sim 0.4 \times N(\mu_1 = 12, \sigma_1 = 3) + 0.6 \times N(\mu_2 = 24, \sigma_2 = 3).$$

The simulated data form a scatter plot of the paired data $(X_i, Y_i)_{i=1}^{n}$, with the data points distributed unevenly across the plot. Suppose $X$ is simulated from this mixture normal distribution, which will be discussed further in chapter 4: clearly, more points cluster in the regions around $X = 12$ and $X = 24$. The probability density function $f(x)$ can thus serve as the weight function, since where the data has a high probability density, more data points cluster together. Considering the objective function (3.1), the data values assigned larger weights lie in the more informative regions when updating from $\tilde{g}(x)$ to $\hat{g}(x)$.

3.4 Computing Isotonic Regression Estimation

In the previous section, the objective estimate is given as
$$\hat{g}(x) = \operatorname{argmin}_g \int \{g(x) - \tilde{g}(x)\}^2 \tilde{f}(x)\,dx, \qquad (3.2)$$
subject to $g(x)$ being a non-decreasing function. This continuous isotonic regression problem can be solved by transferring it to a discrete isotonic regression problem, so that PAVA can be used to update the estimates, via the following steps.

Firstly, calculate $N$ quantiles of the random variable $X$, where $X$ comes from the bivariate data $(X_i, Y_i)$, $i = 1, 2, \cdots, n$, $n$ is the sample size of the data, and $N$ is a sufficiently large integer. The weighted continuous isotonic regression problem can then be transferred to a discrete isotonic regression problem by introducing $N$: $t_q$ is defined as the $\frac{q - 0.5}{N}$-th quantile of the distribution with density $\tilde{f}$, where $q = 1, \cdots, N$. According to the definition of the quantile, it follows that
$$\int_a^{t_q} \tilde{f}(y)\,dy = \frac{q - 0.5}{N}.$$

Therefore, the objective function $M_0$ can be calculated in the following way:
$$M_0(g) = \int \{g(x) - \tilde{g}(x)\}^2 \tilde{f}(x)\,dx = \int \{g(x) - \tilde{g}(x)\}^2\, d\left\{\int_a^x \tilde{f}(y)\,dy\right\} \approx \sum_{q=1}^{N} \{g(t_q) - \tilde{g}(t_q)\}^2 / N.$$

Therefore, the objective estimate $\hat{g}(x)$ can be obtained as
$$\hat{g}(x) = \operatorname{argmin}_g \sum_{q=1}^{N} \{g(t_q) - \tilde{g}(t_q)\}^2, \qquad (3.3)$$
subject to the condition that $\hat{g}(t_1) \le \hat{g}(t_2) \le \cdots \le \hat{g}(t_N)$. So far, the continuous isotonic regression of $\tilde{g}(x)$ can be approximated by the discrete isotonic regression on $\tilde{g}(t_q)$ with the weight vector $w = \vec{1}$.

Therefore, the solution is to obtain the local polynomial estimates $\tilde{g}(t_q)$, $q = 1, 2, \cdots, 2000$, and implement PAVA on $\tilde{g}(t_q)$ with the weight vector $w = \vec{1}$ to finally obtain the weighted isotonic estimate $\hat{g}(x)$.
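The discretization step can be sketched in R as below, assuming data vectors x and y and reusing the hypothetical local_poly_fit() and pava() sketches from chapter 2. The dpik() and bkde() calls are documented KernSmooth functions; the grid handling and the bandwidth value h = 1 are illustrative assumptions.

    # Sketch: quantiles of the kernel density, then the PAVA update.
    library(KernSmooth)
    N    <- 2000
    dens <- bkde(x, bandwidth = dpik(x))       # kernel density estimate f~ on a grid
    cdf  <- cumsum(dens$y) / sum(dens$y)       # numerical c.d.f. of f~
    tq   <- approx(cdf, dens$x, xout = ((1:N) - 0.5) / N,
                   ties = "ordered", rule = 2)$y   # quantiles t_1, ..., t_N
    gtilde <- sapply(tq, function(t0) local_poly_fit(x, y, t0, h = 1))
    ghat   <- pava(gtilde)                     # non-decreasing update of g~(t_q)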

3.5 Cross-Validation Bandwidth Selection

The optimal choice of the bandwidth should depend on the data at hand. To optimize the estimation of the nonparametric regression, the cross-validation bandwidth selection also incorporates the monotone constraint. Reviewing the estimated objective function (3.2), where $g(\cdot)$ is non-decreasing,
$$\hat{g}(x) = \operatorname{argmin}_g \int \{g(x) - \tilde{g}(x)\}^2 \tilde{f}(x)\,dx,$$
we can see that there are two bandwidths in the formula: one from the kernel density function $\tilde{f}(x)$ and the other from $\tilde{g}(x)$. The bandwidths for estimating $\tilde{f}(x)$ and $\tilde{g}(x)$ are different terms. In the previous section, $\hat{g}(x)$ is computed from the two thousand quantiles $t_q$ and the local polynomial estimates $\tilde{g}(t_q)$ in (3.3),
$$\hat{g}(x) = \operatorname{argmin}_g \sum_{q=1}^{N} \{g(t_q) - \tilde{g}(t_q)\}^2,$$
subject to the non-decreasing constraint. The bandwidth for estimating the kernel density $\tilde{f}(x)$ only helps determine the $t_q$; it is not as important as the bandwidth for $g(\cdot)$, so it is not discussed further in this thesis, and we obtain it by the classical direct plug-in methodology for selecting the bandwidth of a kernel density estimate. We focus on the bandwidth selection for $\tilde{g}(x)$ by the newly proposed cross-validation method; in the following, $h$ refers to the bandwidth for the estimated function $g(\cdot)$. Considering that $g(\cdot)$ is non-decreasing, the new cross-validation bandwidth selection also factors this monotone requirement into obtaining the bandwidth $h$.

In the cross-validation method for bandwidth selection, the smoothing parameter $h$ is selected by the data. The estimate $\hat{g}_h(x)$ is represented as
$$\hat{g}_h(x) = \operatorname{argmin}_g \int \{g(x) - \tilde{g}_h(x)\}^2 \tilde{f}(x)\,dx,$$
under the constraint that $g(\cdot)$ is non-decreasing. We use the leave-one-out cross validation method, in which the function estimate is trained on all the data except for one point and the prediction is evaluated at that point. The leave-one-out method predicts $\hat{g}_h(X_i)$ from the subsample $X_1, \cdots, X_{i-1}, X_{i+1}, \cdots, X_n$; that is, the local polynomial estimate $\tilde{g}(X_i)$ of the regression function is computed without using the $i$-th observation, before $\hat{g}_h(X_i)$ is obtained through PAVA. After obtaining all the estimates $\hat{g}_h(X_i)$, the cross validation function $\widehat{CV}(h)$ can be calculated:
$$\widehat{CV}(h) = \frac{1}{n} \sum_{i=1}^{n} (\hat{g}_h(X_i) - Y_i)^2.$$
The cross-validation method selects the bandwidth by minimizing this estimate of the prediction error of the regression; the bandwidth $h_g$ is the minimizer of the cross-validation function $\widehat{CV}(h)$:
$$\hat{h}_{\widehat{CV}} = \operatorname{argmin}_h \widehat{CV}(h). \qquad (3.4)$$

In this way, $h_g$ is confirmed by (3.4), by finding the minimizer of $\widehat{CV}(h)$. In practice, there would be an overfitting problem when using the leave-one-out cross validation method, so deleting some marginal observations is necessary. In addition, the bandwidth obtained by the conventional cross-validation method may be very small, so setting a minimum value is also necessary. Using this CV bandwidth selection method, we need to avoid selecting an inappropriate bandwidth that is either too small or too large.
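A minimal sketch of the proposed selector is given below, again assuming data vectors x and y and the hypothetical local_poly_fit() and pava() sketches. For each left-out point it recomputes the PAVA-updated fit and trims marginal observations from the score; the trimming fraction and search interval are assumptions, and no minimum-bandwidth safeguard is shown.

    # Sketch: leave-one-out CV for the isotonic (PAVA-updated) estimate.
    cv_isotonic <- function(x, y, h, trim = 0.05) {
      ord <- order(x); x <- x[ord]; y <- y[ord]
      lo <- quantile(x, trim); hi <- quantile(x, 1 - trim)
      pred <- vapply(seq_along(x), function(i) {
        gt <- sapply(x, function(t0) local_poly_fit(x[-i], y[-i], t0, h))
        pava(gt)[i]                # isotonic update, then read off the i-th point
      }, numeric(1))
      mean(((y - pred)^2)[x >= lo & x <= hi])
    }
    # h_cv <- optimize(function(h) cv_isotonic(x, y, h), interval = c(0.5, 5))$minimum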

The cross validation method for bandwidth selection here is data driven and considers the monotone constraint on $g(\cdot)$. To assess the improvement, we evaluate the performance of the CV bandwidth by comparing it with the bandwidth of the plug-in method, which is also a widely used and useful method for bandwidth selection. The details are presented in the simulation chapter.

Chapter 4

Simulation Study

4.1 Fitting the Model

Following the methodology proposed in the previous chapter, simulations are conducted and the outcomes are discussed and analyzed in this chapter. The simulation is based on the pre-specified nonparametric problem (1.1).

Our objective is to estimate the unknown smooth function $g(x)$ based on the observations $(X_1, Y_1), \cdots, (X_n, Y_n)$. $X_i$ is generated from a mixture normal distribution,
$$X \sim 0.4 \times N(\mu_1 = 12, \sigma_1 = 3) + 0.6 \times N(\mu_2 = 24, \sigma_2 = 3),$$
with sample size $n = 300$. We assume that the noise term follows a normal distribution, $\varepsilon \sim N(0, 0.03)$, which also applies to the following simulations. Considering the study of the isotonic regression, $g(x)$ is set as a non-decreasing function, shown in Figure 4.1. Graphically, it is continuous, piecewise linear and non-decreasing:
$$g(x) = \begin{cases} 0, & x < 5, \\ 0.1x - 0.5, & 5 \le x < 10, \\ 0.5, & 10 \le x < 15, \\ 0.05x - 0.25, & 15 \le x < 25, \\ 1, & x \ge 25. \end{cases} \qquad (4.1)$$

Figure 4.1: True function g(x)
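A minimal R sketch of this setup is given below. The seed is arbitrary, and N(0, 0.03) is read as a variance of 0.03; both are assumptions.

    # Simulate the mixture-normal design and the true function (4.1).
    set.seed(1)                                 # arbitrary seed
    n <- 300
    comp <- rbinom(n, 1, 0.6)                   # 1 = the N(24, 3) component
    x <- ifelse(comp == 1, rnorm(n, 24, 3), rnorm(n, 12, 3))
    g <- function(x)
      ifelse(x < 5,  0,
      ifelse(x < 10, 0.1 * x - 0.5,
      ifelse(x < 15, 0.5,
      ifelse(x < 25, 0.05 * x - 0.25, 1))))
    y <- g(x) + rnorm(n, 0, sqrt(0.03))         # N(0, 0.03) read as variance 0.03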

Figure 4.2 is a sample scatter plot of the 300 simulated data points. As shown in the figure, the randomness of both $X$ and $\varepsilon$ shapes the simulated data: the randomness from $X$ produces an unequal scatter of the data points in the horizontal direction, while $\varepsilon$ contributes the noise vertically.

Figure 4.2: Scatter Plot of Simulated Data (Normal Distribution)

Following the methodology discussed in chapter 3, we first fit the local polynomial regression estimate $\tilde{g}(x)$. We choose the Gaussian kernel, one of the most popular kernel functions. As for the selection of the bandwidth $h$, we choose the bandwidth by the default plug-in method, to be compared with the proposed cross-validation method. The plug-in bandwidth selection can be realized by calling the dpill function in the R package KernSmooth; dpill uses direct plug-in methodology to select the bandwidth of a local linear Gaussian kernel regression estimate (Wand, 2015).
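The plug-in baseline uses the documented KernSmooth interface (Wand, 2015); only the variable names below are assumptions.

    # Plug-in bandwidth and a local linear fit via KernSmooth.
    library(KernSmooth)
    h_pi <- dpill(x, y)                                  # direct plug-in bandwidth
    fit  <- locpoly(x, y, degree = 1, bandwidth = h_pi)  # local linear fit on a grid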

Following the methodology in chapter 3, we transform the continuous problem into the discrete problem by calculating the two thousand quantiles $t_1$ up to $t_{2000}$. Correspondingly, $\tilde{g}(t_q)$ are obtained by the local polynomial regression method. The plot of $(t_q, \tilde{g}(t_q))$, which represents $\tilde{g}(x)$, is shown in Figure 4.3 below.

Figure 4.3: Local Polynomial Estimates $\tilde{g}(x)$

Clearly, $\tilde{g}(x)$ is not a non-decreasing function, and we can see some fluctuations at the constant parts, at $x \le 5$, $10 \le x < 15$, and $x \ge 25$.

After that, we apply PAVA to the $\tilde{g}(t_q)$ values so as to obtain the non-decreasing $\hat{g}(t_q)$. The plot of $\hat{g}(t_q)$ versus $t_q$ is presented in Figure 4.4; it shows the estimated weighted continuous isotonic function $\hat{g}(x)$.

Figure 4.4: Isotonic Regression Estimates $\hat{g}(x)$

From the figure, $\hat{g}(x)$ is a non-decreasing function which is closer to the true function $g(x)$. At the fluctuating parts of the curve, it is significantly improved compared with $\tilde{g}(x)$.

32 4.2 Numerical Performance Comparisons

4.2.1 L1 Distance Comparison

In a similar way, we choose the bandwidth by the proposed cross-validation method in place of the plug-in bandwidth; the subsequent steps to obtain the estimated isotonic regression curve by the PAVA algorithm are the same. We compare the estimation performance of both bandwidth selection methods in the local polynomial regression and the isotonic regression estimation. Based on the procedures mentioned in the previous section, we further generate one thousand simulations to analyze the numerical performance of the estimates. Since we know the true function in the simulation study, we can evaluate the performance of the estimation by calculating the L1 distance, which is commonly used to measure the sum of the absolute differences between the estimated values and the target values.

L1 distance, also known as Manhattan distance, is the sum of the absolute coordinate differences between two points; the L2-norm, which underlies least squares, minimizes the sum of the squared differences between the target and estimated values. Why is L1 distance chosen instead of L2 distance? Simply speaking, L2 distance squares the error, so errors that are not small are inflated. If an observation is an outlier, the fit would be pulled toward minimizing this single outlier case, while the many common examples, whose errors are small compared with that outlier, would be weakened. L1 distance is therefore relatively robust, which is the direct reason it is preferred here.

According to the definition, the L1 distance of the isotonic regression estimate $\hat{g}(x)$ can be defined as
$$L1_{\hat{g}} = \int |\hat{g}(x) - g(x)|\,dx,$$
and this non-negative continuous integral can be approximated by
$$L1_{\hat{g}} \approx \sum_{q=1}^{N} |\hat{g}(t_q) - g(t_q)|(t_q - t_{q-1}).$$
In a similar way, the L1 distance of the local polynomial estimate $\tilde{g}(x)$ can be written as
$$L1_{\tilde{g}} \approx \sum_{q=1}^{N} |\tilde{g}(t_q) - g(t_q)|(t_q - t_{q-1}).$$
For the L1 distance there are four cases altogether, combining the two regression methods with the two bandwidth selection methods: local polynomial regression with the plug-in method (L1.gtilde1), local polynomial regression with the cross validation method (L1.gtilde2), isotonic regression with the plug-in method (L1.ghat1), and isotonic regression with the cross validation method (L1.ghat2).
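The Riemann-sum approximation above translates directly into a short R sketch, assuming the tq grid and the estimates from the earlier sketches; taking the first spacing term as zero is a simplifying assumption.

    # L1 distance on the quantile grid.
    l1_dist <- function(est, truth, tq) sum(abs(est - truth) * c(0, diff(tq)))
    L1_ghat   <- l1_dist(ghat,   g(tq), tq)     # isotonic estimate
    L1_gtilde <- l1_dist(gtilde, g(tq), tq)     # local polynomial estimate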

After one thousand repetitions of the simulation, we obtain one thousand L1 distance values in each of the four cases. The results are summarized in the boxplot in Figure 4.5. The figure clearly shows that the isotonic regression method performs better than the local polynomial method; however, it also shows that the proposed cross validation method is no better than the plug-in method here. To explore the reasons, we check the simulations and the estimates carefully and find that the marginal estimates are not accurate: the errors are relatively large at the marginal area.

Figure 4.5: Boxplot of L1 Distance 1 (Normal Distribution)

It can also be seen that the data points are relatively sparse at the margins. The data is generated from mixed normal distributions, and according to the properties of the normal distribution, 95% of the data concentrates between $\mu - 2\sigma$ and $\mu + 2\sigma$; few points fall outside that range. That is why the data is sparse at the boundary of the simulation. With sparse data at the boundary, insufficient information is provided for the regression. Granted that there is a small performance improvement in the dense area, the boundary errors are so large that they impair the overall improvement; therefore the outcomes are not good.

To further test whether the proposed cross-validation method outperforms the plug-in method, we can modify the simulation to truncate the very sparse data and thus improve the overall performance. To weaken the boundary effects, we truncate the parts outside $\mu \pm 2\sigma$ when simulating the data, which guarantees that the boundary region is no longer sparse. Then we can test whether the new cross validation bandwidth selection method is better than the plug-in method. We still use the previous normal distributions, $N_1(\mu_1 = 12, \sigma_1 = 3)$ and $N_2(\mu_2 = 24, \sigma_2 = 3)$, but we truncate the data smaller than $\mu - 2\sigma$ and bigger than $\mu + 2\sigma$, so the simulated data will not be sparse at the boundary.

Following similar steps, $X_i$ is generated from a mixed truncated normal distribution with a sample size of 300, shown in Figure 4.6.

Then we obtain the bandwidths by both the plug-in method and the proposed cross-validation method. After confirming the bandwidth, the local polynomial regression is used for estimation. The next step is to apply PAVA to $\tilde{g}(t_q)$ to obtain $\hat{g}(t_q)$, realizing the weighted isotonic regression; this step is repeated with the bandwidth calculated by the cross validation method. The last step is to calculate the L1 distance to evaluate the improvement of the isotonic regression with the cross-validation method. After one simulation is done, one thousand simulations are generated to test the performance of the two regression methods under the two bandwidth selection methods. There are four categories for the performance evaluation, as before. The boxplot of the outcomes is shown in Figure 4.7.

36 Figure 4.6: Scatter Plot of Simulated Data (Truncated Normal Distribution)


Figure 4.7: Boxplot of L1 Distance 2 (Truncated Normal Distribution)

4.2.2 L1 Percentage Improvement

Based on one thousand simulations, the proposed cross-validation method for bandwidth selection is better than the plug-in method, as we expected; furthermore, the weighted isotonic regression method is undoubtedly a better choice than the local polynomial method. We can further calculate the percentage improvement of the comparisons. The improvement percentage of the isotonic regression compared with the local polynomial regression is
$$P1 = \frac{L1_{\tilde{g}} - L1_{\hat{g}}}{L1_{\tilde{g}}} \times 100\%.$$
Similarly, the improvement percentage of the cross validation method versus the plug-in method for bandwidth selection is
$$P2 = \frac{L1_{PI} - L1_{CV}}{L1_{PI}} \times 100\%.$$
In this scenario, the table of improvement percentages is as follows.

        Median    Mean
P1      9.63%     15.72%
P2      2.5%      3.85%

Table 4.1: Improvement Percentage (Truncated Normal Distribution)

From the perspective of the regression methods, the isotonic regression makes a big improvement of around 10%, because the monotonic constraint is fulfilled when $\hat{g}(x)$ is computed. That is to say, more information is used in the isotonic regression, so the performance is better. From the view of the bandwidth selection methods, the proposed cross-validation produces a smaller improvement of around 3%, and thus also contributes to the better performance of the estimates: compared with the plug-in method, the cross-validation makes full use of the information that the simulated data offers.

4.2.3 Confidence Interval

Next, we evaluate the performance via confidence bands based on the one thousand simulations. Confidence bands usually indicate where the true function should be located. In the simulation study, the bounds of the 95% confidence band are constructed from the 2.5% and 97.5% quantiles of $\hat{g}(t_q)$ and $\tilde{g}(t_q)$; in this way, 95% confidence bands are constructed from these bounds at the two thousand $t_q$ values. Because the confidence bands require a fixed set of horizontal ordinates, we take the average of the one thousand replicates of $t_1, t_2, \cdots, t_{1999}, t_{2000}$ and estimate $\hat{g}(x)$ on this common set of points; the confidence intervals for $\tilde{g}(x)$ and $\hat{g}(x)$ are constructed in the same way. On the whole, the confidence band of $\hat{g}(x)$ is very close to that of $\tilde{g}(x)$, and when comparing the bandwidth selection methods, the small improvement makes the curves even closer to each other. Therefore, the following figure (Figure 4.8) shows only part of the curve to present the distinction. We choose to plot $\tilde{g}_{PI}$ (gtilde1), $\hat{g}_{PI}$ (ghat1), and $\hat{g}_{CV}$ (ghat2) to evaluate the estimation further.
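A minimal sketch of the band construction, assuming a hypothetical matrix est_mat holding one row per simulation replicate of an estimate evaluated on the common t_q grid:

    # Pointwise 95% bands across the simulation replicates.
    band_lo <- apply(est_mat, 2, quantile, probs = 0.025)
    band_hi <- apply(est_mat, 2, quantile, probs = 0.975)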

In Figure 4.8, the black solid line locates the true function. The dotted line represents $\hat{g}_{PI}$ (ghat1) and the dotdash line represents $\hat{g}_{CV}$ (ghat2), while the red line represents $\tilde{g}_{PI}$ (gtilde1). A further figure takes a closer look at the confidence bands of $\hat{g}_{PI}$ and $\hat{g}_{CV}$; it is not difficult to spot that $\hat{g}_{CV}$ estimates the true function more precisely.

Figure 4.8: Partial plot of confidence bands 1

As can be seen from Figure 4.8, the isotonic regression outperforms the local polynomial regression because the former fulfills the monotone constraint: the isotonic estimates provide an improvement especially in the constant and slightly fluctuating regions. As for the strictly increasing part, the isotonic regression does not perform significantly better than the local polynomial regression. The reason is that the local polynomial regression already makes full use of the data in the neighborhood for estimation, while the weighted isotonic regression only adds a non-decreasing constraint, which is not needed when the data has a clear increasing trend; in this situation the isotonic regression does not contribute much. However, as for the bandwidth selection, a slight improvement can be seen when using the proposed cross validation selection method, because it also incorporates the monotone constraint. Since the improvement percentage in the bandwidth selection is relatively small, the curves of ghat1 and ghat2 almost overlap.

In summary, the comparisons of the L1 distance, the L1 improvement percentage and the confidence bands indicate that the isotonic regression outperforms the local polynomial regression for a non-decreasing data trend, and that the proposed cross-validation method is better than the plug-in method when sufficient data is provided for bandwidth selection. The better performance comes from factoring the monotone constraint into both the isotonic regression and the proposed cross validation method.

4.3 Numerical performance for other simulations

4.3.1 Simulated Data from Uniform Distribution

In the previous section, the simulation study was carried out based on mixture normal distribution data, and the outcome of the proposed cross validation method for bandwidth selection looks good after the marginal data is truncated. To weaken the effects of data points with extreme observations, we next adopt a uniform distribution for $X$, because uniformly distributed data is spread evenly over the specified interval. We take the same steps to test the performance of the estimates and the improvement. The simulated non-decreasing function $g(x)$ and the noise term $\varepsilon$ distribution are still the same; the only difference is that $X_i$ is simulated from a uniform distribution $X \sim U(\min = 5, \max = 30)$.

41 Figure 4.9: Scatter Plot of Simulated Data from Uniform Distribution

Following the same procedures as in the previous section, we obtain the following results. Figure 4.9 shows the simulated data points from the uniform distribution.

Again, the L1 distance performances are based on the results from 1000 simulations. The performance of the four categories' L1 distance is shown in the figure below.

42 Figure 4.10: Boxplot of L1 Distance 3 (Uniform distribution)

As we expected, the proposed cross validation method improves the estimation for both the local polynomial regression and the isotonic regression method, and the isotonic regression method combined with the proposed cross-validation method outperforms the others. The improvement percentage table is as follows.

        Median    Mean
P1      1.40%     3.67%
P2      5.04%     5.27%

Table 4.2: Improvement Percentage (Uniform Distribution)

It is easy to see that in this scenario the bandwidth selection by the proposed cross validation method still improves the performance, since sufficient information is offered over the whole area. However, the isotonic regression does not improve the performance as much as in the previous scenario. This is due to the fact that the isotonic regression improves fluctuating data regions significantly, especially when insufficient data appear around them. Since the simulated data in this scenario is evenly distributed, sufficient data surrounds the flat areas, so the local polynomial regression already obtains enough information to estimate accurately; hence no big improvement is seen from the isotonic regression method.

The confidence bands also come from the 1000 simulations; they correspond to the conclusions above.

44 Figure 4.11: Partial plot of confidence bands 2

4.3.2 Numerical Performance for Mixture Uniform Distribution

In this part, $X_i$ is simulated from a mixture uniform distribution, which comprises three pieces of uniform distributions.

The simulated non-decreasing function $g(x)$ and the noise term $\varepsilon$ distribution remain the same.

(X1,X2, ··· ,X30) ∼ U(10, 15)

(X31,X32, ··· ,X300) ∼ 0.4 × U(3, 10) + 0.6 × U(15, 30)

45 Again, the simulation size is 1000. The outcomes of one thousand L1 distance values for the four categories are summarized in the boxplot in the figure below.

Figure 4.12: Scatter Plot of Simulated Data (Mixture Uniform Distribution)

46 Figure 4.13: Boxplot of L1 Distance 4 (Mixture Uniform Distribution)

Besides, the improvement percentage is in the table below.

        Median    Mean
P1      1.00%     2.99%
P2      4.02%     3.68%

Table 4.3: Improvement Percentage (Mixture Uniform Distribution)

The confidence bands plot is in Figure 4.14.

47 Figure 4.14: Partial plot of confidence bands 3

From the figures above, we can draw similar conclusions. Therefore, we conclude that the proposed least squares cross validation selection method is effective in improving the performance of the regression methods, especially the isotonic regression.

4.4 Numerical performance for other functions

In this section, other functions are utilized for the simulations, and the numerical performances are examined to verify the inferences. We further define two functions:
$$f(x) = \begin{cases} 0, & x < 5, \\ \log(x/5), & 5 \le x < 10, \\ \log(2), & 10 \le x < 20, \\ \log(x/10), & 20 \le x < 30, \\ \log(3), & x \ge 30, \end{cases} \qquad (4.2)$$

$$h(x) = \begin{cases} \dfrac{\exp(x/10)}{8}, & x < 10, \\[4pt] \dfrac{\exp(1)}{8}, & 10 \le x < 20, \\[4pt] \dfrac{1}{8}\exp\!\left(\dfrac{x - 10}{10}\right), & 20 \le x < 30, \\[4pt] \dfrac{\exp(2)}{8}, & x \ge 30. \end{cases} \qquad (4.3)$$
The true functions f(x) and h(x) are shown in Figures 4.15 and 4.16.

Figure 4.15: True function f(x)

49 Figure 4.16: True function h(x)

The simulated data are again from a uniform distribution, $X_i \sim U(5, 35)$, and the noise term $\varepsilon$ still follows the same normal distribution $\varepsilon \sim N(0, 0.03)$.

After one thousand simulations, the numerical performances are presented below.

50 Figure 4.17: Boxplot of L1 Distance 5

51 Figure 4.18: Boxplot of L1 Distance 6

From the performance comparisons above, it can be seen that the proposed cross-validation method performs better than the plug-in method. The L1 distance of the local polynomial estimate is larger than that of the isotonic regression on the whole; the isotonic regression method performs well in estimating the constant parts of the piecewise constant function in comparison with the local polynomial regression.

Furthermore, the improvement percentages are shown below.

        Median    Mean
P1      5.09%     7.21%
P2      7.37%     7.15%

Table 4.4: Improvement Percentage for f(x)

        Median    Mean
P1      2.45%     4.46%
P2      6.71%     6.73%

Table 4.5: Improvement Percentage for h(x)

The confidence bands illustrate the performance further.

Figure 4.19: Partial plot of confidence bands 4

53 Figure 4.20: Partial plot of confidence bands 5

In these cases, the confidence bands for the cross-validation bandwidth selection are thinner than those for the plug-in method. Moreover, the confidence bands for the local polynomial estimation are slightly wider than those for the isotonic regression, which shows that the isotonic method outperforms the local polynomial method in estimating the true function. In summary, the isotonic regression with the cross-validation method is closest to the true function.

4.5 Numerical performance for increasing functions

To simplify the steps, we still use the simulated data $X$ from the uniform distribution in subsection 4.3.1. The increasing functions are as follows.

Logarithmic function:
$$l(x) = \log(x/10).$$

Exponential function:
$$e(x) = \frac{\exp(x/10)}{30}.$$

Figure 4.21: True function l(x)

55 Figure 4.22: True function e(x)

To guarantee that the simulated data points scatter reasonably around the true function, the variance of the normal noise term $\varepsilon$ varies between the scenarios. For these scenarios, the L1 distances are computed for the four categories of methods.

56 Figure 4.23: Boxplot of L1 Distance 7

57 Figure 4.24: Boxplot of L1 Distance 8

Apparently, the isotonic regression method shows no improvement in this scenario; however, the cross validation method still improves the performance, as can be seen in the improvement percentage tables below.

        Median    Mean
P1      0%        0.0014%
P2      6.597%    5.714%

Table 4.6: Improvement Percentage for l(x)

        Median    Mean
P1      0%        0.208%
P2      5.237%    3.485%

Table 4.7: Improvement Percentage for e(x)

Confidence bands are summarized below in Figure 4.25 and Figure 4.26. As discussed before, the figures show that the isotonic regression has almost no improvement over the local polynomial regression when estimating data with an increasing trend. However, the cross validation bandwidth selection method still contributes to the estimation; therefore, we focus only on the improvement of the cross validation method.

Figure 4.25: Partial plot of confidence bands 6

59 Figure 4.26: Partial plot of confidence bands 7

In practice, the cross-validation bandwidth selection matters in improving the nonparametric regression. In the overall evaluation of all the simulations, the proposed cross-validation bandwidth selection in the isotonic regression outperforms the other settings. Undoubtedly, the isotonic estimates here are obtained by updating the local polynomial estimates through PAVA; the monotone constraint is fulfilled, so the isotonic regression estimates are more accurate. Where the simulated functions are strictly increasing, PAVA has no effect, so the two regression methods are almost the same, and the critical point lies in the bandwidth selection.

Chapter 5

Application to Real World Data Set

After the detailed discussion of bandwidth selection for the regression problem, we apply the proposed cross validation method to real data. The data comes from Data Set C.11 IPO in Appendix C of Kutner (2005).

It is commonly known that private companies usually issue shares of stock to go public, which is referred to as an Initial Public Offering (IPO). Here is a data set which consists of information on 482 IPOs. The data set originally consists of five variables; we focus on the relationship between the face value of the company and the number of shares offered. On average, the intuition is that the number of shares offered is non-decreasing as the company face value increases, so we apply the cross validation method with the weighted isotonic regression fitting to the data set. It is reasonable to estimate the number of shares offered from the face value of the company on the whole. Specifically, the data set is divided into two parts, according to whether the company obtained venture capital funding, and regression fitting is done separately for each part.

We apply the method mentioned above to the two different cases, with or without venture funding. For the part with venture funding, there are 212 data points for regression fitting. Given the intuition that the number of shares offered tends to increase with the face value of the company, the weighted isotonic regression is a better choice for the fitting. We also use the proposed cross validation method to confirm the bandwidth. Figure 5.1 presents the local polynomial regression fit and the cross validation isotonic regression fit.

Figure 5.1: Two Regressions Comparison Case 1

For those companies that have no venture funding, 270 records can be found. From the scatter plot, we can see that the data concentrates in the area with relatively low company face value. As discussed in the previous chapter, the cross validation bandwidth selection works better in the dense area of the data points, and the marginal data should be ignored when calculating the cross validation scores to confirm the bandwidth; otherwise the sparse data would bias the cross validation selection, just like the marginal area of the normal distribution.

Figure 5.2: Two Regressions Comparison Case 2

Chapter 6

Conclusions

In this thesis, we propose a new cross validation method for bandwidth selection which considers the monotone constraint that $g(\cdot)$ is non-decreasing. We evaluate the estimation performance of the cross validation method by comparing it with the plug-in method, for both the local polynomial and the isotonic regression estimation, via one thousand simulations. Besides, the weighted isotonic regression is used to solve the non-parametric problem by updating the classical local polynomial estimates, so that the resulting estimate satisfies the requirement of non-decreasing monotonicity. Since both the weighted isotonic regression and the proposed bandwidth selector fulfill the monotone constraint that $g(\cdot)$ is a non-decreasing smooth function, the cross-validation bandwidth selection combined with the weighted isotonic regression outperforms the other choices in the non-parametric estimation.

Furthermore, more simulations with various distributions and functions are carried out to examine the numerical performance. Based on the boxplots of the L1 distance, it is clear that, for the bandwidth selection of the regression estimates, the proposed cross-validation performs better than the plug-in method if sufficient data can be used for estimation. This is due to the fact that the cross-validation method makes full use of the data in the neighbourhood and incorporates the non-decreasing constraint on $g(x)$. Besides, the isotonic estimates outperform the local polynomial estimates because the isotonic regression fulfils the monotone requirement. Therefore, the isotonic regression with cross-validation bandwidth selection has the best performance when the simulated data follows the truncated normal distribution or the uniform distribution. For the strictly increasing functions, the isotonic regression does not outperform the local polynomial regression, but the proposed cross-validation bandwidth method is still better than the plug-in method.

Although the proposed cross validation method works well to some extent, there is still much research work to be done in the future. This thesis only compares the numerical performance in L1 distance for two bandwidth selection methods; other bandwidth selection methods could be included in the comparisons. The proposed cross validation method may not be the optimal choice in certain cases, and further examination can be done to evaluate the numerical performance using more varied functions. Besides, the cross-validation method for bandwidth selection may not apply to all kinds of simulated data distributions, so simulations from more distributions can be conducted; similarly, more real data cases could be used to test the proposed cross validation method. Moreover, rigorous theoretical proofs could be developed, since only intuitive explanations are provided in this thesis. In addition, since the constraint in the isotonic regression is non-decreasing, the non-increasing case can still be handled with slight revisions; if other constraints are imposed, what improvements could be made to the proposed cross validation method? All in all, it is strongly advised that some future work be done to improve the performance further.

Bibliography

Ayer, M., Brunk, H. D., Ewing, G. M., Reid, W. T., and Silverman, E. (1955). An empirical distribution function for sampling with incomplete information. The Annals of Mathematical Statistics, 26(4):641–647.

Barlow, R. E. (1972). Statistical inference under order restrictions: the theory and application of isotonic regression. J. Wiley, New York; London.

Fan, J. and Gijbels, I. (1992). Variable bandwidth and local linear regression smoothers. The Annals of Statistics, 20(4):2008–2036.

Fan, J. and Gijbels, I. (1996). Local polynomial modelling and its applications, volume 66. Chapman & Hall, London, 1st edition.

Gasser, T., Kneip, A., and Kohler, W. (1991). A flexible and fast method for automatic smoothing. Journal of the American Statistical Association, 86(415):643–652.

Groeneboom, P. and Jongbloed, G. (2010). Generalized continuous isotonic regression. Statistics and Probability Letters, 80(3):248–253.

Hardle, W. and Marron, J. S. (1985). Optimal bandwidth selection in nonparametric regression function estimation. The Annals of Statistics, 13(4).

Kutner, M. H., et al. (2005). Applied linear statistical models. McGraw-Hill Irwin, Boston, 5th edition.

66 Mammen, E. (1991). Estimating a smooth monotone regression function. The Annals of Statistics, 19(2):724–740.

Rice, J. (1984). Bandwidth choice for nonparametric regression. The Annals of Statistics, 12(4):1215–1230.

Robertson, T., Wright, F. T., and Dykstra, R. (1988). Order restricted statistical inference. Wiley, Chichester; New York.

Ruppert, D., Sheather, S. J., and Wand, M. P. (1995). An effective bandwidth selector for local least squares regression. Journal of the American Statistical Association, 90(432):1257–1270.

Wand, M. (2015). KernSmooth: Functions for Kernel Smoothing Supporting Wand & Jones (1995). R package version 2.23-15.

Wand, M. P. and Jones, M. C. (1995). Kernel smoothing, volume 60. Chapman & Hall, London, 1st edition.

Watson, G. S. (1964). Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A (1961-2002), 26(4):359–372.
