<<

A new GEE method to account for ,

using asymmetric least-square regressions

Amadou Barry* 1,2, Karim Oualkacha3, and Arthur Charpentier3

1Departments of , and Occupational Health, McGill University, Montréal, Québec, Canada 2Lady Davis Institute, Jewish General Hospital, Montréal, Québec, Canada 3Department of Mathematics and , Université du Québec à Montréal, Montréal, Québec, Canada

December 29, 2020

Abstract

Generalized estimating equations (GEE) are widely used to analyze longitudinal ; how-

ever, they are not appropriate for heteroscedastic data, because they only estimate regressor

effects on the response—and therefore do not account for data heterogeneity. Here, we

combine the GEE with the asymmetric (expectile) regression to derive a new class

of estimators, which we call generalized expectile estimating equations (GEEE). The GEEE

model estimates regressor effects on the expectiles of the response distribution, which pro-

vides a detailed view of regressor effects on the entire response distribution. In addition to

capturing data heteroscedasticity, the GEEE extends the various working correlation structures arXiv:1810.09214v2 [stat.ME] 24 Dec 2020 to account for within-subject dependence. We derive the asymptotic properties of the GEEE

estimators and propose a robust estimator of its matrix for inference (see our R

package, github.com/AmBarry/expectgee). Our simulations show that the GEEE estimator is

non-biased and efficient, and our real data analysis shows it captures heteroscedasticity.

Keywords: Expectile regression, quantile regression, GEE, working correlation, cluster data, lon-

gitudinal data.

*Corresponding author: [email protected].

1 1 Introduction

Longitudinal data, which is collected at multiple occasions over a period of time, is generally preferred to cross-sectional data. However, statistical analysis of such data is challenging, for two reasons: 1. Within-subject dependence (multiple measurements for the same subject); and 2.

Heteroscedasticity.

Among the available methods for analyzing longitudinal data, generalized estimating equations

(GEE) are undoubtedly one of the most popular (Liang and Zeger, 1986) because they effectively account for within-subject dependence. To develop the GEE model—which is an extension of the generalized (Nelder and Wedderburn, 1972) in the longitudinal framework—Liang and Zeger (1986) relied on the quasi-likelihood concept (Wedderburn, 1974), and derived the

GEE estimators as the solution of the estimating equations. As a result, the GEE model is semi- parametric. They also formally included a working correlation structure in the estimating equa- tions to account for within-subject dependence in the estimation process.

Among its many attractive qualities, the GEE model offers a broad class of models for different types of responses and various working correlations to specify the within-subject dependence

(Hardin and Hilbe, 2003). Moreover, Liang and Zeger (1986) have shown that the GEE estimator is highly efficient even when the true covariance structure is misspecified. Unfortunately, despite all these attractive properties, the GEE model is inefficient for coping with heteroskedasticity—a common feature of longitudinal data.

Several past works have tried, unsuccessfully, to improve GEE using quantile regression (QR).

Farcomeni and Marino (2015) reviewed all the available studies. They concluded that none of the available research was able to adapt QR to the GEE model, because in each case, it was too challenging to preserve the GEE correlation structure in the QR framework.

More specifically, it is difficult to preserve the random error’s covariance structure in the QR framework. For example, Leng and Zhang (2014) showed that when the random noise vector for the repeated measurement from an individual followed an AR(1) structure, the corresponding noise vector for the QR was not AR(1). This is obviously undesirable, because we always want to preserve known aspects of the data in the generalization.

Consider that we would like to extend the QR covariance to the GEE covariance structure, but it is complex because the former covariance is different from the latter. Indeed, Leng and Zhang

2 (2014) could not preserve the covariance structure because the QR covariance contains a density function, but the GEE covariance does not. This difference makes it hard to extrapolate any as- sumption from the quantile to the GEE. In addition, it is hard to estimate the QR density function itself, because it is computationally intensive and subject to numerical issues (Chen et al., 2004;

Yin and Cai, 2005; Kocherginsky et al., 2005).

To the best of our knowledge, there is no model that estimates the marginal effect of the regres- sors on the response distribution, and that naturally generalizes the GEE model’s correlation structures. In this paper, we take a new approach to this problem. Instead of attempting to use quantiles, we use expectiles to successfully generalize the working correlation structure of the random error. This lets us model the GEE model in a way that accounts for heteroscedasticity

(heterogeneity) in the data.

Since it is well known that quantile models are more robust than expectiles, some may question why we are using expectiles instead of quantiles. Indeed, quantiles are more robust because they generalize the (whereas expectiles generalize the mean), and are therefore less affected by outliers. However, as mentioned above, all attempts thus far to adapt QR to the GEE model have failed (because they fail to preserve the working correlation structure of the random error).

Therefore, we think expectiles are a good alternative.

The word expectile was first coined by Newey and Powell (1987) in their seminal paper wherein they also introduced the ER. Newey and Powell (1987) presented a detailed study of this new class of estimator, including its location and scale equivariance, and used the expectile regression

(ER) coefficient estimators to test for the presence of heteroscedasticity. Newey and Powell (1987) also advocated the ER model as an alternative to the QR model introduced earlier by Koenker and Bassett (1978); and highlighted some advantages of the ER over the QR . In the same pe- riod, Aigner et al. (1976) showed that the ER estimators can be derived under a likelihood-based approach using a normal density function with unequal weight on the positive and negative random errors.

Today, there is a growing interest in the ER topic as demonstrated by the number of publications in the literature. The ER has been extended to many other classes of models, such as Bayesian

(Majumdar and Paul, 2016; Waldmann et al., 2016; Xing and Qian, 2017), nonparametric (Righi et al., 2014; Yang and Zou, 2015), nonlinear (Kim and Lee, 2016), neural network (Xu et al., 2016;

Jiang et al., 2017; Lin et al., 2020), support vector machine (Farooq and Steinwart, 2017), and splines (Schulze Waltrup and Kauermann, 2015). Moreover, because of its computational advan-

3 tage, the ER has now become an important in high-dimension (Zhao et al., 2018,

2019; Xu et al., 2020; Pan et al., 2020; Pan, 2020; Liao et al., 2019; Barry et al., 2020), to name a few.

In this paper, we consider the ER and propose a Generalized Expectile Estimating Equations

(GEEE) model that retains the attractive properties of the GEE model, while accounting for het- eroskedasticity in the longitudinal data. We derived its asymptotic properties and proposed a heterogeneous consistent and robust estimator of its -. We have made a free R package available on GitHub (github.com/AmBarry/expectgee) to simplify its imple- mentation. Our GEEE model estimates the regressor effects at the conditional expectile of the response distribution without making any assumption about the random error distribution. In addition, the GEEE inherits the attractive properties of the classic GEE, in that it provides a highly efficient and even if the true covariance structure is misspecified.

Our main contributions are: 1. Deriving the asymptotic properties of the new GEEE model; 2.

Showing that the GEEE model can generalize the various covariance structures available in the

GEE framework—in other words, showing that the GEEE model offers a variety of correlation structures to flexibly model the correlation of the within-subject observations; 3. Proposing the variance-covariance matrix for the GEEE estimator; and 4. Generalizing two selection criteria

(namely the QIC and CIC, proposed by Pan (2001) and Hin and Wang (2009), respectively).

Our model works because in the presence of heteroskedasticity, the GEEE’s estimated parameters depend on the expectiles. Thus, by estimating several regression coefficient vectors at different locations of the conditional response distribution, it is possible to study the data’s heteroskedas- ticity in depth. In addition to accounting for the presence of heteroscedasticity in the data, our

GEEE model captures regressor effects in the location, scale, and shape of the response distribu- tion. The GEEE model is computationally efficient and easy to implement, and we believe it will be a useful instrument in the toolkit of researchers interested in analyzing longitudinal data.

Figure 1 highlights the usefulness of our GEEE model. On the left, a classical GEE mean regres- sion is fitted to the data, as indicated by the solid line. As you can see, this single line cannot capture many of the data points. On the right, we have an example of our GEEE expectile re- gression model (with an independent working correlation). As you can see, the five GEEE fitted models (τ ∈ (0.1, 0.25, 0.5, 0.75, 0.9)) are able to capture many more data points in this het- eroscedastic sample.

More specifically, the left and right panel of Figure 1 display the same scatterplot. A classical

4 mean regression is fitted to the data and represented by the fitted line in Figure 1.a. Although

the mean regression parameter estimator is unbiased, the estimation of the

alone does not capture the dispersion or the scale shifts caused by changes in the regressor, as

suggested by the plot in Figure 1.a. In contrast, the ER model by estimating different conditional

expectiles of the response distribution is able to capture the heteroscedasticity present in the data

(Figure 1.b). Thus, similar to the ER model, the GEEE model by estimating the conditional expec-

tiles of the response distribution will capture the location and scale shift and other unobserved

heterogeneity of the data.

a b τ 0.1 0.25 0.5 0.75 0.9

15

15

10

10 y

5 y 5

0 0

0 25 50 75 100 0 25 50 75 100 x x

Figure 1: Comparison of the GEE model (left) and GEEE model (right) for a heteroscedastic sample. Figure a shows a GEE mean regression line fitted to the data, and Figure b shows the five GEEE regression lines, τ ∈ (0.1, 0.25, 0.5, 0.75, 0.9), fitted to the data. The sample (n = 90) is generated from a heteroscedastic linear model, y = 6 + 0.025x + ε, where ε ∼ N (0, σ2) and σ = 1 + 0.05x.

In section 2, we define the expectile and introduce the ER for the cross-sectional data. We then introduce the Generalized Expectile Estimating Equations (GEEE) model and present an algo- rithm showing the extension of some common correlation structures. In section 3, we present the asymptotic properties of the GEEE estimator, as well as a robust estimator of its covariance matrix. In section 4, we conduct simulations to evaluate the estimator’s performance on small samples. In section 5, we apply the GEEE model to a real data set (a for a new labor pain medication) to uncover the unobserved heterogeneity of the data and capture the

5 heterogeneity of the regressor effects. Section 6 concludes. Please note that theorem proofs are provided in the supplementary material.

2 Models and Methods

This section introduces the univariate expectile and the ER model.

Notations. Vectors are written in lower bold letters, x ∈ Rp, and matrices are represented in capital bold letters, X ∈ Rn×p. Estimated quantities are represented with a hat symbol Xb , the −1 T inverse of a squared matrix is noted as X and the transposed matrix is X . The symbols Ip×p and Ip represent the identity matrix and are noted without subscript (I) when the dimension is implicitly known. The symbol 1(t < 0) is the indicator function and is equal to 1 if t < 0 and to 0 otherwise. The vectors 1 and 0 are constant vectors filled with 1s and 0s, respectively. The function k·k∞ is the max norm and the l2 norm is noted as k·k2 or k·k . The symbol ⊗ represents the Kronecker product.

2.1 Expectile and expectile regression

The expectile of a continuous Y is defined as the solution µτ(Y) that minimizes the

E{ρτ(Y − θ)} (1)

over θ ∈ R for a fixed value of τ ∈ (0, 1). The function ρτ(·) of the form

2 ρτ(t) = |τ − 1(t ≤ 0)| · t is the asymmetric square loss function that assigns weights τ and 1 − τ to positive and negative deviations, respectively.

By equating the first derivative of (1) to zero, the expectile can also be defined as solution of

6 1 − 2τ µ (Y) = µ = µ − E {Y − µ (Y)}1{Y > µ (Y)}, (2) τ τ 1 − τ τ τ

where µ = µ0.5(Y) = E(Y). This definition, presented by Newey and Powell (1987), shows that the expectile is determined by the tail expectations of the distribution of Y. Interestingly, the

expectile can also be defined as

" # ψτ(Y − µτ) µτ = E  Y , E ψτ(Y − µτ)

where ψτ(t) = |τ − 1(t ≤ 0)| is the check function. This latter definition, which is much more meaningful in the context of regression, reveals that expectiles, like the mean, are weighted aver- ages.

n Given a random sample, {(yi)}i=1, the τ-th empirical expectile

n ψ (y − µ ) µ = τ i bτ y bτ ∑ n ( − ) i i=1 ∑i=1 ψτ yi µbτ

is the solution that minimizes the empirical loss function

1 n ∑ ρτ(yi − θ). (3) n i=1

Newey and Powell (1987) have shown that the expectile function has attractive properties. There

is a one to one function between the expectile and the c.d.f. of the random variable. That is,

the expectile function summarizes the c.d.f. of the random variable Y, but differently than the

quantile . Indeed, the quantile is an while the expectile is a weighted mean.

Moreover, the expectile is location and scale equivariant, that is, for s > 0 and t ∈ R, µτ(sY +

t) = sµτ(Y) + t. More details about the expectile properties and the ER model are given by Efron (1991).

To introduce the ER method, consider the classical

T yi = xi β + εi, (4)

7 p p where yi is the scalar response, εi the random error, xi ∈ R the vector of covariates and β ∈ R the unknown parameter that needs to be estimated. Under this framework, Newey and Powell

(1987) specified the ER model for a fixed τ ∈ (0, 1) as:

T µτ(yi|xi) = xi βτ, µτ(εi) = 0. (5)

The assumption, µτ(εi) = 0, ensures that the random error is centered on the τ-th expectile. The corresponding ER estimator is defined as the unique solution that minimizes the objective function

n 1 T ∑ ρτ(yi − xi βτ) (6) n i=1

p over βτ ∈ R . The parameter βτ measures the regressor effects and varies according to the value of τ in the presence of heteroscedasticity. In general, the choice of τ depends on the re-

search question. For instance, if the interest is in studying the low birth weight risk factors,

then the focus will be on the regressor effects at the extreme left of the response distribution, say

τ ∈ (0.05, 0.1, 0.2). Regardless of the research question, Newey and Powell (1987) suggested a

hypothesis testing procedure for detecting heteroscedasticity based on systematically estimating

the regressor effects on a sequence of expectiles that spans the of the response distribution

and conducting multiple testing. This approach, in addition to testing for the presence of het-

eroscedasticity, will provide a comprehensive and detailed overview of the regressor effects on

the response distribution.

The asymmetric loss function associated with the expectile function is continuously differentiable

and solving equation model (6) gives:

n −1 n  T    βbτ = ∑ xi ψτ(bεi)xi ∑ xiψτ(bεi)yi , (7) i=1 i=1

T where bεi = yi − xi βbτ. The ER estimator is easily computed using the iterated reweighted least squares (IRLS) algorithm. In addition to deriving the asymptotic properties of the above ER esti-

mator, Newey and Powell (1987) proposed a robust estimator of the variance-covariance matrix

for inference.

8 Note that the ER estimator can be estimated through a likelihood-based approach. Indeed, as-

2 sume that the disturbance, u ∼ AND(u; µτ, σ , τ), follows an asymmetric normal distribution (AND)

p ( !) 2 2 τ(1 − τ) u − µτ f (u; µτ, σ , τ) = √ √ √ exp − ρτ , (8) πσ2 τ + 1 − τ σ

where µτ, σ, and τ are the location, scale and asymmetric parameters, respectively. Now, sub- T stitute µiτ = xi βτ and assume that the n observations are independent. Then, for a fixed τ, the ER estimator is equivalent to the maximum of the

( n T !) −2n yi − xi βτ L(β; σ, τ, y) ∝ σ exp − ∑ ρτ , i=1 σ

T where y = (y1,..., yn) . As mentioned by Waldmann et al. (2016), the derived likelihood is technically not a likelihood but rather an auxiliary likelihood since it is not assumed to describe

the exact or even approximately correct data distribution.

2.2 GEEE for longitudinal data

This section presents the model and method of the GEEE for longitudinal data analysis. Consider { } that the data yit, xit 1≤i≤n,1≤t≤mi are generated by the following model

T yit = xit β + εit, (9)

where yit is the t-th observation of the continuous response variable for the i-th individual, 1 p T xit = (xit,..., xit) is the p × 1 vector of regressors, εit the random error and β is the p × 1 true parameter vector that needs to be estimated. By grouping observations from the same indi-

vidual, equation model (9) can be conveniently represented as

yi = Xi β + εi, (10)

where yi is the vector response of individual i, Xi is the corresponding regressor matrix of di-

9 mension mi × p and εi the error vector. The individual observations can also be stacked and presented in matrix form as

y = Xβ + ε, (11)

n where y and ε are N × 1 vectors, X is N × p matrix, and N = ∑i=1 mi.

Using the location-scale equivariance property of the expectile function, we introduce the corre- sponding conditional τ-expectile model as:

T µτ(yit|xit) = xit βτ, µτ(εit) = 0. (12)

The assumption, µτ(εit) = 0, is introduced to guarantee that the random error is centered on the τ-th expectile. The parameter βτ measures the effect of the regressors on the expectile of the response variable distribution for a fixed τ ∈ (0, 1). Thus, we can estimate the regressor effects on the response distribution by estimating the GEEE model for a sequence of expectiles

(τ1,..., τq). In this way, the GEEE captures the regressor effects on the location, scale and shape of the response distribution.

A practical parameter estimator can be obtained by looking for the solution of the following expectile estimating equations:

  n ∂µτ yi|Xi h i SI (βτ) = ∑ Ψτ(yi − Xi βτ) yi − µτ(yi|Xi) = 0, (13) i=1 ∂β

  ( − ) = ( − T ) ( − T ) where Ψτ yi Xi βτ diag ψτ yi1 xi1 βτ ,..., ψτ yimi ximi βτ . The resulting estimator,

βbIτ, can also be derived as the minimizer of the following objective function:

n mi 1  T  ∑ ∑ ρτ yit − xit βτ (14) N i=1 t=1

p over βτ ∈ R . The explicit form of the resulting estimator (βbIτ) is similar to the classical ER

estimator defined in equation (7). When τ = 0.5, the estimator βbIτ corresponds to the GEEE estimator introduced by Liang and Zeger (1986) with an independent working correlation struc-

10 ture. This fact is exploited to extend the GEE to the Generalized Expectile Estimating Equations

(GEEE).

The GEEE method models the underlying correlation structure from the same subject by formally

including a hypothesized structure to account for the within-subject correlation. For a fixed τ,

the GEEE estimator βbτ is derived by solving the following GEEE equations

  n ∂µτ yi|Xi h i ( ) = −1 ( − ) − ( | ) = S βτ ∑ V iτ Ψτ yi Xi βτ yi µτ yi Xi 0, (15) i=1 ∂β

where V iτ is a working covariance matrix represented as

1 1 2 2 2 V iτ = στ Aiτ Ri(ατ)Aiτ, (16)

2 and στ is the nuisance parameter. Aiτ is the mi × mi diagonal matrix with the   ν µτ(yi|Xi) as the diagonal elements, and Ri(ατ) is the working correlation matrix.

The working correlation matrix Ri(ατ) describes the within-subject correlation pattern along the

K × 1 vector parameter ατ. Liang and Zeger (1986) proposed several types of working correlation structures (independent, exchangeable, autoregressive, unstructured, etc.) for the case when

τ = 0.5. These working correlations are adapted and extended to the GEEE approach. The

extension of some of the most common and popular ones are presented below.

The independent GEEE working correlation structure (Ind) is the simplest form of working cor-

relation with the identity matrix and is the structure assumed by the expectile estimating equa-

tions model presented in equation (13). The exchangeable GEEE correlation structure (Exc) is

a simple extension of the independence working correlation. It assumes a common correlation,

ρtsτ = ατ, ∀t 6= s, between any pair of measurements. The autoregressive GEEE (AR(1)) corre- lation defines the correlation of a pair of observations as a decreasing function of their distance in |t−s| time, ρtsτ = ατ . This structure assigns the highest correlation to adjacent pairs of observations and the lowest correlation to distant pairs. The unstructured GEEE correlation structure, as its

name suggests, imposes no structure to the correlation matrix and defines the correlations of all

pairs of measurements differently without any explicit pattern, ρtsτ = αtsτ ∀t 6= s.

All these types of working correlation are usually unknown and must be estimated. They are

11 estimated in the iterative fitting process using the current value of the parameter vector. Indeed, the estimators can be computed as an iterated reweighted least squares estimators. The algo- rithm for the exchangeable GEEE working correlation structure is summarized by the stepwise procedure of Algorithm 1.

Algorithm 1: GEEE algorithm (0) Input: Let βeτ ← βbIτ, where βbIτ is the estimator defined by equation (13).

(r) (r−1) while βbτ − βbτ ≤ ζ do ∞ (r−1) Given βeτ at the (r − 1)-th step, update:

2(r) 1 n mi 2 2 1. bστ ← N−p ∑i=1 ∑t=1 ψτ(bεitτ) bεitτ

(r) ← 1 n mi ( ) ( ) = 1 n ( − ) 2. bατ 2(r) ∑i=1 ∑t

(r) (r−1) h (r−1) i−1 (r−1) ← + n T −1( (r−1)) ( ) ( (r−1) ) 3. βbτ βbτ ∑i=1 Xi V iτ bατ Ψτ βbτ Xi S bατ , βbτ

end

Return βbτ

The parameter ζ is the convergence tolerance and the default value in our code implementation is set to 10−7. Algorithm 1 is an IRLS algorithm and its convergence has been shown by Burrus et al.

(1994). Furthermore, Jiang et al. (2007) has shown, under mild conditions, the convergence of an iterative estimating equations (IEE) procedure, which includes Algorithm 1. Additionally, the

IEE algorithm converges at an exponential rate and its estimator is consistent and asymptotically efficient (Jiang et al., 2007).

In practice, Algorithm 1 is computationally efficient and usually the number of iterations re- quired to achieve convergence is between 3 and 5. We have never encountered convergence issues in our simulation studies and real data analysis. Nevertheless, like the unrestricted GEE correlation structure, the estimated unrestricted GEEE correlation structure matrix is not guaran- teed to be invertible and numeric problems may be encountered (Hardin and Hilbe, 2003).

Notice that the convergence criterion defined as the maximum absolute relative change in the parameter estimators, at each iteration, is implemented in the first GEE software (Karim and

Zeger, 1989) and in standard statistical software such as the geepack R package (Halekoh et al.,

2006; Yan and Fine, 2004; Yan, 2002) or the GEE SAS procedure (SAS Institute Inc., 2003). We also tested the algorithm based on the relative change in the objective function. Both convergence

12 criteria lead to the same results, but the algorithm based on the parameter estimates relative

change is computationally faster.

Algorithm 1 also applies to other types of working correlation; one can simply choose the appro-

priate estimator of the parameter α which is either a scalar or a vector, depending on the type

of correlation structure. For example, for a autoregressive GEEE (AR(1)) working correlation structure, the scalar parameter ατ is estimated by

1 n n ατ = ψτ(εitτ)εitτψτ(εi,t+1,τ)εi,t+1,τ, N2 = (mi − 1). b (N − p)σ2 ∑ ∑ b b b b ∑ 2 bτ i=1 t

For a unstructured GEEE working correlation structure, every element of the mi(mi + 1)/2-vector

parameter ατ is estimated by

1 n = ( ) ( ) bαtsτ 2 ∑ ψτ bεitτ bεitτψτ bεisτ bεisτ. (N − p)bστ i=1

Generalization to other GEEE working correlations is straightforward.

In Section 3, it is shown that the GEEE estimator βbτ is consistent and asymptotically normally distributed. In addition, the simulation results in Section 4 show that the GEEE method yields

a consistent and highly efficient estimator even with a misspecification of the true covariance

structure.

2.3 GEEE for a sequence of expectiles

A sequence of expectiles is often necessary, usually the mean and a few other expectiles above

and below the mean, to describe the regressor effects on the response distribution. Additionally,

the simultaneous estimation allows them to share strength among each other and to gain better

estimation accuracy than individually estimated ones (Liu and Wu, 2011). For a fixed sequence

of expectiles, τ = (τ1,..., τq), the GEEE estimating functions are defined as

13 q S(β ) = S (β ) τ ∑ τk τk = k 1 (17) n  h i T −1 1 I 1 I = ∑(W ⊗ Xi) V iτ Ψτ q ⊗ yi − ( q ⊗ Xi)βτ q ⊗ yi − ( q ⊗ Xi)βτ , i=1

= [ ( )]q × where Sτk is defined in (15), and W diag wk k=1 is the q q matrix of weights controlling V = [ (V )]q × the relative influence of the q expectiles. iτ diag iτk k=1 is a qmi qmi block-diagonal × working covariance matrix. For any fixed τk, the expression of the mi mi matrix V iτk is function of the nuisance parameters (στk , ατk ) and is given by equation (16). The diagonal matrix of check or influence functions, Ψτ, is defined as follows:

    Ψ 1 ⊗ y − (I ⊗ X )β = diag Ψ (y − X β ),..., Ψ (y − X β ) . τ q i q i τ τ1 i i τ1 τq i i τq

The parameter β = (β ,..., β )T is obtained using the iterative reweighted least squares al- τ τ1 τq gorithm, as shown above, for a single expectile.

In general, the choice of the relative weight W is guided by the choice of the expectiles, which in turn depends on the research question. That being said, we suggest estimating the regressor ef- fects at different expectiles of the response distribution and giving more weight to the expectiles of interest. Koenker (2004) suggested to select the weights and the associated expectiles analo- gously to the choice of discretely weighted L-statistics. For example, using Tukey’s trimean we would assign weights w = (0.25, 0.5, 0.25) to the expectiles τ = (0.25, 0.5, 0.75).

In the next section, the asymptotic properties of the GEEE estimator are presented for a sequence of expectiles.

3 Asymptotic properties

This section presents the asymptotic properties of the GEEE estimator for several fixed expec- tiles τ. In the first step, the asymptotic properties of the GEEE estimator βbIτ with the indepen- dent working correlation structure are presented. Subsequently, the asymptotic properties of the

GEEE estimator βbτ with a general correlation structure are derived. The main reason for pre- senting the results of βbIτ separately is that it is also the estimator of the expectile regression for

14 a marginal model. In the following section, we assume that n → ∞, and m = max1≤i≤n mi is fixed. The proofs of the theorems are available in the supplementary material.

3.1 Asymptotic properties for the independent GEEE

To begin, we assume that the following conditions are met.

n A1. The data {(yi, Xi)}i=1 are independent across i, and

h i h i   V ( ) = E ( ) T ( ) = = T T T ar Ψτ εiτ εiτ Ψτ εiτ εiτ εiτ Ψτ εiτ Σiτ, where εiτ εiτ1 ,..., εiτq h iq = ( )T = − T ( ) = ( ( )) εiτ εi1τ ,..., εimiτ , εitτ yit xit βτ and Ψτ εiτ diag Ψτk εiτ . k k k k k k k=1

A2. The limiting forms of the following matrices are positive definite:

n −1 T DI1(τ) = lim N (W ⊗ Xi) E[Ψτ (εiτ )](Iq ⊗ Xi), n→∞ ∑ i=1

n −1 T DI0(τ) = lim N (W ⊗ Xi) Σiτ (W ⊗ Xi). n→∞ ∑ i=1

A3. There exists a positive constant M such that max 1≤i≤n kxitk < M. 1≤t≤mi

Assumptions A1-A3 are standard assumptions for longitudinal models. Condition A1 ensures independence across individuals, but permits a within-dependency between observations of the same individual, and allows heterogeneity across individuals. Condition A2 is a standard full rank condition. Observe that when τ = 1/2, then Σi0.5 = 1/4 Var[εi0.5] becomes the variance of εi up to a factor and

n −1 T DI0 = 1/4 lim N Xi Var[εi0.5]Xi. n→∞ ∑ i=1

Considering

15 n −1 T DI1 = 1/2 lim N Xi Xi, n→∞ ∑ i=1

we see that this factor disappears in the expression of the variance. Therefore, when τ = 1/2,

the condition A2 is reduced to a condition on the matrices

n n −1 T −1 T N ∑ Xi Var[εi]Xi and N ∑ Xi Xi. i=1 i=1

Condition A3 is important both for the convergence and for the Lindeberg condition. The follow-

ing Theorem states the results of the asymptotic properties of the independent GEEE estimator

(βbIτ ) assuming an independent working correlation structure.

Theorem 1. Assume that βbIτ is the solution of the estimating function (13) and suppose the data are gen- E| ( )|4+ν < E| |4+ν < erated by model (9) and that conditions A1-A3 are satisfied. If ψτk εitτk ∆ and εitτk ∆

for some ν > 0 and ∆ > 0, then for every fixed sequence of expectiles τ = (τ1,..., τq),

√    d −1 −1 N βbIτ − βτ −→N 0, DI1 (τ)DI0(τ)DI1 (τ) .

To use this new estimator βbIτ to make inference, an estimator of its VC-matrix is needed and a robust one is presented in Theorem 2. This will make it possible to construct large sample

confidence intervals or conduct hypotheses testing. The estimator presented in Theorem 2 is a

generalization of the robust VC estimator proposed by White (1980) and used in, among other

things, multilevel analysis (Liang and Zeger, 1986). This estimator inherits the same property

that is, it accounts for the within-subject correlation and the heteroscedasticity between subjects.

In summary, the proposed VC-matrix estimator is a commonly advocated covariance matrix es-

timator for longitudinal data. To state the theorem, we introduce the following matrices:

n −1 T Db I1(τ) = N ∑(W ⊗ Xi) Ψτ (bεiτ )(Iq ⊗ Xi), i=1

n −1 T Db I0(τ) = N ∑(W ⊗ Xi) Σbiτ (W ⊗ Xi) i=1

T where Σbiτ = Ψτ (bεiτ )bεiτbεiτ Ψτ (bεiτ ) and bεiτ is obtained by replacing βτ with βbIτ in the expres-

16 sion of εiτ. Then, we have Theorem 2.

Theorem 2. Suppose the data are generated by model (9) and that conditions A1-A3 are satisfied. If E| ( )|4+ν < E| |4+ν < > > ψτk bεitτk ∆ and εitτk ∆ for some ν 0 and ∆ 0, then for every fixed sequence of expectiles τ = (τ1,..., τq),

−1 −1 p −1 −1 Db I1 (τ)Db I0(τ)Db I1 (τ) −→ DI1 (τ)DI0(τ)DI1 (τ).

3.2 Asymptotic properties for the general GEEE estimator

After presenting the asymptotic properties of the GEEE-independent working correlation esti- mator, this subsection presents the asymptotic properties of the GEEE-estimator for a general working correlation. Assume that the following hold.

n h i B1. The data {(yi, Xi)}i=1 are independent across i and Var Ψτ (εiτ )εiτ = Σiτ.

B2. The limiting forms of the following matrices are positive definite:

n −1 T −1 D1(τ) = lim N (W ⊗ Xi) V E[Ψτ (εiτ )](Iq ⊗ Xi), n→∞ ∑ iτ i=1

n −1 T −1 −1 D0(τ) = lim N (W ⊗ Xi) V ΣiτV (W ⊗ Xi). n→∞ ∑ iτ iτ i=1

B3. There exists a positive constant M such that max 1≤i≤n kxitk < M. 1≤t≤mi

The following theorem derives the asymptotic properties of the GEEE estimator with a general working correlation under the above conditions.

Theorem 3. Suppose the data are generated by model (9) and that conditions B1-B3 are satisfied. If E| ( )|4+ν < E| |4+ν < > > ψτk εitτk ∆ and εitτk ∆ for some ν 0 and ∆ 0, then for every fixed sequence of expectiles τ = (τ1,..., τq),

√    d −1 −1 N βbτ − βτ −→N 0, D1 (τ)D0(τ)D1 (τ) .

17 In the same way as with the GEEE-independent working correlation estimator, the next Theorem

proposes an estimator of the VC-matrix of βbτ. Consider

n −1 T −1 Db 1(τ) = N ∑(W ⊗ Xi) Vb iτ Ψτ (bεiτ )(Iq ⊗ Xi), i=1

n −1 T −1 −1 Db 0(τ) = N ∑(W ⊗ Xi) Vb iτ ΣbiτVb iτ (W ⊗ Xi), i=1

T where Σbiτ = Ψτ (bεiτ )bεiτbεiτ Ψτ (bεiτ ). The estimated working correlation Vb iτ = V iτ (bα) is as- sumed to be a consistent estimator of V iτ. This assumption is equivalent to assuming the exis- tence of a consistent estimator, like the estimator, of the parameter α. In the following theorem, we state the consistency result of our robust covariance estimator.

Theorem 4. Suppose the data are generated by model (9) and that conditions B1-B3 are satisfied. Assume E| ( )|4+ν < E| |4+ν < > > ψτk bεitτk ∆ and εitτk ∆ for some ν 0 and ∆ 0. Then for every fixed sequence of expectiles τ = (τ1,..., τq), and under the above conditions,

−1 −1 p −1 −1 Db 1 (τ)Db 0(τ)Db 1 (τ) −→ D1 (τ)D0(τ)D1 (τ).

Theorems 3 and 4 allow for hypothesis testing about the population values of the parameter,

βτ. For instance, the presence of heterogeneous regressor effects can be detected by testing for differences in the vector of the slope coefficients across different expectiles.

Notice that by replacing the general correlation matrix with the identity matrix, the GEEE esti- mator proposed in Section 3.2 is the same as the independent GEEE estimator in section 3.1.

4 Simulations

4.1 Model design

In this section, the small sample performance of the GEEE estimators is evaluated through exten- sive simulation studies. The random samples are generated from the following linear model:

18 Mγ : yit = β0 + x1itβ1 + x2itβ2 + (1 + γx2it)εit, i = 1, . . . , n and t = 1, . . . , mi. (18)

Two versions of equation model (18) are considered with respect to the parameter γ ∈ {0, 3/10}.

A location-shift model (M0) corresponding to γ = 0, which assesses the performance of the es- timators for an homoscedastic scenario; and a location-scale-shift model (M3/10) corresponding to γ = 3/10, serving to assess the performance of the estimators in the presence of heteroscedas-

ticity.

The corresponding GEEE model for (M0) is: µτ(yit) = β0τ + x1itβ1 + x2itβ2, where β0τ =

β0 + µτ(εit). In this model only the intercept varies with τ and the expectile functions are parallel lines. In the location-scale-shift model (M3/10), the GEEE model is specified as: µτ(yit) = β0τ +

x1itβ1 + x2itβ2τ, where β0τ = β0 + µτ(εit ) and β2τ = β2 + γµτ(εit). Therefore, in the presence of heteroscedasticity, both the intercept and the parameter of x2 will vary according to τ, while the parameter of x1 remains constant.

We generated a between-subject regressor (x1) from a binomial distribution B(1, 0.5) and a within- subject regressor (x2) from a Gaussian distribution. The parameters β0, β1 and β2 were set to (0.7, 0.4, 1.2), respectively. Finally, we generated the random error of equation model (18) from three distinct distributions: Normal (N (0, 1)), Student with three degrees of freedom (T3) and Chi-squared with three degrees of freedom (χ2(3)). We used the R package copula (Kojadi- novic and Yan, 2010) to simulate the dependency between measurements of the same subject.

We started by simulating a dependent uniform margins from a Gaussian copula with an AR(1) correlation structure. Then, we generated the dependent random errors as quantiles of the uni- form margins from the three distinct marginal distributions (N (0, 1), T3, χ2(3)). Specifically, we generated the data as follows:

1. Generate the predictors x1 and x2;

2. Generate a uniform sample: (u1,..., umi ) from a Gaussian Copula with an AR(1) correla- tion structure and a specific correlation parameter ρ;

0 −1 3. For t = 1, . . . , mi, generate the dependent random error εit = F (uit), where F(.) is one of

the three marginal distributions (N (0, 1), T3, χ2(3)); and

4. Generate the response variable: yit = β0 + x1itβ1 + x2itβ2 + (1 + γx2it)εit.

19 We used three different values for the correlation parameter: low (ρ = 0.1), medium (ρ = 0.5),

and high (ρ = 0.9) correlations. We generated each model according to three different sample

sizes n ∈ {50, 100, 250}. Finally, for the number of repeated measurements mi, a balanced design with mi = 4 and an unbalanced design were studied. In the unbalanced design, mi is a number randomly generated between 3 to 7 with equal probability. The extensive simulation was carried out with 400 replications for each parameter-combination scenario. In each scenario, the focus was on the effect of the regressors at the expectiles τ ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}.

All computations were performed with the R (v4.0.1) statistical programming language (R Core

Team, 2018). The implemented R package expectgee that comes with this manuscript is publicly available on GitHub at github.com/AmBarry/expectgee.

4.2 Performance measures

We fitted the simulated samples using the GEEE model with three working correlation structures: independence (Ind), exchangeable (Exc) and AR(1) correlations, and studied their performance in terms of bias, relative efficiency and . We reported the bias distribution of the estimators as an error plot (the mean and the range of the distribution). We evaluated the relative efficiency (Reff) by the ratio of the standard errors and by taking the GEEE with an

AR(1) correlation structure (which is the true correlation structure) as denominator. The relative efficiency (Reff) of model M ∈ (Ind, Exc, AR(1)) was defined as:

SEM(βkτ) ReffM = , k ∈ {0, 1, 2}. SEAR(1)(βkτ)

1 400 (j) where SEM(β ) = ∑ σ and σ is the estimated standard error of β . kτ 400 j=1 bβkτ bβkτ kτ

We also evaluated the performance of the asymptotic standard error (SE) presented in Theorem

4 by reporting the distribution of the ratio between the asymptotic standard error (SE) and the

Monte Carlo (SD) defined as:

400 1  ( ) 2 2 ( ) = j − ∈ { } SDM βkτ ∑ βbkτ βbkτ k 0, 1, 2 , 400 j=1

1 400 (j) where βbkτ = 400 ∑j=1 βbkτ .

20 Since the efficiency of the GEEE parameter estimates depended on the accuracy of the correlation

structure, it was important to select the most accurate correlation structure. Numerous criteria

to select the best working correlation structure have been proposed and there was no criterion

that seems to work better than others. Therefore, in this manuscript we relied on two popular

selection criteria: the quasilikelihood under the independence model information criterion (QIC)

and the correlation information criterion (CIC) to select the best working correlation structure and to evaluate the goodness-of-fit of our GEEE model. The QIC criterion developed by Pan

(2001) is an adaptation of the Akaike Information Criterion for the GEE model and is defined as:

  QIC(R) = −2Q(βb(R), I, ∆) + Trace Ωb I(βb(R))Vb(βb(R)) , where Q is the quasilikelihood function (Wedderburn, 1974) under independence assumption of

2 the dataset ∆. In our case Q = −1/2 ∑it bεit. The matrix Ωb I was the inverse of the variance matrix obtained by fitting an independence model and Vb was the sandwich variance estimator under the working correlation structure presented in Theorem 4. Hin and Wang (2009) argued that the

first term in the QIC criterion which is the quasilikelihood under the independence assumption

was not informative in the selection of the covariance structure and proposed using the second

term alone for the selection of the working correlation structure. They called it CIC and defined

it as:

  CIC(R) = Trace Ωb I(βb(R))Vb(βb(R)) .

Højsgaard et al. (2005) reported that the CIC criterion was more convenient compared to the QIC

criterion, particularly when the model for the mean did not fit well the data and when the models

with different correlation structures were compared.

We also defined an asymmetric version of these criteria and took their sum as a new criterion.

The corresponding asymmetric version of the QIC and the CIC were defined as:

q q (R) = (R ) (R) = (R ) asymQIC ∑ QICτk τk and asymCIC ∑ CICτk τk k=1 k=1

where q is the number of selected expectiles. In this manuscript we selected 9 expectiles for

the simulation and the real data application. The asymmetric measure QICτ and CICτ defined

21 at τ ∈ (0, 1) correspond to the classical QIC and CIC computed with the parameter estimate

(βbτ) and the working correlation Rτ of the asymmetric model, respectively. Notice that when

τ = 0.5, the asymmetric measures QICτ and CICτ corresponded to the classical QIC and CIC, respectively.

The asymmetric criteria asymQIC and asymCIC selected in majority the true correlation structure

(AR(1)). Furthermore, the asymQIC and asymCIC measures performed better than the QIC and

CIC measures in the presence of heteroscedasticity, and when the error distribution was heavy-

tailed, see the supplementary material.

ar1 exc ind

A 0.1 0.5 0.9 0.6

0.3 50 0.0

−0.3

−0.6

0.6

0.3 100

2 0.0 ^ β

−0.3

−0.6

0.6

0.3 250

0.0

−0.3

−0.6 0.5 0.6 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9 τ

B 0.1 0.5 0.9 0.6

0.3 50 0.0

−0.3

−0.6

0.6

0.3 100

2 0.0 ^ β

−0.3

−0.6

0.6

0.3 250

0.0

−0.3

−0.6 0.5 0.6 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9 τ

Figure 2: Bias distribution of βb2 represented as an error plot according to the sample size n ∈ (50, 100, 250), the degree of correlation ρ ∈ (0.1, 0.5, 0.9), the expectiles τ ∈ (0.5, 0.6, 0.7, 0.9) and the error term ε ∼ N (0, 1) in a location-shift scenario. Figures A-B represent the results for the balanced (m = 4) and unbalanced panel (m ∼ U(3, 7)), respectively.

4.3 Results

We presented in this section the performance of the GEEE model in comparison to the GEE ac- cording to the previous defined performance measures. For the sake of space, we could not present all the simulation results in this document. We had chosen to present here the results of

22 the parameter estimate of x2 (which is correlated to the random error in the M3/10 model) for the expectiles τ ∈ {0.5, 0.6, 0.7, 0.8, 0.9} and for a Gaussian random error (location and location- scale-shift). The simulation results of the other parameter estimates, distributions (Student and

Chi-Squared) and expectiles τ ∈ {0.1, 0.2, 0.3, 0.4, 0.5} were postponed in the supplementary material.

Figure 2 and Figure 3 reported the bias distribution in the form of an error plot centered on the mean, in the location-shift (M0) and the location-scale-shift (M3/10) scenarios, respectively. Overall, the bias was centered around 0 with a relatively small range as the sample size increased

in the location-shift and location-scale-shift scenarios. The GEEE model with the AR(1) working

correlation structure— which is the true correlation structure, outperformed the GEEE model

with the other working correlation structure (Exc, Ind). Moreover, the bias distribution of the

GEEE models were similar to the bias distribution of the classical GEE . The bias distribution of

the independent GEEE model displayed a larger range when the degree of correlation was high

(ρ = 0.9), which suggested that a misspecification of the random error’s correlation structure

could introduce some bias. Similar performances were observed when the error was generated

by a Student or a Chi-Squared distribution. These results can be found in the supplementary

material.

To evaluate the asymptotic standard error (SE) of the GEEE parameter estimates, we used the

Monte Carlo standard deviation (SD) as a benchmark and presented the distribution of the ratio

SE SD as an error plot centered at the mean, Figure 4-Figure 5. The results showed that the error plots were centered at 1, which that on average the asymptotic standard error SE and

the Monte Carlo standard deviation SD were identical. However, we observed a larger range

of this ratio at the extreme expectiles, especially when the sample size was small. Furthermore,

this variability was more pronounced for the Exc and Ind correlation structures than the AR(1)

correlation structure, especially when the degree of correlation was high (ρ = 0.9). Similar per-

formances were observed for the Student and Chi-Squared random error. Those results can be

found in the supplementary material.

We reported the results of the relative efficiency (Reff) of the different working correlation struc-

tures in Figure 6 and Figure 7, for the location-shift (M0) and the location-scale-shift scenarios

(M3/10), respectively. The Reff of the AR(1) working correlation structure was equal to 1, because

23 ar1 exc ind

C 0.1 0.5 0.9 0.6

0.3 50 0.0

−0.3

−0.6

0.6

0.3 100

2 0.0 ^ β

−0.3

−0.6

0.6

0.3 250 0.0

−0.3

−0.6 0.5 0.6 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9 τ

D 0.1 0.5 0.9

0.6

0.3 50 0.0

−0.3

−0.6

0.6

0.3 100

2 0.0 ^ β

−0.3

−0.6

0.6

0.3 250

0.0

−0.3

−0.6 0.5 0.6 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9 τ

Figure 3: Bias distribution of βb2 represented as an error plot according to the sample size n ∈ (50, 100, 250), the degree of correlation ρ ∈ (0.1, 0.5, 0.9), the expectiles τ ∈ (0.5, 0.6, 0.7, 0.9) and the error term ε ∼ N (0, 1) in a location-scale-shift scenario. Figures C-D represent the results for the balanced (m = 4) and unbalanced panel (m ∼ U(3, 7)), respectively. it was set as the reference model. We observed that the Exc and the Ind working correlation struc- tures always had a relative efficiency greater than 1, (Reff >= 1). That is, the GEEE model with the AR(1) working correlation structure— which is the true correlation structure, was more ef-

ficient than the other working correlation structures. We also noticed that the relative efficiency of the Ind working correlation structure was particularly higher when the degree of correlation was high (ρ = 0.9).

We assessed the goodness of fit of the GEEE model according to the selection criteria previously defined. We presented in Figure 8 and Figure 9 the percentage of selection of each working correlation structure according to the QIC and CIC measures, respectively. We also presented in Figure 10 and Figure 11 the percentage of selection according to the asymmetric measures, asymQIC and asymCIC, respectively. We observed that the QIC criterion selected the AR(1) correlation structure as the true correlation structure most of the time. However, this observation was more pronounced with the CIC selection criterion. Indeed, the percentage of selection of the true correlation structure (AR(1)) was greater than 50% with the CIC measure, in all the

24 ar1 exc ind

E 0.1 0.5 0.9

2

1 50

0

−1

2 100 1

0

−1

2 250 1

0

−1 0.5 0.6 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9 τ

F 0.1 0.5 0.9

2

1 50

0

2 100 1

0

2 250 1

0

0.5 0.6 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9 τ

SE Figure 4: Distribution of the ratio SD for βb2 estimator represented as an error plot with respect to the sample size n ∈ (50, 100, 250), the degree of correlation ρ ∈ (0.1, 0.5, 0.9), the expectiles τ ∈ (0.5, 0.6, 0.7, 0.9) and the error term ε ∼ N (0, 1) in a location-shift scenario. Figures E-F represent the results for the balanced (m = 4) and unbalanced panel (m ∼ U(3, 7)), respectively. scenarios, Figure 11.

The asymQIC and the asymCIC measures also selected in the majority of cases the true corre- lation structure (AR(1)). The asymmetric selection criteria performed slightly better than the classical selection criteria, showing that, they could be a good alternative in the presence of heteroscedasticity in the data. Similar results were observed for the Student and Chi-Squared distribution. These results can be found in the supplementary material.

Overall, the simulation results showed that our GEEE model yielded similar statistical proper- ties as the classical GEE model. Indeed, the simulation results showed that our GEEE model was unbiased, consistent and highly efficient even with a misspecification of the true correlation structure (Liang and Zeger, 1986). Furthermore, our GEEE model naturally extended the various

GEE working correlation structures, which offered flexibility in the specification of the random error correlation structure while accounting for the heteroscedasticity in the data. In the next section we highlighted the usefulness of our GEEE model in capturing the heteroscedasticity through the analysis of the labor pain dataset.

25 ar1 exc ind

G 0.1 0.5 0.9

3

2 50 1

0

−1

3

2 100 1

0

−1

3

2 250 1

0

−1

0.5 0.6 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9 τ

H 0.1 0.5 0.9

2 50 1

0

−1

2 100 1

0

−1

2 250 1

0

−1 0.5 0.6 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9 τ

SE Figure 5: Distribution of the ratio SD for βb2 estimator represented as an error plot with respect to the sample size n ∈ (50, 100, 250), the degree of correlation ρ ∈ (0.1, 0.5, 0.9), the expectiles τ ∈ (0.5, 0.6, 0.7, 0.9) and the error term ε ∼ N (0, 1) in a location-scale-shift scenario. Figures G-H represent the results for the balanced (m = 4) and unbalanced panel (m ∼ U(3, 7)), respectively.

5 Application

In this section, we apply the GEEE model to estimate the reported labor pain score from the labor pain dataset (Davis, 1991). The labor pain dataset is from a clinical trial that compares two

26 ar1 exc ind

I 0.1 0.5 0.9

2 50

1

0

2 100

Reff 1

0

2 250

1

0

0.5 0.6 0.7 0.8 0.9 0.5 0.6 τ 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9

J 0.1 0.5 0.9

2 50

1

0

2 100 Reff 1

0

2 250

1

0

0.5 0.6 0.7 0.8 0.9 0.5 0.6 τ 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9

Figure 6: Relative efficiency (Reff) of the working correlation structures (AR(1), Exc, Ind) for βb2 estimator with respect to the sample size n ∈ (50, 100, 250), the degree of correlation ρ ∈ (0.1, 0.5, 0.9), the expectiles τ ∈ (0.5, 0.6, 0.7, 0.9) and the error term ε ∼ N (0, 1) in a location- shift scenario. Figures I-J represent the results for the balanced (m = 4) and unbalanced panel (m ∼ U(3, 7)), respectively. treatments for maternal pain relief during labor. The dataset consists of 83 women in labor with

43 women randomly assigned to the treatment group and 40 women to the placebo group. The response variable (labor pain score) is a self-reported score measured every 30 min on a 100-mm line, where 0 means no pain and 100 means extreme pain. A nearly monotone pattern of is found for the response variable, and the maximum number of measurements per woman is six.

We plotted in Figure 12 the response distribution (boxplot) according to time and treatment. At

first glance, the plots showed a time dependence of the labor pain distribution and a nonlinear trend of the mean as well as the median of the labor pain particularly in the placebo group.The shape of the boxplots and the varying position of the mean/median suggested that the mag- nitude and the sign of the of the labor pain distribution changed over time for the placebo group. We observed that the distance between the second quartile and the median is greater than the distance of the median from the first quartile during the first 90 minutes before becoming smaller in the second half of the time. This suggested that the skewness of the pain response was positive before turning negative after 90 minutes. While the skewness for the pain

27 ar1 exc ind

K 0.1 0.5 0.9

1.0 50

0.5

0.0

1.0 100 Reff 0.5

0.0

1.0 250

0.5

0.0

0.5 0.6 0.7 0.8 0.9 0.5 0.6 τ 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9

L 0.1 0.5 0.9 1.5

1.0 50

0.5

0.0 1.5

1.0 100 Reff 0.5

0.0 1.5

1.0 250

0.5

0.0

0.5 0.6 0.7 0.8 0.9 0.5 0.6 τ 0.7 0.8 0.9 0.5 0.6 0.7 0.8 0.9

Figure 7: Relative efficiency (Reff) of the working correlation structures (AR(1), Exc, Ind) for βb2 estimator with respect to the sample size n ∈ (50, 100, 250), the degree of correlation ρ ∈ (0.1, 0.5, 0.9), the expectiles τ ∈ (0.5, 0.6, 0.7, 0.9) and the error term ε ∼ N (0, 1) in a location- scale-shift scenario. Figures K-L represent the results for the balanced (m = 4) and unbalanced panel (m ∼ U(3, 7)), respectively.

ar1 exc ind

M 0.1 0.5 0.9 80 balanced 60 40 20 0

80 unbalanced 60 40 20 0 50 100 250 50 100 250 50 100 250 n

N 0.1 0.5 0.9

60 balanced 40 20 0 unbalanced 60 40 20 0 50 100 250 50 100 250 50 100 250 n

Figure 8: Percentage of working correlation structures selected by the QIC criterion with respect to the sample size n ∈ (50, 100, 250), the degree of correlation ρ ∈ (0.1, 0.5, 0.9), the panel (balanced and unbalanced) and the error term ε ∼ N (0, 1). Figures M-N represent the results for a location-shift and a location-scale-shift scenario, respectively.

28 ar1 exc ind

O 0.1 0.5 0.9 100 balanced 75 50 25 0

100 unbalanced 75 50 25 0 50 100 250 50 100 250 50 100 250 n

P 0.1 0.5 0.9 100 75 balanced 50 25 0 100 unbalanced 75 50 25 0 50 100 250 50 100 250 50 100 250 n

Figure 9: Percentage of working correlation structures selected by the CIC criterion with respect to the sample size n ∈ (50, 100, 250), the degree of correlation ρ ∈ (0.1, 0.5, 0.9), the panel (balanced and unbalanced) and the error term ε ∼ N (0, 1). Figures O-P represent the results for a location-shift and a location-scale-shift scenario, respectively.

ar1 exc ind

Q 0.1 0.5 0.9 50 40 balanced 30 20 10 0

50 unbalanced 40 30 20 10 0 50 100 250 50 100 250 50 100 250 n

R 0.1 0.5 0.9

40 balanced 30 20 10 0

40 unbalanced 30 20 10 0 50 100 250 50 100 250 50 100 250 n

Figure 10: Percentage of working correlation structures selected by the asymmetric criterion asymQIC with respect to the sample size n ∈ (50, 100, 250), the degree of correlation ρ ∈ (0.1, 0.5, 0.9), the panel (balanced and unbalanced) and the error term ε ∼ N (0, 1). Figures Q-R represent the results for a location-shift and a location-scale-shift scenario, respectively. medication group was always positive.

29 ar1 exc ind

S 0.1 0.5 0.9 100 75 balanced 50 25 0 100 unbalanced 75 50 25 0 50 100 250 50 100 250 50 100 250 n

T 0.1 0.5 0.9

75 balanced 50 25 0 unbalanced 75 50 25 0 50 100 250 50 100 250 50 100 250 n

Figure 11: Percentage of working correlation structures selected by the asymmetric criterion asymCIC with respect to the sample size n ∈ (50, 100, 250), the degree of correlation ρ ∈ (0.1, 0.5, 0.9), the panel (balanced and unbalanced) and the error term ε ∼ N (0, 1). Figures S-T represent the results for a location-shift and a location-scale-shift scenario, respectively.

Furthermore, the dataset consists of repeated measurements pertaining to the same subject. In

such an occasion, a longitudinal model is more convenient and the GEE model could be a good

candidate. Indeed, in his paper, Davis (1991) applied the semiparametric GEE model to estimate,

as he called it, the extremely non-normal labor pain response. Unfortunately, the GEE model,

although it models correctly the correlation structure of the random error, is not equipped to

properly model the skewness of the heavy-tailed labor pain distribution. In contrast, the GEEE

model by estimating the conditional expectiles of the labor pain distribution is able to account

for the skewness of the heavy-tailed labor pain distribution. Hence, we apply the GEEE model

to study the medication effect on the expectiles of the self-reported pain score distribution:

µτ(y|X) = Xβτ,

where the parameter βτ depends on τ. Through the estimation of the conditional expectiles, the GEEE model allows a detailed and comprehensive study of the medication effects on the pain score distribution. Additionally, the GEEE model offers a variety of correlation structures to

flexibly model the within-subject dependence. This represents a great advantage of our models with respect to the GEE model, particularly when the response distribution is skewed or heavy-

30 tailed. Notice that the GEEE model makes no assumptions about the distribution of the residual error.

To account for the time dependence of the treatment effect, we included an term in the linear model. Furthermore, as suggested by the plots in Figure 12, we also tested a square and a cubic term for the time variable and their interactions with the treatment. We presented here the result of the linear model including a square term for the time variable and its interaction with the treatment. The corresponding GEEE model is specified as:

2 2 µτ(yit|Ri, Tit) = β0τ + β1τ Ri + β2τ Tit + β3τ RiTit + β4τ Tit + β5τ RiTit, (19)

where Ri is the treatment variable 0 for the placebo group and 1 for the medication group. The variable Tit is the measurement time divided by 30 min. We estimated the GEEE model at 9 expectiles (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9) and tested three working correlation struc- tures: Ind, Exc, AR(1) . Although the focus is more on the right tail of the labor pain distribution

(women that are more sensitive to pain than average), it is still useful to have a detailed results for a complete and comprehensive analysis. We relied on the CIC and the asymCIC criterion to select among the competing correlation structures.

The different GEEE working correlation structures produced comparable estimates and the CIC and asymCIC measures led to the selection of the GEEE model with the AR(1) working correla- tion structure. The selection of the AR(1) correlation model was consistent with the structure of the correlated data. Indeed, the repeated data was uniformly spaced in time and the correlation of the reported pain was stronger for adjacent measurements than for distant ones.

The result of the estimated parameters and their 95% confidence interval is presented in Fig- ure 13. It is noticeable that the regressor effects vary from one expectile to another, and from one region to another of the response distribution. For example, the of the estimated coefficient βb4 differs depending on whether one is in the center (null effect), on the left tail (posi- tive effect), or on the right tail (negative effect) of the response distribution. This shows that the

GEEE model, through the estimation of the conditional expectiles, offers a complete overview of the heterogeneous regressor effects on the response distribution, while correctly specifying the correlation structure of the random noise vector for the repeated measurements (AR(1) work- ing correlation structure). Whereas, the classic GEE model summarizes the complex relationship between the response and the regressors by estimating their effects on the mean of the response

31 variable.

We observe that the effect size of the intercept estimate (βb0) is significantly increasing across τ.

Whereas, the effect size of the parameter estimate βb1 is rapidly decreasing and is not statistically significant at (5%) level, suggesting that the baseline pain does not globally differ between the

placebo and the treatment group.

The evolution of the difference between the placebo and the medication effects over time can be

formalized as:

2 µτ(yit|Ri = 1) − µτ(yit|Ri = 0) = β1τ + β3τ Tit + β5τ Tit. (20)

The effect size of the parameter estimate βb3 is negative and statistically significant at 5% level.

The effect size of the parameter estimate βb5 is subject to a sign shift and is significant at the extreme right tail of the distribution. The effect size of the parameter estimate βb5 has negative values and close to zero at the lower expectiles and at the expectiles close to the center (τ = 0.5)

and then its values turn positive at the higher expectile where it seems to be significant at the 5%

level. We observe that the effects of the three parameter estimates are simultaneously significant

at the higher expectiles (τ = 0.8, 0.9), of the pain score distribution, Figure 13. In other words,

the difference in pain between the two groups (placebo/treatment) is significant at the 5% level

for the fraction of women that is more sensitive to pain, which may suggest that the medication

is effective. That said, these results should be interpreted with caution because the estimation of

the tails of the distribution is more subject to sample size.

The combined effect of the predictors in the placebo group and the treatment group is shown in

Figure 14. The results show that the curves of the conditional expectile functions capture well the

skewness displayed by the boxplots of the labor pain variable. Whereas, the estimated curve of

the GEE model alone does not account for these properties of the labor pain distribution. Indeed,

as shown in Figure 14, the estimated curve of the GEE model, GEEE(τ = 0.5), alone is ineffective

to capture the heterogeneity displayed by the data.

Interestingly, we observe two trajectories in the placebo group: the women that are more sensitive

to pain than average (τ = 0.5) and the portion of women that is less sensitive to pain than

average, Figure 14. The women that are more sensitive to pain report an increasing pain and at

a higher rate than average. While the women that are less sensitive to pain report an increasing

32 pain, but at a lower rate. This difference may suggest the presence of a placebo effect at the lower

tail of the response distribution. On the other hand, in the medication group, we observe that

the labor pain score starts at a low level and increases slowly compared to the placebo group,

suggesting that the medication is effective, Figure 14. Moreover, we observe that the curvature

of the pain curves in the treatment group is similar to that of the curves of women who report less

pain than average in the placebo group. Another result in favor of the efficiency of the medication

treatment derived from the GEEE model. Note that these results should be interpreted with

caution because not all parameter estimates at all levels of the conditional pain distribution are

significant at the 5% level. Additionally, a large sample size is necessary to study the tails of the

response distribution and ensure the stability of the estimates.

In summary, we applied our GEEE model and the classical GEE to study the effectiveness of the

pain treatment for women in labor. The application of the GEE model alone did not allow us

to conclude on the effectiveness of the labor pain medication. Whereas, the application of the

GEEE model indicated that there is a pain difference between the proportions of women that

is more sensitive to pain (the extreme right of the tail of the distribution) in the placebo group

compared to the treatment group, suggesting that the medication was effective for this segment

of the population. Despite the limitations of the sample (self measured pain, sample size, missing

data), the data analysis highlighted the main advantage of the GEEE over the classical GEE,

which is the description of the conditional response distribution through the estimation of its

expectiles, while correctly specifying the correlation structure of the random noise vector for the

repeated measurements.

6 Conclusion

We combined the weighted asymmetric least squares regression (ER) and the generalized esti- mating equations (GEE) to derive a new class of estimators, which we call the GEEE model. The

GEEE estimates the conditional expectiles of the response distribution, which make it possible

33 100 100

80 80

60 60

40 40 Pain(Group Placebo) Pain(Group Pain(Group Traitment) Pain(Group

20 20

0 0

30 60 90 120 150 180 30 60 90 120 150 180 Time Time

Figure 12: Boxplot of the labor pain score for the placebo and the pain medication groups. The solid lines connect the and the dashed lines connect the means.

Ar1 Exc Ind

0 60 −10

0 40 1 ^ ^ β β −20

20 −30

0 −40 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

20 0

15 −10 2 3 ^ ^ β 10 β −20 5

0 −30

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

2 4 1

4 5 2 ^ ^ β 0 β

−1 0

−2 −2

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 τ τ

Figure 13: Parameter estimates and 95% confidence interval of the GEEE independent, exchange- able and AR(1) correlation models at 9 expectiles, τ ∈ (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9). Note that the classical GEE corresponds to the GEEE model with τ = 0.5. to capture regressor effects in the location, scale, and shape of the response distribution. This approach is useful in the presence of heteroscedasticity or when the response variable is asym- metric, skewed, or heavy-tailed. Additionally, the GEEE model offers a variety of correlation

34 10% 30% 50% 70% 90%

20% 40% 60% 80%

100 100

80 80

60 60

40 40 Pain(Group Placebo) Pain(Group Pain(Group Traitment) Pain(Group

20 20

0 0

30 60 90 120 150 180 30 60 90 120 150 180 Time Time

Figure 14: Boxplot of the labor pain score and the estimated curves of the expectile functions (τ ∈ (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)) for the placebo group and the pain medication group using the GEEE and AR(1) correlation model. Note that the classical GEE corresponds to the GEEE model with τ = 0.5. structures to flexibly model dependence among the observations, without making any assump- tion about the random error distribution.

In this paper, we derived the asymptotic properties of the GEEE estimator and proposed a het- erogeneous, consistent, and robust estimator of its variance-covariance matrix. The GEEE model is computationally efficient and easy to implement. See our GitHub for a free R package that simplifies the implementation (github.com/AmBarry/expectgee). We evaluated the finite sam- ple properties of the GEEE model and the simulation results displayed its favorable qualities under various scenarios. We applied the GEEE model to the longitudinal clinical trial to eval- uate the performance of two treatments (placebo/medication) with a skewed and heavy-tailed response variable. We showed that by estimating different expectiles of the response distribu- tion, the GEEE model provides more information than the GEE model (about the relationship between the regressor and the response variables).

Our GEEE model enhances the classical GEE . It allows the GEE model to account for the presence of heteroscedasticity in the data and capture regressor effects in the location, scale, and shape of the response distribution. Since GEE is a popular and useful statistical model for analyzing correlated data, we believe our GEEE model will be useful to researchers interested in analyzing

35 longitudinal data.

In terms of limitations: like the GEE model, the GEEE model is sensitive to outliers. We recom- mend using the numerous outlier detection tools available to reduce their influence prior to data analysis (Chatterjee and Hadi, 1986). In future work, it might be worthwhile to extend the GEEE model to account for missing data in longitudinal data analysis. Other complex longitudinal models that could benefit from the expectile regression model include linear mixed, logistic GEE, and count GEE.

36 References

Aigner, D., Amemiya, T., and Poirier, D. (1976). On the estimation of production frontiers: Maxi-

mum likelihood estimation of the parameters of a discontinuous density function. International

Economic Review, 17(2):377.

Barry, A., Bhagwat, N., Misic, B., Poline, J.-B., and Greenwood, C. M. T. (2020). Asymmetric

influence measure for high dimensional regression. Communications in Statistics - Theory and

Methods, pages 1–27.

Burrus, C., Barreto, J., and Selesnick, I. (1994). Iterative reweighted least-squares design of FIR

filters. IEEE Transactions on Signal Processing, 42(11):2926–2936.

Chatterjee, S. and Hadi, A. S. (1986). Influential Observations, High Leverage Points, and Outliers

in Linear Regression. Statistical Science, 1(3):379–393.

Chen, L., Wei, L.-J., and Parzen, M. I. (2004). Quantile Regression for Correlated Observations, pages

51–69. Springer New York, New York, NY.

Davis, C. S. (1991). Semi-parametric and non-parametric methods for the analysis of repeated

measurements with applications to clinical trials. Statistics in Medicine, 10(12):1959–1980.

Efron, B. (1991). Regression using asymmetric squared error loss. Statistica Sinica,

1(1):93–125.

Farcomeni, A. and Marino, M. F. (2015). Linear quantile regression models for longitudinal ex-

periments: an overview. METRON, 73(2):229–247.

Farooq, M. and Steinwart, I. (2017). An svm-like approach for expectile regression. Computational

Statistics and Data Analysis, 109:159–181.

Halekoh, U., Højsgaard, S., and Yan, J. (2006). The r package geepack for generalized estimating

equations. Journal of Statistical Software, 15/2:1–11.

Hardin, J. W. and Hilbe, J. M. (2003). Generalized estimating equations. Chapman & Hall/CRC,

Boca Raton, Flor., 2 edition. An optional note.

Hin, L.-Y. and Wang, Y.-G. (2009). Working-correlation-structure identification in generalized

estimating equations. Statistics in Medicine, 28(4).

Højsgaard, S., Halekoh, U., and Yan, J. (2005). The R Package geepack for Generalized Estimating

Equations. Journal of Statistical Software, 15(1):1–11.

37 Jiang, C., Jiang, M., Xu, Q., and Huang, X. (2017). Expectile regression neural network model

with applications. Neurocomputing, 247:73–86.

Jiang, J., Luan, Y., and Wang, Y.-G. (2007). Iterative estimating equations: Linear convergence

and asymptotic properties. The Annals of Statistics, 35(5):2233–2260.

Karim, R. and Zeger, S. (1989). A sas macro for longitudinal data analysis.

Kim, M. and Lee, S. (2016). Nonlinear expectile regression with application to value-at-risk and

expected shortfall estimation. Computational Statistics and Data Analysis, 94:1–19.

Kocherginsky, M., He, X., and Mu, Y. (2005). Practical Confidence Intervals for Regression Quan-

tiles. Journal of Computational and Graphical Statistics, 14(1):41–55.

Koenker, R. (2004). Quantile regression for longitudinal data. Journal of Multivariate Analysis,

91(1):74–89.

Koenker, R. and Bassett, Jr., G. (1978). Regression quantiles. Econometrica. Journal of the Econometric

Society, 46(1):33–50.

Kojadinovic, I. and Yan, J. (2010). Modeling Multivariate Distributions with Continuous Margins

Using the copula R Package. Journal of Statistical Software, 34(i09).

Leng, C. and Zhang, W. (2014). Smoothing combined estimating equations in quantile regression

for longitudinal data. Statistics and Computing, 24(1):123–136.

Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models.

Biometrika, 73(1):13–22.

Liao, L., Park, C., and Choi, H. (2019). Penalized expectile regression: an alternative to penalized

quantile regression. Annals of the Institute of Statistical Mathematics, 71(2):409–438.

Lin, J., Tong, X., Li, C., and Lu, Q. (2020). Expectile Neural Networks for Genetic Data Analysis

of Complex Diseases. arXiv:2010.13898 [stat]. arXiv: 2010.13898.

Liu, Y. and Wu, Y. (2011). Simultaneous multiple non-crossing quantile regression estimation

using kernel constraints. Journal of , 23(2):415–437.

Majumdar, A. and Paul, D. (2016). Zero expectile processes and bayesian spatial regression.

Journal of Computational and Graphical Statistics, 25(3):727–747.

Nelder, J. A. and Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal

Statistical Society. Series A (General), 135(3):370–384.

38 Newey, W. K. and Powell, J. L. (1987). Asymmetric least squares estimation and testing. Econo-

metrica, 55(4):819–47.

Pan, W. (2001). Akaike’s information criterion in generalized estimating equations. Biometrics,

57(1):120–125.

Pan, Y. (2020). Distributed optimization and statistical learning for large-scale penalized expectile

regression. Journal of the Korean Statistical Society.

Pan, Y., Liu, Z., and Cai, W. (2020). Large-Scale Expectile Regression With Covariates Missing at

Random. IEEE Access, 8:36502–36513. Conference Name: IEEE Access.

R Core Team (2018). R: A Language and Environment for Statistical Computing. R Foundation for

Statistical Computing, Vienna, Austria.

Righi, M., Yang, Y., and Ceretta, P. (2014). Nonparametric expectile regression for conditional

autoregressive expected shortfall estimation. Contemporary Studies in Economic and Financial

Analysis, 96:83–95.

SAS Institute Inc. (2003). SAS/STAT Software, Version 14.1 User’s Guide. Cary, NC.

Schulze Waltrup, L. and Kauermann, G. (2015). Smooth expectiles for panel data using penalized

splines. Statistics and Computing, 27(1):271–282.

Waldmann, E., Sobotka, F., and Kneib, T. (2016). Bayesian regularisation in geoadditive expectile

regression. Statistics and Computing, pages 1–15.

Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized linear models, and the

gauss-newton method. Biometrika, 61(3):439–447.

White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test

for heteroskedasticity. Econometrica, 48(4):817–38.

Xing, J.-J. and Qian, X.-Y. (2017). Bayesian expectile regression with asymmetric normal distribu-

tion. Communications in Statistics - Theory and Methods, 46(9):4545–4555.

Xu, Q., Liu, X., Jiang, C., and Yu, K. (2016). Nonparametric conditional autoregressive expectile

model via neural network with applications to estimating financial risk. Applied Stochastic

Models in Business and Industry, 32(6):882–908.

Xu, Q. F., Ding, X. H., Jiang, C. X., Yu, K. M., and Shi, L. (2020). An elastic-net penalized expectile

regression with applications. Journal of Applied Statistics, 0(0):1–26. Publisher: Taylor & Francis

_eprint: https://doi.org/10.1080/02664763.2020.1787355.

39 Yan, J. (2002). geepack: Yet another package for generalized estimating equations. R-News,

2/3:12–14.

Yan, J. and Fine, J. P. (2004). Estimating equations for association structures. Statistics in Medicine,

23:859–880.

Yang, Y. and Zou, H. (2015). Nonparametric multiple expectile regression via er-boost. Journal of

Statistical Computation and Simulation, 85(7):1442–1458.

Yin, G. and Cai, J. (2005). Quantile Regression Models with Multivariate Failure Time Data.

Biometrics, 61(1):151–161. _eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.0006-

341X.2005.030815.x.

Zhao, J., Chen, Y., and Zhang, Y. (2018). Expectile regression for analyzing heteroscedasticity in

high dimension. Statistics & Probability Letters, 137:304–311.

Zhao, J., Yan, G., and Zhang, Y. (2019). Robust Estimation and Shrinkage in Ultrahigh Dimen-

sional Expectile Regression with Heavy Tails and Variance Heterogeneity. arXiv:1909.09302

[math, stat]. arXiv: 1909.09302.

40