American Journal of Mathematics and Statistics 2021, 11(1): 18-25 DOI: 10.5923/j.ajms.20211101.03

The Maximum Likelihood Estimates with Wrong Logistic Regression Model and Covariate Assumptions are Violated

Nuri H. Salem Badi*, Mohamed M. Shakandli

Faculty of Science, Statistical Department, University of Benghazi, Benghazi, Libya

Abstract The standard method for estimating the parameters of a logistic regression model is maximum likelihood (ML). In a very general sense, the ML method yields values of the unknown parameters that maximize the probability of the observed data. Much work has discussed the behaviour of the distribution of the maximum likelihood estimates (MLE) for the logistic regression model when the model is correct. However, several issues still need examination, such as the relationship between the link functions of the logistic model and the skew-normal distribution, which is considered in this work. In this paper we investigate the behaviour of the MLE for the logistic regression model when the wrong model has been fitted and the assumptions on the covariates are violated. We consider this behaviour when the covariates are drawn from a non-normal distribution and evaluate it by simulation.

Keywords Logistic regression model, Maximum likelihood method, Skew-normal distribution, Probit function and expit function

* Corresponding author: [email protected] (Nuri H. Salem Badi)
Received: Dec. 23, 2020; Accepted: Jan. 12, 2021; Published: Jan. 30, 2021
Published online at http://journal.sapub.org/ajms

1. Introduction

The behaviour of the maximum likelihood estimation (MLE) method in the logistic regression model has attracted the attention of many researchers. [6] developed the analysis of binary data and the application of maximum likelihood; see also [7]. [11] introduced generalized linear models and used special techniques to obtain the maximum likelihood estimates of the parameters for observations distributed according to some exponential family. [10] discussed the generalized linear model and the behaviour of the maximum likelihood (ML) method for binary outcomes. [5] discussed a method that uses a modified score function to reduce the bias of the maximum likelihood estimates. The ML method under the wrong logistic model has been discussed extensively by [9, p.23]. The idea is to express, in terms of the true parameters of the model, the least false values that are obtained by maximising the incorrect likelihood function. We use the relationship between the function $\operatorname{expit}(u) = e^{u}/(1+e^{u})$ and the probit function $\Phi(\cdot)$, together with the properties of the multivariate skew-Normal distribution, to compute a good approximation to the least false values under the wrong logistic model. The behaviour of the MLE for binary outcomes has been discussed more extensively by [10].

The logistic model with $Y_i \sim \text{binomial}(m_i, \pi_i)$ and $m_i = 1$ can be fitted by the method of maximum likelihood. The first step is to construct the likelihood function, which is a function of the unknown parameters and the data, and then to choose the values of the parameters that maximize this function. The log-likelihood function is

$$l(\pi; Y) = \sum_{i=1}^{n}\left[y_i\log(\pi_i) + (1-y_i)\log(1-\pi_i)\right],$$

which in this case becomes

$$l(\beta) = \sum_{i=1}^{n} y_i(\alpha + x_i^{T}\beta) - \sum_{i=1}^{n}\log\left(1+\exp(\alpha + x_i^{T}\beta)\right),$$

where $\beta$ is a $p$-dimensional vector, $x_i$ is the vector of covariates for the $i$th individual and $i = 1, \ldots, n$. To estimate the parameters $\alpha$ and $\beta_j$ we differentiate the log-likelihood function with respect to $\alpha$ and $\beta_j$. If the fitted model is the true model, then the asymptotic distribution of $\hat{\beta}$ is $N(\beta, I(\beta)^{-1})$, where $I(\beta)$ is the $(p \times p)$ Fisher information matrix, whose $(r,s)$th element is defined as

$$I_{rs} = -E\left[\frac{\partial^{2} l}{\partial\beta_r\,\partial\beta_s}\right].$$
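The log-likelihood and information matrix above can be illustrated numerically. The following is a minimal sketch, not code from the paper: the function names (`expit`, `log_likelihood`, `fit_logistic`), the simulated data and the parameter values are assumptions made for the example, and the intercept $\alpha$ is stacked with $\beta$ into one vector `theta`.

```python
import numpy as np

def expit(u):
    """Logistic function e^u / (1 + e^u), in a numerically stable form."""
    return 0.5 * (1.0 + np.tanh(0.5 * u))

def log_likelihood(theta, X, y):
    """Bernoulli log-likelihood; X already holds a leading column of ones."""
    eta = X @ theta
    # log(1 + exp(eta)) evaluated stably
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

def fit_logistic(X, y, n_iter=25, tol=1e-10):
    """Newton-Raphson for theta = (alpha, beta); returns estimate and information."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ theta)
        score = X.T @ (y - p)                        # first derivatives
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # minus the second derivatives
        step = np.linalg.solve(info, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    p = expit(X @ theta)
    info = X.T @ (X * (p * (1.0 - p))[:, None])      # information at the MLE
    return theta, info

# Hypothetical data: two covariates, illustrative parameter values.
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
theta_true = np.array([0.4, 0.25, 0.35])
y = rng.binomial(1, expit(X @ theta_true))

theta_hat, info_hat = fit_logistic(X, y)
std_err = np.sqrt(np.diag(np.linalg.inv(info_hat)))  # asymptotic standard errors
```

For the logistic model the observed and expected information coincide, so the Newton-Raphson step used here is also the Fisher scoring step.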


In practice, $I(\beta)$ is evaluated at the MLE $\hat{\beta}$.

2. MLE Under the Wrong Model

[9] discussed how the maximum likelihood method used to estimate the parameters of a given regression model is affected when the assumed model is incorrect. If the data are independent and identically distributed, with density $f(y_i, \theta)$ for an individual observation, the log-likelihood function can be written as

$$\ell_n(\theta) = \sum_{i=1}^{n}\log f(y_i, \theta).$$

The important question here is: if we fit the model $f(y \mid \theta)$ for $Y$ when the true model is $g(y)$, what value do we estimate for $\theta$? For each value of $\theta$, by the weak law of large numbers, as $n \to \infty$,

$$n^{-1}\ell_n(\theta) \to E_g\left[\log f(Y \mid \theta)\right] \quad \text{in probability}.$$

The Kullback-Leibler (KL) divergence is

$$KL(g(y), f(y,\theta)) = \int g(y)\log\frac{g(y)}{f(y,\theta)}\,dy. \qquad (1)$$

Because $\int g(y)\log g(y)\,dy$ does not depend on $\theta$, maximising the limit above is equivalent to minimising (1). For the model $f(y \mid x, \theta)$, we solve the likelihood equations to find the least false parameter $\theta^{*}$, which minimises $KL(g, f)$, so that

$$E_X E_g\left[\frac{\partial\log f(Y \mid X, \theta)}{\partial\theta}\right]_{\theta=\theta^{*}} = 0. \qquad (2)$$

Application to Logistic Regression

We now apply the MLE method under the wrong model to logistic regression. The idea is to use this method to obtain equations that determine the least false value $\theta^{*}$ for a logistic regression. To explain the behaviour of the MLE in this case, we partition the covariate vector $X$ as $(X_f, X_a)$. The fitted model is

$$\operatorname{expit}(\alpha + \beta_f^{T}X_f).$$

However, this model is wrong because the true model is

$$\operatorname{expit}(\alpha + \beta_f^{T}X_f + \beta_a^{T}X_a).$$

The expectations of the ML equations are therefore zero when $\theta = \theta^{*} = (\alpha^{*}, \beta_f^{*})$. Since $Y$ is binary, the expectations in this case become

$$E_X\{E(Y \mid X)\} = E_X\{\operatorname{expit}(\alpha^{*} + \beta_f^{*T}X_f)\}$$

and

$$E_X\{X_f\,E(Y \mid X)\} = E_X\{X_f\operatorname{expit}(\alpha^{*} + \beta_f^{*T}X_f)\}.$$

Here $E(Y \mid X)$ is $\Pr(Y = 1 \mid X)$, which is given by the true model

$$\Pr(Y = 1 \mid X) = \operatorname{expit}(\alpha + \beta_f^{T}X_f + \beta_a^{T}X_a).$$

But we fit the model without $X_a$. The least false values, $\alpha^{*}$ and $\beta_f^{*}$, can be found in terms of $\alpha$, $\beta_f$ and $\beta_a$ and the parameters of the distribution of the covariates from

$$E_X\left[\operatorname{expit}(\alpha^{*} + \beta_f^{*T}X_f)\right] = E_X\left[\operatorname{expit}(\alpha + \beta^{T}X)\right] \qquad (3)$$

$$E_X\left[X_{fj}\operatorname{expit}(\alpha^{*} + \beta_f^{*T}X_f)\right] = E_X\left[X_{fj}\operatorname{expit}(\alpha + \beta^{T}X)\right], \qquad (4)$$

where $\beta^{T}X = \beta_f^{T}X_f + \beta_a^{T}X_a$ and $X_{fj}$ is the $j$th element of $X_f$ $(j = 1, \ldots, p)$. These equations can be solved approximately if $X$ follows a multivariate normal distribution, by approximating $\operatorname{expit}(\cdot)$ and using the skew-normal distribution. Earlier work by [14] discussed inconsistent treatment estimates from mis-specified logistic regression analyses of randomized trials. In this paper we investigate further the behaviour of the MLE under the wrong model when the covariate assumptions are violated, and assess it by simulation.

Skew-Normal Distribution

The skew-Normal distribution has been discussed by [3] and [4]. More discussion and numerical evidence of the presence of skewness in real data is given by [17] and [2]. Further discussion of quadratic forms and of a flexible class of skew-symmetric distributions is given by [12] and [8], and also by [16]. A random variable $U$ is called skew normal with parameter $\lambda$, written $U \sim SN(\lambda)$, if its density function is

$$f(u; \lambda) = 2\,\phi(u)\,\Phi(\lambda u), \qquad (5)$$

where $u \in \mathbb{R}$, and $\phi(\cdot)$ and $\Phi(\cdot)$ are the density and distribution function of the standard normal distribution respectively, as defined by [3]. In the general case, [1] discussed extensions of the skew normal distribution and the properties of this family. We can define the extended multivariate skew-normal distribution as follows: a $p$-dimensional random variable $U$ has an extended skew-normal distribution, $ESN(\mu, \Sigma, \lambda, \nu)$, if it has density

$$\phi_p(u; \mu, \Sigma)\,\frac{\Phi\left(\nu + \lambda^{T}(u - \mu)\right)}{\Phi\left(\nu/\sqrt{1 + \lambda^{T}\Sigma\lambda}\right)},$$

where $\nu$ is a scalar, $\Sigma$ is a $p \times p$ dispersion matrix and the parameters $\mu$ and $\lambda$ are $p$-dimensional. Here $\phi_p(\cdot\,; \mu, \Sigma)$ is the density of a $p$-dimensional normal variable with mean $\mu$ and dispersion matrix $\Sigma$, and $\Phi(\cdot)$ is the cumulative distribution function of a univariate standard normal variable.

Probit Function and expit Function


We consider the approximation of $\operatorname{expit}(\cdot)$ by $\Phi(\cdot)$, the distribution function of a standard normal variable. [15] reported that the logistic distribution closely resembles the normal distribution, discussed the shape of the two distributions, both of which are symmetrical, and noted some of their properties. [13, p.5] point out the comparison of the logistic and normal cumulative distribution functions. The approximation is defined as

$$\operatorname{expit}(u) \approx \Phi(ku), \qquad k = \frac{16\sqrt{3}}{15\pi}.$$

3. Least False Values Under Wrong Logistic Model

Suppose that we model a binary outcome, $Y$, using a logistic regression, i.e.

$$\Pr(Y = 1 \mid X_f) = \operatorname{expit}(\alpha + \beta_f^{T}X_f),$$

but that the true model includes more covariates, i.e.

$$\Pr(Y = 1 \mid X) = \operatorname{expit}(\alpha + \beta_f^{T}X_f + \beta_a^{T}X_a).$$

To find the least false values in terms of the parameters of the true logistic model, we use the approximation $\operatorname{expit}(\cdot) \approx \Phi(\cdot)$, the properties of the skew-normal distribution, and the two equations that determine the MLEs, as discussed in Section 2 for the MLE under the wrong model. Let us assume that $X$ has a $(p+q)$-dimensional multivariate Normal distribution, where $p$ and $q$ denote the dimensions of $X_f$ and $X_a$ respectively. The presence of an intercept in the above models means that we may assume, without loss of generality, that $E(X) = 0$. If $\operatorname{var}(X) = \Sigma$, suppose also that the partition of this matrix corresponding to $X_f$ and $X_a$ is

$$\Sigma = \begin{pmatrix}\Sigma_{ff} & \Sigma_{fa}\\ \Sigma_{af} & \Sigma_{aa}\end{pmatrix}.$$

We can then apply the approximation $\operatorname{expit}(u) \approx \Phi(ku)$ to (3) and (4), which leads to

$$E_X\left[\Phi(k\alpha^{*} + k\beta_f^{*T}X_f)\right] = E_X\left[\Phi(k\alpha + k\beta^{T}X)\right]. \qquad (6)$$

Now we use the properties of the skew-normal distribution. In this case the density function of the skew-normal distribution, where $E(X) = 0$, is

$$f(X; \alpha, \beta, \Sigma) = \frac{\Phi\left(k(\alpha + \beta^{T}X)\right)\phi_{p+q}(X; 0, \Sigma)}{\Phi\left(k\alpha/\sqrt{1 + k^{2}\beta^{T}\Sigma\beta}\right)}.$$

Then we can write (6) as

$$\Phi\left(\frac{k\alpha^{*}}{\sqrt{1 + k^{2}\beta_f^{*T}\Sigma_{ff}\beta_f^{*}}}\right) = \Phi\left(\frac{k\alpha}{\sqrt{1 + k^{2}\beta^{T}\Sigma\beta}}\right),$$

that is,

$$\frac{\alpha^{*}}{\sqrt{1 + k^{2}\beta_f^{*T}\Sigma_{ff}\beta_f^{*}}} = \frac{\alpha}{\sqrt{1 + k^{2}\beta^{T}\Sigma\beta}}. \qquad (7)$$

Turning our attention to (4) and using the results for the expectation of a SN distribution, we obtain

$$\frac{\Sigma_{ff}\beta_f^{*}}{\sqrt{1 + k^{2}\beta_f^{*T}\Sigma_{ff}\beta_f^{*}}}\;\phi\!\left(\frac{k\alpha^{*}}{\sqrt{1 + k^{2}\beta_f^{*T}\Sigma_{ff}\beta_f^{*}}}\right) = \frac{(\Sigma\beta)_1}{\sqrt{1 + k^{2}\beta^{T}\Sigma\beta}}\;\phi\!\left(\frac{k\alpha}{\sqrt{1 + k^{2}\beta^{T}\Sigma\beta}}\right), \qquad (8)$$

where $\phi(\cdot)$ is the standard Normal density and $(\Sigma\beta)_1$ denotes the first $p$ elements of $\Sigma\beta$, which is $\Sigma_{ff}\beta_f + \Sigma_{fa}\beta_a$. Using the result in (7), we can simplify (8), and it follows that

$$\beta_f^{*} = \frac{\beta_f + \Sigma_{ff}^{-1}\Sigma_{fa}\beta_a}{\sqrt{1 + k^{2}\beta_a^{T}\tilde{\Sigma}_{aa}\beta_a}} \qquad (9)$$

and

$$\alpha^{*} = \frac{\alpha}{\sqrt{1 + k^{2}\beta_a^{T}\tilde{\Sigma}_{aa}\beta_a}}, \qquad (10)$$

where $\tilde{\Sigma}_{aa} = \Sigma_{aa} - \Sigma_{af}\Sigma_{ff}^{-1}\Sigma_{fa}$. From this we note that (9) includes a denominator, such that $\beta_f^{*} \neq \beta_f$ even when $\Sigma_{fa} = 0$, although, of course, $\beta_f^{*} = \beta_f$ if $\beta_a = 0$.

4. Simulation Study when the Covariates Follow Normal Distribution

The goal of this simulation is to assess the approximation computed for the least false values for the logistic regression model. We are interested in the case where the covariates are generated from a multivariate Normal distribution. We apply it to cases with different variances and different correlations to check the behaviour of the formulae for the least false values under the wrong model. In this simulation we check the approximation of the least false values when the true logistic regression model has five covariates, $p = 5$:

$$\pi_i = \operatorname{expit}(\alpha + \beta^{T}X),$$

where $\beta = (\beta_1, \beta_2, \ldots, \beta_5)^{T}$ and $X = (x_{i1}, \ldots, x_{i5})^{T}$, and the fitted model has two covariates. We designed the simulation as follows:

• We choose $X$ as a draw from the multivariate normal distribution, $X \sim N(0, \Sigma)$.
• We consider the $5 \times 5$ covariance matrix $\Sigma$, given by


2 11 12  The sample size has been used n  500 , n 10000  , and N 1000 number of simulation. 21 22 where, 4.1. Results and Discussion

31 32 We report the accuracy of the estimation parameters of the 1 12  wrong logistic regression model has two covariates when the 11 ,,  21   41 42 21 1 true model has five covariates. Tables shows comparison 51 52 between the least false values which is computed by 1 34 35 approximation of extpi ()u( ku) and skew-Normal T distribution properties and values of estimated parameters by 22  431 45 ,  21   12 . fitted logistic regression model. RRR,, denote the ratios 53 54 1 1 2 3 of the mean of the simulated fits to the comuted last false  Use two different variance  2  0.1,1.5 . value.  We consider 3 different cases of correlation which is Table 1 and Table 2, shows the results of simulation of data generated by multivariate Normal distribution in cases each case of  has same  designed as: ij ij of Pr(Y  1) 60% and Pr(Y  1) 10% respectively (0.2,0.2,0.4), (0.8,0.7,0.9), (0.2,-0.2,-0.2). Values are with sample size n  500 . Table 3 and Table 4, shows the chosen to assume  is positive definite. results of simulation with sample size n 10000 . We can  We choose the parameters ,, and  to give 15 see clearly the results show ratios close to one. The same us two cases Pr(Y  1) 10% and Pr(Y  1) 60% . behaviour results found in both cases of Pr(Y  60%) and As calculate the unconditional Pr(Y  1) by properties Pr(Y  10%) , where is the ratio found close to one. That is skew-normal distribution we get, meaning the approximation form of the least false values works well, although the probability of outcome Y is very k Pr(Y  1)   . low about 10% , but a good results and reasonable behaviour 2 T 1k  have been found. Some issues of low ratio a raised in case of sample size n  500 , that there are some estimated values Choose were very small close to zero which affect on ratio. 
10.25,  2  0.35,  3  0.40,  4  0.3,  5  0.2 and Moreover, the parameter selection and correlation selection adjust  , so that over the covariates Pr(Y  1) 10% may be having slightly affected in a few cases. (  2.2) and Pr(Y  1) 60% (  0.4) .

Table 1. Simulation results for the least false values with multivariate Normal covariates, $\Pr(Y = 1) = 60\%$ and $n = 500$; $R_i$ denotes the ratio

Table 2. Simulation results for the least false values with multivariate Normal covariates, $\Pr(Y = 1) = 10\%$ and $n = 500$; $R_i$ denotes the ratio


Table 3. Simulation results for the least false values with multivariate Normal covariates, $\Pr(Y = 1) = 60\%$ and $n = 10000$; $R_i$ denotes the ratio

Table 4. Simulation results for the least false values with multivariate Normal covariates, $\Pr(Y = 1) = 10\%$ and $n = 10000$; $R_i$ denotes the ratio
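One cell of a table of this kind could be produced along the following lines. This is a hedged sketch, not the authors' simulation code: the function names, the seed, the reduced number of replicates (200 rather than $N = 1000$) and the sample size $n = 1000$ are assumptions chosen to keep the example quick, and `least_false` simply restates equations (9) and (10).

```python
import numpy as np

K = 16.0 * np.sqrt(3.0) / (15.0 * np.pi)

def expit(u):
    return 0.5 * (1.0 + np.tanh(0.5 * u))

def fit_logistic(X, y, n_iter=50, tol=1e-9):
    """Newton-Raphson logistic fit; X holds a leading column of ones."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ theta)
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, X.T @ (y - p))
        theta += step
        if np.max(np.abs(step)) < tol:
            break
    return theta

def least_false(alpha, beta, Sigma, p):
    """Equations (9) and (10): approximate least false (alpha*, beta_f*)."""
    S_ff, S_fa = Sigma[:p, :p], Sigma[:p, p:]
    S_cond = Sigma[p:, p:] - S_fa.T @ np.linalg.solve(S_ff, S_fa)
    shrink = 1.0 / np.sqrt(1.0 + K**2 * (beta[p:] @ S_cond @ beta[p:]))
    return shrink * alpha, shrink * (beta[:p] + np.linalg.solve(S_ff, S_fa @ beta[p:]))

def simulate_ratios(alpha, beta, Sigma, p, n, n_rep, seed):
    """Ratios of the mean fitted (alpha, beta_f) to the least false values."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(Sigma)
    fits = np.empty((n_rep, p + 1))
    for r in range(n_rep):
        X = rng.standard_normal((n, len(beta))) @ chol.T  # X ~ N(0, Sigma)
        y = rng.binomial(1, expit(alpha + X @ beta))      # true five-covariate model
        X_fit = np.column_stack([np.ones(n), X[:, :p]])   # wrong model: first p only
        fits[r] = fit_logistic(X_fit, y)
    a_star, b_star = least_false(alpha, beta, Sigma, p)
    return fits.mean(axis=0) / np.concatenate([[a_star], b_star])

beta = np.array([0.25, 0.35, 0.40, 0.30, 0.20])
R = np.full((5, 5), 0.2)
R[2:, 2:] = 0.4
np.fill_diagonal(R, 1.0)
ratios = simulate_ratios(0.4, beta, 1.5 * R, p=2, n=1000, n_rep=200, seed=1)
```

The three entries of `ratios` play the role of $R_1$, $R_2$ and $R_3$ for the intercept and the two fitted coefficients; values near one indicate that (9) and (10) describe the fitted model well.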

5. Covariate Assumptions are Violated

The previous analysis discussed the least false values with multivariate covariates, and the simulation gave reasonable results. That analysis was applied to covariates drawn from a multivariate normal distribution. In this part we consider the model with symmetric covariate distributions other than the multivariate normal. The behaviour of the MLE may be affected by the assumption of normality on the covariates, so we consider two symmetric multivariate distributions: the multivariate t-distribution and the multivariate uniform distribution.

5.1. Simulation of Multivariate t and Multivariate Uniform Distribution

The goal of this simulation is to use the same formulae for the least false values as in the previous analysis, and to assess the approximation computed for the least false values of the logistic regression model when the covariates follow a multivariate t or uniform distribution. We use the same assumptions as in the previous simulation. The true logistic regression model has five covariates, $p = 5$:

$$\pi_i = \operatorname{expit}(\alpha + \beta^{T}X).$$

• We choose $X$ as a draw from one of two multivariate distributions, either
- the multivariate Uniform distribution, or
- the multivariate t-distribution.
• We generate multivariate Uniform covariates from standard Normal variables as follows:
- $Z \sim MVN(0, R)$, where $R$ is the correlation matrix;
- $U = \Phi(Z) \in [0, 1]$ (element-wise), and $X_U = 5\left(U - \tfrac{1}{2}\right) \in \left[-2\tfrac{1}{2}, 2\tfrac{1}{2}\right]$.
• We consider the $5 \times 5$ covariance matrix

$$\Sigma = \begin{pmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{pmatrix}.$$

The mean of the uniform distribution is $\mu_U = 1/2$ and the variance is $\operatorname{var}(U) = 1/12$. So in this case we have $\operatorname{var}(X_U) = 25/12$ and $\operatorname{cov}(X_{Ui}, X_{Uj}) = 25\operatorname{cov}(U_i, U_j)$, where the covariance is

$$\operatorname{cov}(U_i, U_j) = \frac{\arcsin(\rho_{ij}/2)}{2\pi}.$$

Then the components of the covariance matrix $\Sigma$ are

$$\Sigma_{11} = 25\begin{pmatrix}\frac{1}{12} & \operatorname{cov}(U_1, U_2)\\ \operatorname{cov}(U_2, U_1) & \frac{1}{12}\end{pmatrix},$$

$$\Sigma_{22} = 25\begin{pmatrix}\frac{1}{12} & \operatorname{cov}(U_3, U_4) & \operatorname{cov}(U_3, U_5)\\ \operatorname{cov}(U_4, U_3) & \frac{1}{12} & \operatorname{cov}(U_4, U_5)\\ \operatorname{cov}(U_5, U_3) & \operatorname{cov}(U_5, U_4) & \frac{1}{12}\end{pmatrix},$$


$$\Sigma_{21} = 25\begin{pmatrix}\operatorname{cov}(U_3, U_1) & \operatorname{cov}(U_3, U_2)\\ \operatorname{cov}(U_4, U_1) & \operatorname{cov}(U_4, U_2)\\ \operatorname{cov}(U_5, U_1) & \operatorname{cov}(U_5, U_2)\end{pmatrix}, \qquad \Sigma_{12} = \Sigma_{21}^{T}.$$

• We generate the multivariate t-distribution with different values of the degrees of freedom $df$, which change the shape of the distribution. We choose two cases, $df = 5$ and $df = 200$, and use two different variances, $\sigma^{2} = 0.1$ and $\sigma^{2} = 1.5$, in each case.
• We use the same settings as in the previous simulation for the correlations and variances, with $\Pr(Y = 1) = 10\%$ and $\Pr(Y = 1) = 60\%$, sample sizes $n = 500$ and $n = 10000$, and $N = 1000$ simulations.

5.2. Results and Discussion

The results concern two simulations, with data generated from the multivariate Uniform distribution and from the multivariate t-distribution. The results for the Uniform distribution are shown in Table 5 and Table 6, for $\Pr(Y = 1) = 60\%$ and $\Pr(Y = 1) = 10\%$ respectively, with sample sizes $n = 500$ and $n = 10000$. The same pattern appears: the ratio is close to one in almost all cases. A few cases show a low ratio with sample size $n = 500$, where some estimated values were very small (for example, when $\rho_{11} = 0.2$, $\rho_{12} = -0.2$, $\rho_{22} = -0.2$ the parameter values involved were 0.0756 and 0.0998, and the ratio was $R_2 = 0.75$). In general, the least false values in this case behave in the same way as with multivariate normal covariates.

The second part of this simulation concerns data generated from the multivariate t-distribution. The results are shown in Table 7 and Table 8 for $\Pr(Y = 1) = 60\%$ and $\Pr(Y = 1) = 10\%$ respectively, with sample size $n = 500$. Table 9 and Table 10 show the results for sample size $n = 10000$. The results cover four cases with degrees of freedom $df = 200$ and $df = 5$, and one value of the variance, $\sigma^{2} = 0.5$. Comparing these results with the Normal case, the behaviour is the same, most clearly when the degrees of freedom are large. Moreover, the ratio is close to one in all cases of correlation and degrees of freedom. Slightly lower ratios appear in a few cases when $df = 5$ and $n = 500$, which is the same behaviour found with multivariate normal covariates when the estimated value was very small.

Overall, suppose we assume normality of the covariates but the covariates are in fact drawn from a multivariate t-distribution with various degrees of freedom, or from a multivariate Uniform distribution, with the large sample size $n = 10000$. We found that, for different combinations of correlations and variances, the results from (9) and (10) still appear to hold.
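The construction of the Uniform covariates in Section 5.1 and the covariance formula $\operatorname{cov}(U_i, U_j) = \arcsin(\rho_{ij}/2)/(2\pi)$ can be checked numerically. The sketch below is illustrative only: the function names, the seed, the sample size and the choice of correlation matrix `R` (the first correlation case, assuming one correlation per block) are assumptions made for the example.

```python
import numpy as np
from math import erf, sqrt

_erf = np.vectorize(erf)  # element-wise error function without extra dependencies

def norm_cdf(z):
    """Standard normal distribution function, applied element-wise."""
    return 0.5 * (1.0 + _erf(z / sqrt(2.0)))

def uniform_covariates(R, n, rng):
    """Draw Z ~ MVN(0, R), set U = Phi(Z), and return X_U = 5 (U - 1/2)."""
    chol = np.linalg.cholesky(R)
    Z = rng.standard_normal((n, R.shape[0])) @ chol.T
    U = norm_cdf(Z)
    return 5.0 * (U - 0.5)

def implied_covariance(R):
    """Covariance of X_U: 25 * arcsin(rho / 2) / (2 pi), which is 25/12 on the diagonal."""
    return 25.0 * np.arcsin(R / 2.0) / (2.0 * np.pi)

# Hypothetical correlation matrix: rho = 0.2 in the first two blocks, 0.4 in the last.
R = np.full((5, 5), 0.2)
R[2:, 2:] = 0.4
np.fill_diagonal(R, 1.0)

rng = np.random.default_rng(2)
X_u = uniform_covariates(R, 200_000, rng)
Sigma_u = implied_covariance(R)
sample_cov = np.cov(X_u, rowvar=False)
```

Comparing `sample_cov` with `Sigma_u` gives a direct check that the matrix $\Sigma$ fed into (9) and (10) matches the covariates actually generated.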

Table 5. Simulation results for the least false values using different values of $\rho_{ij}$, with variables generated from the multivariate Uniform distribution, $\Pr(Y = 1) = 60\%$, $n = 500$ and $n = 10000$; $R_i$ denotes the ratio

Table 6. Simulation results for the least false values using different values of $\rho_{ij}$, with variables generated from the multivariate Uniform distribution, $\Pr(Y = 1) = 10\%$, $n = 500$ and $n = 10000$; $R_i$ denotes the ratio


Table 7. Simulation results for the least false values with the multivariate t-distribution, $\Pr(Y = 1) = 60\%$ and $n = 500$; $R_i$ denotes the ratio

Table 8. Simulation results for the least false values with the multivariate t-distribution, $\Pr(Y = 1) = 10\%$ and $n = 500$; $R_i$ denotes the ratio

Table 9. Simulation results for the least false values with the multivariate t-distribution, $\Pr(Y = 1) = 60\%$ and $n = 10000$; $R_i$ denotes the ratio

Table 10. Simulation results for the least false values with the multivariate t-distribution, $\Pr(Y = 1) = 10\%$ and $n = 10000$; $R_i$ denotes the ratio
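Multivariate t covariates of the kind used for these tables could be generated as a normal vector divided by an independent scaled chi-square variable. The sketch below is an assumption-laden illustration: the paper does not state whether $\Sigma$ is the scale matrix or the covariance matrix of the t-distribution, so the optional rescaling by $\sqrt{(df-2)/df}$, which makes the covariance equal to $\Sigma$, is a choice made here, as are the function name, seed and sample size.

```python
import numpy as np

def multivariate_t(Sigma, df, n, rng, match_covariance=True):
    """Draw n rows from a zero-mean multivariate t with df degrees of freedom.

    With match_covariance=True (an assumption of this sketch) the draws are
    rescaled so that their covariance, not their scale matrix, equals Sigma.
    Requires df > 2 for the covariance to exist.
    """
    chol = np.linalg.cholesky(Sigma)
    Z = rng.standard_normal((n, Sigma.shape[0])) @ chol.T  # Z ~ N(0, Sigma)
    W = rng.chisquare(df, size=n)                          # independent chi-square
    T = Z / np.sqrt(W / df)[:, None]                       # covariance Sigma * df / (df - 2)
    if match_covariance:
        T = T * np.sqrt((df - 2.0) / df)
    return T

# Hypothetical design: first correlation case, sigma^2 = 1.5.
R = np.full((5, 5), 0.2)
R[2:, 2:] = 0.4
np.fill_diagonal(R, 1.0)
Sigma = 1.5 * R

rng = np.random.default_rng(3)
X_t10 = multivariate_t(Sigma, df=10, n=200_000, rng=rng)
X_t200 = multivariate_t(Sigma, df=200, n=200_000, rng=rng)
cov_t10 = np.cov(X_t10, rowvar=False)
cov_t200 = np.cov(X_t200, rowvar=False)
```

With $df = 200$ the draws are nearly normal, while small $df$ such as 5 gives much heavier tails, which is the contrast the tables explore.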

6. Conclusions

The goal of this paper was to investigate the behaviour of the MLE and to find a formula for the least false values when the wrong logistic model has been fitted. Moreover, we examined the behaviour of the model when the assumptions on the covariates are violated. In the simulation analysis, we found good results in all cases when the covariates are drawn from the multivariate normal distribution. The results showed that the MLE behaves reasonably relative to the least false values under the wrong model, which are computed in terms of the true parameters. On the other hand, we applied the results in (9) and (10), which assume that the covariates are multivariate normal, when the covariates do not follow this distribution. Again the results derived in (9) and (10) gave accurate answers. We considered five-dimensional multivariate uniform and t variables when only two covariates were fitted.


We can see clearly that both a low probability of the outcome $Y$ and the sample size affect the estimated values of the parameters. With a large sample size, the standard error is close to zero and the ratio is close to one. In contrast, with a small sample size the standard error increases and the ratio is far from one in some cases. The results showed that, for these symmetric non-normal variables, the violation of the assumption of normality made little difference. Some discrepancies were noticed when the values of the coefficients were very small, close to zero.

REFERENCES

[1] Arnold, B. C. and Beaver, R. J. (2000). Hidden Truncation Models. The Indian Journal of Statistics, 62, 23-35.
[2] Azzalini, A. (2005). The Skew-normal Distribution and Related Multivariate Families. Scandinavian Journal of Statistics, 32, 159-188.
[3] Azzalini, A. (1985). A Class of Distribution Which Includes the Normal Ones. Scandinavian Journal of Statistics, 12, 171-178.
[4] Azzalini, A. (1986). Further Results on A Class of Distribution Which Includes the Normal Ones. Statistica, 46, 199-208.
[5] Badi, N. H. S. (2017). Properties of the Maximum Likelihood Estimates and Bias Reduction for Logistic Regression Model. Open Access Library Journal, 4, e3625.
[6] Cox, D. R. (1970). Analysis of Binary Data. Chapman and Hall, London.
[7] Cox, D. R. and Snell, E. J. (1989). Analysis of Binary Data. Chapman and Hall, New York.
[8] Chiogna, M. (2004). Ma, Y. and Genton, M. G. Scandinavian Journal of Statistics, 31, 459-468.
[9] Claeskens, G. and Hjort, N. L. (2008). Model Selection and Model Averaging. Cambridge University Press.
[10] McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman and Hall, London.
[11] Nelder, J. A. and Wedderburn, R. W. M. (1972). Generalized Linear Models. Journal of the Royal Statistical Society, Series A, 135, 379-384.
[12] Loperfido, N. (2001). Quadratic Forms of Skew-normal Random Vectors. Statistics and Probability Letters, 54, 381-387.
[13] Johnson, N. L. and Kotz, S. (1970). Continuous Univariate Distributions. Wiley-Interscience Publication, U.S.A.
[14] Matthews, J. N. S. and Badi, N. H. (2015). Inconsistent treatment estimates from mis-specified logistic regression analyses of randomized trials. Statistics in Medicine, 34, 2681-2694.
[15] Gumbel, E. J. (1961). Bivariate logistic distributions. Journal of the American Statistical Association, 56, 335-349.
[16] Henze, N. (1986). A Probabilistic Representation of the Skew-normal Distribution. Scandinavian Journal of Statistics, 13, 271-275.
[17] Hill, M. A. and Dixon, W. J. (1982). Robustness in Real Life. Biometrics, 38, 377-396.

Copyright © 2021 The Author(s). Published by Scientific & Academic Publishing This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/