The Use of Logit and Probit Regression Models in the Process of Graduates' Employment

INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 8, ISSUE 11, NOVEMBER 2019 ISSN 2277-8616

Aleksey V. Burkov, Elena A. Murzina

Aleksey V. Burkov, Mari State University, Russia
Elena A. Murzina, Volga State University of Technology, Russia

Abstract: The paper analyzes several conventional methods for estimating binary response functions in order to model the probability that a university graduate sets up a new business. Three-factor and two-factor logit and probit analyses are reviewed. The data on American graduates come from the database of the National Science Foundation.

Key words: logit regression, probit regression, function of binary response, 3-factor regression models, 2-factor regression models, econometric modeling, statistical modeling of the labor market.

1. INTRODUCTION

In econometrics, models of processes with a binary response have recently come to the foreground of research interest. The conventional tools for implementing such models are logit and probit models. They are relevant for assessing the current state of all sectors of the economy, and in particular the labour market for qualified specialists with higher education. This paper considers the methods in relation to that sphere of the economy.

Unfortunately, the data on qualified specialists with higher education in the Russian Federation that this analysis requires are not sufficiently disclosed. Therefore, the author used the database on specialists with higher education provided by the National Science Foundation of the USA. This database records 447 parameters for each university graduate. To build a regression model using neural networks, it is not possible to use all of these parameters, so the data must be reduced. To select the factors that affect the probability that a university graduate establishes a business, we used correlation analysis, factor analysis and confirmatory factor analysis, including structural equations.

Factor analysis yielded a 3-factor and a 2-factor model. The 3-factor model included the following factors: "Experience" (f1), "Attitude to Education and Science" (f2), "Business Characteristics" (f3). The 2-factor model is based on the factors "Experience and Environment" (f1) and "Business Characteristics" (f2). In the context of our research we assumed from the outset that the economic criterion of a specialist's success is setting up a business.

2. LITERATURE REVIEW

This article is a logical continuation of the following articles of the author:
1. Using the Comprehensive Confirmatory Factor Analysis Method of Structural Equation Modeling in the Process of Graduates Employment [2];
2. Analysis Method of Structural Equation Modeling [3].

These articles use correlation and factor analysis to determine the factors influencing the foundation of a new business. To confirm the results obtained, structural equation methods were applied. The best results were obtained with the 3-factor and 2-factor models. The 3-factor model included the following factors: "Experience" (f1), "Attitude to Education and Science" (f2), "Business Characteristics" (f3). The 2-factor model is based on the factors "Experience and Environment" (f1) and "Business Characteristics" (f2). The number of factors was determined with the scree method.

As a theoretical justification for logit and probit regression analysis, the works of the following authors were used: Christensen, R. [4], Finney, D. J. [5], Hosmer, D. W., and Lemeshow, S. [8], Cox, D. R., and Snell, E. J. [16], Greenland, S. [23], Hosmer, D. W. J., and Lemeshow, S. [26].

3. SCOPE, OBJECTIVES AND METHODS

Since logit and probit models are the conventional models for implementing functions with a binary response, we used these models. To estimate the parameters of the logit and probit models, the following methods were used:
- Quasi-Newton Estimation Method
- Simplex Estimation Method
- Simplex and Quasi-Newton Estimation Method
- Hooke-Jeeves pattern moves Estimation Method
- Hooke-Jeeves and Quasi-Newton Estimation Method
- Rosenbrock pattern search Estimation Method
- Rosenbrock and Quasi-Newton Estimation Method

In general, the results of these methods turned out to be almost identical: the estimates differ only after the fifth decimal place. Nevertheless, we chose the method with the smallest standard errors of the model coefficients. We first examined the logit and probit models with 3 factors, and then with 2 factors.

4. RESULTS AND DISCUSSION

For the 3-factor logit model, the most accurate method of parameter estimation is the Quasi-Newton Estimation Method. The regression dependence can be written as follows:

$$
p(f_1, f_2, f_3) = \frac{\exp(1.756156 + 0.5749033 f_1 - 0.4341022 f_2 + 0.9252501 f_3)}{1 + \exp(1.756156 + 0.5749033 f_1 - 0.4341022 f_2 + 0.9252501 f_3)} \qquad (1)
$$

The coefficient estimates, standard errors, t-statistics, p-levels of the coefficient estimates and some other statistics are presented in Table 1. Table 1 shows that the model coefficients are statistically significant, since all of them have a low p-level and a t-statistic that is large in absolute value. Based on the t-statistics, we take the factor "Business Characteristics" to be the most important in the obtained model, since the absolute value of its t-statistic is the highest, and the factor "Attitude to Education and Science" to be the least important, since the absolute value of its t-statistic is the lowest. This ranking of the factors is similar to the results of the correlation analysis [2]. It is worth noting that an increase in "Experience" or "Business Characteristics" increases the probability of self-employment, while an increase in "Attitude to Education and Science" decreases it.

Table 1 Coefficient estimations

                         Const. B0      Factor 1 (f1)    Factor 2 (f2)    Factor 3 (f3)
Estimate                 1.756156       0.5749033        -0.4341022       0.9252501
Standard Error           0.1408642      0.1348108        0.1614367        0.1157518
t (565)                  12.46701       4.264519         -2.688994        7.993396
p-level                  0              0.00002348146    0.007378391      7.43419700E-15
-95% CL                  1.479474       0.3101117        -0.7511915       0.6978936
+95% CL                  2.032837       0.8396949        -0.1170129       1.152606
Wald's Chi-square        155.4264       18.18612         7.230688         63.89439
p-level                  0              0.00002009077    0.007170485      1.35654000E-15
Odds ratio (unit ch)     5.790135       1.776959         0.647846         2.522499
-95% CL                  4.390636       1.363577         0.4718041        2.009516
+95% CL                  7.635719       2.31566          0.8895738        3.166435
Odds ratio (range)                      10.94642         0.2307998        174.2136
-95% CL                                 3.635796         0.07908819       49.0223
+95% CL                                 32.9568          0.6735334        619.1135

Table 2 Correlation Matrix of Parameter Estimates
Variances of parameter estimates were computed after rescaling MS error to 1.

                 Const. B0      Factor 1 (f1)    Factor 2 (f2)    Factor 3 (f3)
Const. B0        1.000000       0.221754         -0.372354        0.412939
Factor 1         0.221754       1.000000         -0.016390        0.073552
Factor 2         -0.372354      -0.016390        1.000000         -0.029271
Factor 3         0.412939       0.073552         -0.029271        1.000000

Table 3 Classification of Cases
Odds ratio: 7.7307 Perc. correct: 81.02 %

                 Pred. 1.000000    Pred. 0.000000    Percent Correct
1.000000         33                87                27.50000
0.000000         21                428               95.32294

To check the accuracy of the model we refer to the correlation matrix of the parameter estimates (Table 2), the classification table with the number of correctly classified cases (Table 3), the normal probability plot of residuals and the histogram of the frequency distribution of residuals (Fig. 1). The correlation between the parameters of the model is insignificant. The highest correlations are observed between the constant and the other parameters, but they do not exceed 0.45 in absolute value and can therefore be deemed insignificant.

Fig. 1 Histogram of frequency distribution of residuals

From the classification table we conclude that the model correctly describes 81.02 % of all cases; of the cases in which the self-employment outcome was negative, 95.32 % were predicted correctly. The plots also support the adequacy of the constructed model. On the normal probability plot, the residuals lie close to the straight line of the normal distribution, and the histogram of the frequency distribution of residuals resembles a normal curve, given that the dependent variable is binary. The left and the right parts of the histogram each resemble a normal curve: the left part for 0, the right part for 1. We considered the distribution of the residuals separately for 0 and 1 because the observed variable takes only the values 0 or 1, while the modeled functions are continuous and take values from 0 to 1. Hence, we can consider that the model describes the process accurately.

We now turn to the 3-factor probit model. Unlike for the logit model, the best method for estimating the parameters of the probit model was the Hooke-Jeeves pattern moves Estimation Method. The probit regression dependence can be written as follows:

$$
p(f_1, f_2, f_3) = \mathrm{NP}(1.02477 + 0.332166 f_1 - 0.24980 f_2 + 0.530444 f_3) \qquad (2)
$$

where NP is the normal probability.

The estimates of the model parameters are provided in Table 4. From the table below we can see that the values of standard errors of probit model parameters are lower than
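The arithmetic behind Equations (1) and (2) and Tables 1 and 3 can be checked with a minimal Python sketch. It is not the authors' software. The function names are hypothetical and introduced only for illustration, and NP is assumed to denote the standard normal cumulative distribution function. The coefficients and cell counts are copied from the text above.

```python
import math

# Coefficients (constant, f1, f2, f3) copied from Eq. (1)/Table 1 and Eq. (2).
LOGIT_COEFS = (1.756156, 0.5749033, -0.4341022, 0.9252501)
PROBIT_COEFS = (1.02477, 0.332166, -0.24980, 0.530444)


def linear_predictor(coefs, f1, f2, f3):
    """Return b0 + b1*f1 + b2*f2 + b3*f3 for the given coefficient tuple."""
    b0, b1, b2, b3 = coefs
    return b0 + b1 * f1 + b2 * f2 + b3 * f3


def logit_probability(f1, f2, f3):
    """Eq. (1): exp(z) / (1 + exp(z)), written in the equivalent 1 / (1 + exp(-z)) form."""
    z = linear_predictor(LOGIT_COEFS, f1, f2, f3)
    return 1.0 / (1.0 + math.exp(-z))


def probit_probability(f1, f2, f3):
    """Eq. (2), assuming NP is the standard normal CDF (computed via erf)."""
    z = linear_predictor(PROBIT_COEFS, f1, f2, f3)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def classification_summary(obs1_pred1, obs1_pred0, obs0_pred1, obs0_pred0):
    """Summarize a 2x2 classification table such as Table 3."""
    total = obs1_pred1 + obs1_pred0 + obs0_pred1 + obs0_pred0
    return {
        "percent_correct": 100.0 * (obs1_pred1 + obs0_pred0) / total,
        "percent_correct_obs1": 100.0 * obs1_pred1 / (obs1_pred1 + obs1_pred0),
        "percent_correct_obs0": 100.0 * obs0_pred0 / (obs0_pred1 + obs0_pred0),
        "odds_ratio": (obs1_pred1 * obs0_pred0) / (obs1_pred0 * obs0_pred1),
    }


# Unit-change odds ratios are exp(coefficient); compare with Table 1.
unit_odds_ratios = [math.exp(b) for b in LOGIT_COEFS]

# Cell counts copied from Table 3.
summary = classification_summary(33, 87, 21, 428)

print("logit  p(0, 0, 0):", logit_probability(0.0, 0.0, 0.0))
print("probit p(0, 0, 0):", probit_probability(0.0, 0.0, 0.0))
print("unit-change odds ratios:", unit_odds_ratios)
print("classification summary:", summary)
```

The point (0, 0, 0) is only a convenient input for comparing the two link functions; the source does not state the scale of the factor scores, so other inputs should be chosen with that in mind.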
