Available online at www.sciencedirect.com ScienceDirect

Procedia Economics and Finance 10 ( 2014 ) 134 – 140

7th International Conference on Applied Statistics

On a case of inadequacy in using the arithmetic mean

Irina-Maria Dragan a,*

a University of Economic Studies, Statistics and Econometrics Department, Virgil Madgearu Building, Calea Dorobantilor No.15-17, 6th floor, Sector 1, Bucharest 010552, Romania

Abstract

In strongly heterogeneous populations, the average is not an appropriate indicator for summarizing the values, because its use presupposes a relatively homogeneous population and a normal, or approximately normal, distribution around the central value. For some genuine economic processes it is not even possible to homogenize the data by treating the values at the ends of the series as outliers, as can be done with a normal distribution of measurements. In such heterogeneous populations the characteristic follows a Cauchy distribution, which is somewhat similar to the Gauss-Laplace distribution but has more elongated tails and a specific density. In these cases it is appropriate to establish the central value through the median. The case study presented in this paper refers to the financial performance of SMEs. This population is strongly heterogeneous, so aggregating it at the population level by means of the mean is risky, because it produces indicators that distort the reality they describe. The results of this research may serve as a guide for studying heterogeneous populations, particularly where their extreme values cannot be removed.

© 2014 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/). Selection and peer-review under responsibility of the Department of Statistics and Econometrics, Bucharest University of Economic Studies.

Key words: Cauchy distribution; heterogeneous populations; SMEs; arithmetic mean (average).

* Corresponding author. Tel.:+40723408598. E-mail address: [email protected].

2212-5671 © 2014 Elsevier B.V. doi: 10.1016/S2212-5671(14)00286-X

1. Introduction

The private sector of the economy is dominated by small and medium enterprises (Nicolescu et al., 2012), a fact highlighted by the structure of all the enterprises that submitted balance sheet statements which were approved (Table 1).

Table 1. Distribution of companies by size (average number of employees)

0-9        10-49     50-249    Over 250    Total
561,417    42,197    7,912     1,552       613,078

Of the total of 613,078 active enterprises, 601,428 (98.10%) achieved a turnover of up to 2 million euros, and only 11,650 firms had a turnover exceeding this ceiling, an increase of 4.5% compared to the previous year. Of the enterprises with a turnover of up to 2 million euros, 93.1% are micro-enterprises, with up to 10 employees, and 99.97% are SMEs. In this sector, as in many other areas of social and economic practice, there are situations of great heterogeneity in which neither the assumption of a Gauss-Laplace distribution can be accepted nor homogeneity be achieved by treating the extreme values as outliers and removing them. For SMEs, and for other cases in which it is not acceptable to establish the central value as the arithmetic mean, which presupposes a normal distribution of values, substitutes for the normal law can be used. Among them, an apparently peculiar model was proposed by Augustin Cauchy, who developed an earlier proposal made by Maria Agnesi.

2. Substituting Gauss-Laplace distribution and the central value calculation

The proposal of Maria Agnesi (Dauben and Scriba, 2002) entered the literature under the name "Maria Agnesi loop" or "Witch of Agnesi" or, as she called it, "la Versiera". The curve has the equation

$$y = \frac{a b^2}{b^2 + x^2}, \quad a, b > 0, \ x \in \mathbb{R}. \tag{1}$$

If $a = b$, it takes the simpler form

$$y = \frac{a^3}{a^2 + x^2}, \quad a > 0, \ x \in \mathbb{R}. \tag{2}$$

The Italian mathematician deduced the curve in the following manner: a variable line (d) passing through the origin cuts the circle $x^2 + y^2 = ay$ at a point A and the line $y = a$ at a point B. The point M, at the intersection of the line AM parallel to the axis Ox and the line BM parallel to the axis Oy, describes the curve $y(a^2 + x^2) = a^3$, that is, the versiera. Cauchy reconsidered the proposal of Maria Agnesi as

$$A(x) = \frac{a^2}{b^2 + x^2}, \quad a, b > 0, \ x \in \mathbb{R}. \tag{3}$$

The Agnesi curve is positive, $A(x) > 0$ for every $x \in \mathbb{R}$, is symmetric about the Oy axis, has two inflection points, and has the axis Ox as a horizontal asymptote at both minus and plus infinity, $\lim_{x \to \pm\infty} A(x) = 0$. The graphical similarity of the Agnesi loop to a normal density is enticing but can also cause confusion. Cauchy turned $A(x)$ into a probability density function by taking $a = b = 1$ and inserting a normalizing factor:

$$A^*(x) = \frac{1}{\pi} \cdot \frac{1}{1 + x^2}, \quad x \in \mathbb{R}. \tag{4}$$

In this way $A^*(x)$ became a density function, with $A^*(x) \ge 0$ and $\int_{\mathbb{R}} A^*(x)\,dx = 1$.

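However, the major problem with this density is that the theoretical mean is indeterminate, since

$$E(x) = \int_{\mathbb{R}} x A^*(x)\,dx = \frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{x}{1+x^2}\,dx = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \frac{2x}{1+x^2}\,dx = \frac{1}{2\pi} \ln(1+x^2) \Big|_{-\infty}^{+\infty} = \infty - \infty. \tag{5}$$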
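The indeterminate form in (5) can be made concrete with a small numerical sketch. It evaluates the same antiderivative over a finite interval and lets the two limits grow at different rates; the function name `truncated_mean` and the choice of limits are illustrative assumptions, not part of the original derivation.

```python
import math

def truncated_mean(lower, upper):
    """Integral of x / (pi * (1 + x^2)) over [lower, upper], in closed form.

    Uses the antiderivative ln(1 + x^2) / (2 * pi) that appears in (5).
    """
    return (math.log1p(upper ** 2) - math.log1p(lower ** 2)) / (2.0 * math.pi)

for a in (1e2, 1e4, 1e6):
    symmetric = truncated_mean(-a, a)        # both limits grow at the same rate
    lopsided = truncated_mean(-a, 2.0 * a)   # upper limit grows twice as fast
    print(f"a={a:.0e}  symmetric={symmetric:.4f}  lopsided={lopsided:.4f}")
```

With symmetric limits the truncated integral is zero, while with the lopsided limits it tends to ln(4)/(2π). The value of the integral therefore depends on how infinity is approached, which is what the form ∞ − ∞ expresses.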

As a result, the credibility of the Cauchy distribution as a substitute for the normal law is undermined precisely by this property, even though visually the mean appears to be zero, like the median and the mode. Because the mean of a Cauchy variable does not exist, the Central Limit Theorem does not apply: the mean of n independent Cauchy variables is not normal but again Cauchy. The Cauchy distribution, as it was later called (Smithies, 2008), has the simple density

$$X: \ f(x; \mu, \sigma) = \frac{1}{\pi\sigma} \cdot \frac{1}{1 + (x-\mu)^2 / \sigma^2} \tag{6}$$

where $x \in \mathbb{R}$, $\mu \in \mathbb{R}$ and $\sigma > 0$. Here the parameters $\mu$ and $\sigma$ do not have the meanings they have in the Gauss-Laplace law: $\mu$ is a location parameter and $\sigma$ is a positive real scale parameter. In fact, taking $\mu = 0$ and $\sigma = 1$ gives the simple form

$$f(x) = \frac{1}{\pi} \cdot \frac{1}{1+x^2}, \quad x \in \mathbb{R}, \tag{7}$$

which is indeed a density, since $f(x) \ge 0$ and

$$\frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{dx}{1+x^2} = \frac{1}{\pi} \operatorname{arctg} x \Big|_{-\infty}^{+\infty} = 1. \tag{8}$$

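The distribution function has the form

$$F(x; \mu, \sigma) = \frac{1}{\pi\sigma} \int_{-\infty}^{x} \left[1 + \left(\frac{u-\mu}{\sigma}\right)^2\right]^{-1} du = \frac{1}{2} + \frac{1}{\pi} \operatorname{arctg}\left(\frac{x-\mu}{\sigma}\right) \tag{9}$$

or, if $\mu = 0$ and $\sigma = 1$, then

$$X: \ F(x) = \frac{1}{2} + \frac{1}{\pi} \operatorname{arctg} x. \tag{10}$$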
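The density (6) and the distribution function (9) translate directly into code. The following is a minimal sketch using only the Python standard library; the function names, the argument names `mu` (location) and `sigma` (scale), and the midpoint-rule check are illustrative assumptions rather than anything prescribed by the paper.

```python
import math

def cauchy_pdf(x, mu=0.0, sigma=1.0):
    """Density (6): 1 / (pi * sigma * (1 + ((x - mu) / sigma)^2))."""
    z = (x - mu) / sigma
    return 1.0 / (math.pi * sigma * (1.0 + z * z))

def cauchy_cdf(x, mu=0.0, sigma=1.0):
    """Distribution function (9): 1/2 + arctg((x - mu) / sigma) / pi."""
    return 0.5 + math.atan((x - mu) / sigma) / math.pi

def cauchy_quantile(p, mu=0.0, sigma=1.0):
    """Inverse of cauchy_cdf for 0 < p < 1, obtained by solving (9) for x."""
    return mu + sigma * math.tan(math.pi * (p - 0.5))

def midpoint_integral(f, lower, upper, steps=200_000):
    """Plain midpoint rule, used only to cross-check the closed form."""
    h = (upper - lower) / steps
    return h * sum(f(lower + (i + 0.5) * h) for i in range(steps))

# The area under the density over [-5, 5] should match F(5) - F(-5).
print(midpoint_integral(cauchy_pdf, -5.0, 5.0), cauchy_cdf(5.0) - cauchy_cdf(-5.0))

# The median equals the location parameter and the quartiles sit at mu -/+ sigma.
print(cauchy_quantile(0.25), cauchy_quantile(0.5), cauchy_quantile(0.75))
```

Because the quantile function has a closed form, the median and the quartiles of a Cauchy variable are always available, even though its mean is not.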

If we compute the mean of X, namely

$$E(X) = \int_{-\infty}^{+\infty} x\,dF(x) = \frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{x\,dx}{1+x^2} = \frac{1}{2\pi} \ln(1+x^2) \Big|_{-\infty}^{+\infty}, \tag{11}$$

we find that the mean does not exist, although in practice it can be read off the loop as being zero. Since the theory does not allow writing E(X), the location parameter $\mu$ will be estimated using the median. The moments of order r, determined as

$$E(X^r) = \int_{-\infty}^{+\infty} x^r\,dF(x) = \frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{x^r}{1+x^2}\,dx, \tag{12}$$

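where r = 2, 3, ..., are not finite. For example, if r = 2,

$$E(X^2) = \frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{x^2\,dx}{1+x^2} = \frac{1}{\pi} \int_{-\infty}^{+\infty} dx - \frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{dx}{1+x^2} = \frac{1}{\pi}\, x \Big|_{-\infty}^{+\infty} - 1 = \infty. \tag{13}$$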
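The practical meaning of the missing moments can be illustrated by simulation. The sketch below draws standard Cauchy values by inverting the distribution function and compares the sample mean with the sample median as the sample grows. The seed, the sample sizes and the function name are arbitrary illustrative choices.

```python
import math
import random
import statistics

def standard_cauchy_sample(n, rng):
    """Draw n standard Cauchy values by inverting F(x) = 1/2 + arctg(x) / pi."""
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

rng = random.Random(2014)  # arbitrary seed, fixed only for repeatability
for n in (100, 10_000, 100_000):
    sample = standard_cauchy_sample(n, rng)
    mean = statistics.fmean(sample)
    median = statistics.median(sample)
    print(f"n={n:>7}  mean={mean:10.4f}  median={median:8.4f}")
```

In runs of this kind the median settles near the location parameter (zero here) as n increases, whereas the sample mean keeps being displaced by individual extreme draws. This is the behaviour that motivates estimating the location parameter by the median.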

An important property of Cauchy's law is that the ratio of two independent standard normal random variables (zero mean, unit variance) follows a Cauchy distribution with $\mu = 0$ and $\sigma = 1$ (Johnson et al., 1994). Furthermore, if X and Z are independent with $X \sim N(0, \sigma_x^2)$ and $Z \sim N(0, \sigma_z^2)$, then the ratio X/Z follows a Cauchy distribution with $\mu = 0$ and $\sigma = \sigma_x / \sigma_z$. The Cauchy distribution was rediscovered with the creation, in 1908, of the now well-known Student distribution (the t-distribution) by the British chemist and statistician William Sealy Gosset (Pearson, 1990):

n 1   2 n 1/2 2 t Tft:1 (14) n n  n  2

where tn\`, * ,  is Gamma function, with (1) 1 and (1 / 2)  . For n =1 we obtained the Cauchy distribution (Neubrander, 1984). Student variable has zero mean E(t) = 0 and the variance Var t  n/2 n  doesn’t make sense but only for n  2 . Gradually, the statistics literature has been enriched with other contributions and inferential applications of this distribution (McCullagh, 1992; Osu and Ohakwe, 2011; Carrillo et al., 2010; Arnold and Beaver, 2000). Because the Cauchy density tails are much broader, compared to the density of the normal distribution, this model can be used to study the variables with extreme values, the same as our case concerning the economic and financial performance of SMEs.
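Two of the properties stated in this section lend themselves to a quick check: the Student density with n = 1 coincides with the standard Cauchy density, and the ratio of two independent zero-mean normal variables behaves like a Cauchy variable whose scale is the ratio of the standard deviations. The sketch below is illustrative only; the function names, the seed and the values 2.0 and 0.5 for the standard deviations are assumptions, and the simulation compares quartiles because a Cauchy variable with location 0 and scale s has quartiles at -s and +s.

```python
import math
import random
import statistics

def student_t_pdf(t, n):
    """Student density with n degrees of freedom, written as in (14)."""
    norm = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return norm * (1.0 + t * t / n) ** (-(n + 1) / 2)

def cauchy_pdf(x, mu=0.0, sigma=1.0):
    """Cauchy density with location mu and scale sigma."""
    z = (x - mu) / sigma
    return 1.0 / (math.pi * sigma * (1.0 + z * z))

def ratio_quartiles(sigma_x, sigma_z, n, rng):
    """Sample quartiles of X/Z for independent X ~ N(0, sigma_x^2), Z ~ N(0, sigma_z^2)."""
    ratios = [rng.gauss(0.0, sigma_x) / rng.gauss(0.0, sigma_z) for _ in range(n)]
    q1, _, q3 = statistics.quantiles(ratios, n=4)
    return q1, q3

# Deterministic check: Student with n = 1 against the standard Cauchy density.
for t in (-3.0, 0.0, 0.5, 10.0):
    print(t, student_t_pdf(t, 1), cauchy_pdf(t))

# Simulation check: the quartiles of X/Z should lie near -/+ sigma_x / sigma_z.
rng = random.Random(1908)  # arbitrary seed
print(ratio_quartiles(2.0, 0.5, 200_000, rng), "expected near", (-4.0, 4.0))
```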

3. The profitability performances of the SMEs in Romania

The annual balance sheet data indicate a high heterogeneity for the SME sector of the economy, the recorded values spanning a particularly wide range, which is more pronounced for the financial performance indicators (Isaic-Maniu and Dragan, 2010). For example, considering the first and the last five NACE activities (there are 88 NACE activities in total in which SMEs operate), the profitability ratios (computed as the percentage ratio between net result and turnover) ranged between 74.57%, for real estate transactions, and 0.42%, for the manufacture of tobacco products (Dragan, 2012). The data for all 88 profitability ratios indicate a high heterogeneity: a mean of 8.72%, a variance of 91.84, a standard deviation of 9.58%, and a standard error of 1.022. The shape of the distribution departs markedly from the standard Gauss-Laplace model: the skewness coefficient is 4.504 and the kurtosis 26.498, with Q1 = 4.27% and Q3 = 9.14%, so only 25% of companies have a rate of commercial profitability above 9.14%, and only 5% of them have a rate above 26%. All the descriptive statistics raise doubts about the normality of this distribution. The chi-square test leads to the computed value

$$\chi_c^2 = \sum_{i=1}^{k} \frac{(n_i - n p_i)^2}{n p_i} \tag{15}$$

where $n_i$ are the observed frequencies and $p_i$ the theoretical probabilities. The result is $\chi_c^2 = 74.18$, a value above the critical value for the most common levels of type I risk ($\chi_{0.2}^2 = 7.28$ and $\chi_{0.01}^2 = 15.09$, respectively), with a P-value of 1.3767E-14; the hypothesis of normality is therefore rejected, and computing the central value as the arithmetic mean is improper. Since all the data are real, originate from SME balance sheets, and the entire data set matters for the study, it is not possible to eliminate the extreme values as outliers.

The shape of the distribution suggests a Cauchy distribution (Figure 1), so we proceed to test this hypothesis. The computed value of the test, $\chi_c^2 = 4.06$, is lower than the critical level for the most common type I risks ($\chi_{0.2}^2 = 8.56$, $\chi_{0.05}^2 = 12.59$ and $\chi_{0.01}^2 = 16.81$, respectively), so the hypothesis of a Cauchy distribution can be considered valid. To validate the results, the Kolmogorov-Smirnov and Anderson-Darling tests were also used, and they corroborated the decision of acceptance. The probability density function of the Cauchy distribution (6) has the location parameter $\mu = 6.1877$ and the scale parameter $\sigma = 2.0704$, so:

$$f(x; 6.1877, 2.0704) = \frac{1}{2.0704\,\pi} \cdot \frac{1}{1 + (x - 6.1877)^2 / 2.0704^2} \tag{16}$$

Fig. 1. The Profitability Rates for SMEs

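$$F(x; 6.1877, 2.0704) = \frac{1}{2.0704\,\pi} \int_{-\infty}^{x} \left[1 + \frac{(u - 6.1877)^2}{2.0704^2}\right]^{-1} du = \frac{1}{2} + \frac{1}{\pi} \operatorname{arctg}\left(\frac{x - 6.1877}{2.0704}\right) \tag{17}$$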

Since the mean of the variable x is undetermined in the Cauchy distribution (the calculation yields a meaningless form), the location parameter (central value) is estimated using the median, which in this case is 6.455, a value far from the mean calculated for the 88 values (8.722).
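The chi-square decisions reported in this section can be retraced with a short sketch. The paper does not state the degrees of freedom; the values 5 (normal fit) and 6 (Cauchy fit) used below are assumptions inferred from the critical values it quotes, and the function names are illustrative. The tail probability is computed from the closed-form finite sums that exist for integer degrees of freedom, so only the standard library is needed.

```python
import math

def chi_square_statistic(observed, probabilities):
    """Statistic (15): sum over classes of (n_i - n*p_i)^2 / (n*p_i)."""
    n = sum(observed)
    return sum((n_i - n * p_i) ** 2 / (n * p_i)
               for n_i, p_i in zip(observed, probabilities))

def chi_square_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for a positive integer df."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite sum of odd powers of sqrt(x).
    root = math.sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(root / math.sqrt(2.0)) + 2.0 * density * total

# ASSUMPTION: df = 5 and df = 6 are inferred from the quoted critical values,
# not stated in the paper.
print("normal fit, statistic 74.18:", chi_square_sf(74.18, 5))
print("Cauchy fit, statistic 4.06 :", chi_square_sf(4.06, 6))
print("tail beyond 15.09 (df=5)   :", chi_square_sf(15.09, 5))
print("tail beyond 16.81 (df=6)   :", chi_square_sf(16.81, 6))
```

Under these assumed degrees of freedom the tail probability for 74.18 is of the order of 1E-14, in line with the reported P-value, while the tail probability for 4.06 is large, which agrees with not rejecting the Cauchy hypothesis.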

4. Conclusions

The difference between the mean of the 88 values of the profitability rate (8.7224) and the median (6.455), which is the suitable measure of the central value for a Cauchy distribution, is large (an error of 35.12%), a feature that can negatively affect the conclusions of strategic studies and programs targeting the SME sector and its profitability. Consider two hypothetical limits of the profitability rate: a relatively moderate limit, located at 3%, and a high, optimistic limit, in two versions, 15% (Figure 2) and 30%. Determining the coverage probabilities for SMEs in these areas of profitability, we obtain the following: in both cases the probability that the profitability rate (PR) is less than 3% is 18.34%; the probability of exceeding this 3% threshold is 81.67%; placement between the two limits has a probability of 74.3% in the first case (3% < PR < 15%), while in the second case exceeding the upper limit (PR > 30%) has a probability of 2.76%. Additionally, the possibility of a negative profitability rate was examined: a rate below -3% has a probability of 7.06%.
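The coverage probabilities quoted above follow from the fitted distribution function (17). The sketch below recomputes them from the reported parameters; the constant and variable names are illustrative, and the thresholds are the ones named in the text.

```python
import math

MU, SIGMA = 6.1877, 2.0704  # location and scale reported for the fitted model

def fitted_cdf(x):
    """Fitted Cauchy distribution function (17) for the profitability rate, in %."""
    return 0.5 + math.atan((x - MU) / SIGMA) / math.pi

p_below_3 = fitted_cdf(3.0)                      # PR < 3%
p_above_3 = 1.0 - p_below_3                      # PR > 3%
p_3_to_15 = fitted_cdf(15.0) - fitted_cdf(3.0)   # 3% < PR < 15%
p_above_30 = 1.0 - fitted_cdf(30.0)              # PR > 30%
p_below_minus_3 = fitted_cdf(-3.0)               # PR < -3%

for label, p in [("PR < 3%", p_below_3), ("PR > 3%", p_above_3),
                 ("3% < PR < 15%", p_3_to_15), ("PR > 30%", p_above_30),
                 ("PR < -3%", p_below_minus_3)]:
    print(f"{label:>14}: {100.0 * p:6.2f}%")
```

The printed values can be compared with the percentages given in the text (18.34%, 81.67%, 74.3%, 2.76% and 7.06%).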

Fig. 2. The Probability Density Function and the specific limits of 3% and 15%

Overall, the results indicate a high heterogeneity of the SME sector in Romania, due to lower economic performance compared to other EU countries. As regards the "birth", "mortality" and "survival" indicators for SMEs, the situation is relatively similar to that in Europe. The situation described in the paper for the SME sector illustrates an approach to the problems of heterogeneous empirical data and non-Gaussian distributions: rather than homogenizing the data by excluding the outliers, it uses a distribution that takes all the values into account and provides a more reasonable central value, with positive consequences for the decision-making process.

References

Arnold, B. C., Beaver, R. J., 2000. The skew-Cauchy distribution, Statistics & Probability Letters, 49(3), 285–290.
Carrillo, R. E., Aysal, T. C., Barner, K. E., 2010. A Generalized Cauchy Distribution Framework for Problems Requiring Robust Behavior, EURASIP Journal on Advances in Signal Processing, 2010:312989.
Dauben, J. W., Scriba, C. J., 2002. Writing the history of mathematics: its historical development, Basel, Boston and Berlin, Birkhäuser Verlag.
Dragan, I., 2012. Performance of Small and Medium Enterprises. An analysis based upon annual balance sheets in Romania, LAP, Germany.
Dragan, I. M., Isaic-Maniu, A., 2012. The entrepreneurship impact on the dynamic of macroeconomic results, in Procedia Economics and Finance 3, Elsevier, 515–520.
Isaic-Maniu, A., Dragan, I., 2010. SME's Performance Evolution – Consequence of Improving Management, The fifth International Conference "Business Excellence" ICBE, Brasov.
Johnson, N., Kotz, S., Balakrishnan, N., 1994. Continuous univariate distributions, vol. 1, Wiley-Interscience Publication.
McCullagh, P., 1992. Conditional inference and Cauchy models, Biometrika, 79, 247–259.
Neubrander, F., 1984. Well-posedness of abstract Cauchy problems, Semigroup Forum, 29, 74–85.
Nicolescu, O., Isaic-Maniu, A., Dragan, I., 2012. White Charter of Romanian SMEs, Editura Sigma, Bucharest.
Osu, B. O., Ohakwe, J., 2011. Financial Risk Assessment with Cauchy Distribution under a Simple Transformation of dividing with a Constant, Theoretical Mathematics & Applications, 1(1), 73–89.
Pearson, E. S., 1990. Student - A Statistical Biography of William Sealy Gosset, Oxford University Press, USA.
Smithies, F., 2008. Cauchy and the creation of complex function theory, Cambridge University Press, New York.