THE SKEW-NORMAL DISTRIBUTION IN SPC∗
Authors: Fernanda Figueiredo – CEAUL and Faculdade de Economia da Universidade do Porto, Portugal ([email protected]) M. Ivette Gomes – Universidade de Lisboa, FCUL, DEIO and CEAUL, Portugal ([email protected])
Abstract: • Modeling real data sets, even when we have some potential (as)symmetric models for the underlying data distribution, is always a very difficult task due to some uncon- trollable perturbation factors. The analysis of different data sets from diverse areas of application, and in particular from Statistical Process Control (SPC), lead us to notice they usually exhibit moderate to strong asymmetry as well as light to heavy tails, which leads us to conclude that in most of the cases, fitting a normal distri- bution to the data is not the best option, despite of the simplicity and popularity of the Gaussian distribution. In this paper we consider a class of skew-normal models that include the normal distribution as a particular member. Some properties of the distributions belonging to this class are enhanced in order to motivate their use in applications. To monitor industrial processes some control charts for skew-normal and bivariate normal processes are developed, and their performance analysed. An application to a real data set from a cork stopper’s process production is presented.
Key-Words:
• Bootstrap control charts; Control charts; Heavy-tails; Monte Carlo simulations; Skew- ness; Skew-normal distribution; Statistical process control.
AMS Subject Classification:
• 62E15, 62F40, 62F86, 62P30, 65C05.
∗Research partially supported by National Funds through FCT—Funda¸c˜aopara a Ciˆencia e a Tecnologia, project PEst-OE/MAT/UI0006/2011. 2 Fernanda Figueiredo and M. Ivette Gomes Monitoring industrial processes with robust control statistics 3
1. Introduction
The most commonly used standard procedures of Statistical Quality Con- trol (SQC), control charts and acceptance sampling plans, are often implemented under the assumption of normal data, which rarely holds in practice. The anal- ysis of several data sets from diverse areas of application, such as, statistical process control (SPC), reliability, telecommunications, environment, climatology and finance, among others, lead us to notice that this type of data usually ex- hibit moderate to strong asymmetry as well as light or heavy tails. Thus, despite of the simplicity and popularity of the Gaussian distribution, we conclude that in most of the cases, fitting a normal distribution to the data is not the best option. On another side, modeling real data sets, even when we have some po- tential (as)symmetric models for the underlying data distribution, is always a very difficult task due to some uncontrollable perturbation factors.
This paper focus on the parametric family of skew-normal distributions introduced by O’Hagan and Leonard (1976), and then investigated with more detail by Azzalini (1985, 1986, 2005), among others.
Definition 1.1. A random variable (rv) Y is said to have a location- scale skew-normal distribution, with location λ, scale δ and shape parameter α, being denoted Y ∼ SN(λ, δ2, α), if its probability density function (pdf) is given by 2 y − λ y − λ (1.1) f(y; λ, δ, α) = φ Φ α , y ∈ (α, λ ∈ , δ ∈ +), δ δ δ R R R where φ and Φ denote, as usual, the pdf and the cumulative distribution function (cdf) of the standard normal distribution, respectively. If λ = 0 and δ = 1, we obtain the standard skew-normal distribution, denoted SN(α).
This class of distributions includes models with different levels of skewness and kurtosis, apart from the normal distribution itself (α = 0). In this sense, it can be considered an extension of the normal family. Allowing departures from the normal model, by the introduction of the extra parameter α that controls the skewness, its use in applications will provide more robustness in inferential methods, and probably, better models to fit the data, for instance, when the empirical distribution has a shape similar to the normal, but exhibits asymmetry. Note that even in potential normal situations there is some possibility of having disturbances in the data, and the skew-normal family of distributions can describe the process data in a more reliable and robust way. In applications it is also important to have the possibility of regulating the thickness of the tails, apart of the skewness.
The cdf of the skew-normal rv Y defined in (1.1) is given by y − λ y − λ (1.2) F (y; λ, δ, α) = Φ − 2T , α , y ∈ (α, λ ∈ , δ ∈ +), δ δ R R R 4 Fernanda Figueiredo and M. Ivette Gomes where T (h, b) is the Owen’s T function (integral of the standard normal bivariate density, bounded by x = h, y = 0 and y = bx), tabulated in Owen (1956).
Although the pdf in (1.1) has a very simple expression the same does not happen with the cdf in (1.2), but this is not a problem that lead us to avoid the use of the skew-normal distribution. The R package ’sn’ (version 0.4-17), developed by Azzalini (2011), is now available. In such a package, functions related to the skew-normal distribution, like the density function, the distribution function, the quantile function, random number generation and maximum likelihood estimates are provided.
The moment generating function of the rv Y is given by