doi: http://dx.doi.org/10.5892/ruvrv.2012.102. 12 8135

Comparison between packages and Genstat ® in ordinary procrustes analysis for sensory profiling of food products

Eric Batista FERREIRA 1 Marcelo Silva de OLIVEIRA 2

1Professor do Instituto de Ciências Exatas. Universidade Federal de Alfenas. [email protected]. 2Professor do Departamento de Ciências Exatas. Universidade Federal de Lavras. [email protected].

Recebido em: 16/10/2012 - Aprovado em: 12/12/2012 - Disponibilizado em: 30/12/2012 ABSTRACT: In modern management of companies, the quantitative knowledge is determinant for defining marketing strategies and product images. Particularly for industries of products that strongly stimulate the human senses, the statistical methods that allow quantitative evaluations are joined under the science chunk called Sensometrics. One of those methods is the Ordinary or Classical Procrustes Analysis (OPA), that consists in finding the best way one matrix can be fitted to a reference matrix by minimizing the sum of squared distances. Since this methodology begun to be use in sensory evaluation many specific and non-specific software have been developed. The high costs involved to purchase such programs lead some researchers to look for free software. R is a free statistical and mathematical package that already has an implementation of OPA and GPA (Generalized Procrustes Analysis) in its Shapes package (a statistical analysis package). The aim of this paper is to interpret, describe and compare R output to a non-specific software, GenStat ®, normally used in sensory analysis. Under simulated data the software were compared and lead to same conclusions and interpretations. Therefore, package Shapes from R can be used in sensory analysis, rather than Statistical Shape analysis.

Key words: Ordinary Procrustes Analysis, software R, data simulation, Food products.

INTRODUCTION methods that allow quantitative evaluations In the modern management of are joint under the science chunk called companies, the quantitative knowledge of Sensometrics, which deals with several the consumer preferences is determinant categories of problems like consumer for defining marketing strategies and preference, sensory profile of a product product images, impacting also in the and assessors training (FERREIRA; definition of competitive advantage by OLIVEIRA, 2007). production (SLACK; CHAMBERS; One of the greatest needs of Food JOHNSTON, 1997). Further, knowing Sensorial Analysis, especially in dairy such preferences is strictly related to the industries, is a satisfactory statistical implemented programs and politics, aims analysis of the panelist’s scores. For these and objectives of the quality management reason, along the years, a lot of tools have of the company. been tested. The multivariate techniques Particularly for industries of had been demonstrated to be efficient to products that strongly stimulate the human provide good answers about consumer’s senses (food, cosmetics and perfumes preference and food quality (FERREIRA industry, among others), the statistical et al., 2008).

128 Revista da Universidade Vale do Rio Verde, Três Corações, v. 10, n. 2, p. 128-135, ago./dez. 2012 This paper deals with the particular ||sX 1Q - X 2|| . (1) problem of identifying the possible consensus of a trained assessor and an Here, Q refers to a orthogonal expert. It is reasonable to assume that if the matrix (SCHÖENEMANN, 1966), and s is assessor agrees with the expert, his/her a scaling factor that enables stretching or training can be considered satisfactory. shrinking the configuration (GOWER, Cheese sensory data is used here for such 1975). evaluation. In dairy industry is natural to One of the suitable multivariate suppose that an expert panelist train a techniques for such evaluation is the beginner. In order to verify the efficiency Procrustes Analysis. Particularly, the of the training an OPA must be performed Ordinary Procrustes Analysis (OPA) (or (fitting the trainee score matrix to the Classical Procrustes Analysis) is a expert score matrix) to evaluate the multivariate technique that is used when residual sum of squared distances between one want to find the best way one matrix these two matrices. Also, agreements and can fit another. For instance, how similar disagreements between the two panelists one score matrix is to another. While the can be identified in some two axes OPA allow to match just two score graphics (FERREIRA et al., 2007). matrices, with Generalized Procrustes There are a lot of specific and non- Analysis (GPA) we can match m matrices specific software that perform this analysis at the same time (GOWER, 1975). satisfactorily, but all of the commonly used The minimization problem of are so expensive. R is a free statistical and transforming one given matrix X1 by a mathematical program. It has some basic matrix, say Q, such that best fits a given packages (default) and many others target matrix X2 is called a Procrustes packages to be loaded in accordance with Problem since Procrustes was an evil the user needs (R DEVELOPMENT personage of the Greek Mythology which CORE TEAM, 2011). In R there isn’t a used to fit his guests in a so called magical package specific for Procrustes analysis, bed by sawing or stretching them. The but there is a package called Shapes , for term Procrustes Problem is due to Hurley statistical shape analysis, that brings OPA and Cattel (1962) that suggested for the and GPA algorithms implemented first time the problem of transforming one (DRYDEN, 2009). matrix into another by minimizing the Residual Sum of Squares (RSS)

129 Revista da Universidade Vale do Rio Verde, Três Corações, v. 10, n. 2, p. 128-135, ago./dez. 2012 GenStat  is statistical software that context. The reason we choose analyze just two attributes was to facilitate the building is able to perform the Procrustes analysis, of graphics. enabling many graphic tools (GENSTAT,

2003). Despite its high purchase cost, it is Data simulation commonly used in sensory analysis Graybill (1976) presents an easy (GAINS; THOMSON, 1990; MCEWAN; manner to generate data from a bivariate SCHLICH, 1992). Normal distribution ( X, Z ), where Z|X=x is This proposed situation was found by a linear relation of X.  performed in the programs R and GenStat This way, to simulate data from a (7 th edition) in order to compare the bivariate Normal distribution one has to outputs. We also built graphics with R and assume known five parameters, since got some practical information. (X, Z) ~ N 2(Z, X, σZZ , σXX , σZX ) (2)

where Z, X are the means of Z e X; σZZ ,

METHODOLOGY σXX the variances of Z e X and σZX the Ordinary Procrustes Analysis covariance between Z e X; and where

(OPA) consists in find the best way one σZX = ρσ ZσX. matrix fits a reference matrix by The linear relationship between Z e minimizing the sum of squared distances X is the mathematical expectation of the between the points (lines) in a fixed matrix conditional Z|X = x and a shifted and rotated to-be-fitted-one σ ZX (3) EZXx[|==+ ]µZ ( x − µββ X ) =+0 1 x (GOWER, 1975). σ XX A reference matrix (expert scores) was established and a data matrix (trained and with variance 2 assessor) was simulated in R (like V[Z|X = x ] = σZZ (1-ρ ) (4) demonstrated below). So, one value of the variable Z|X=x is It was considered four brands of given by Gorgonzola cheese, each line of the matrix Z|X= Z|X=x + e(x) (5) representing a brand. The judges scored 2 where e(x) ~ N(0, σZZ (1-ρ )) . these four samples at two attributes Data simulation was necessary only (appearance and characteristic flavor), to get the trained assessor's scores. We each column of the matrix representing an assumed that the trained assessor scores attribute. So, this simulated sensory were random values from four bivariate analysis occurred in a fixed vocabulary

130 Revista da Universidade Vale do Rio Verde, Três Corações, v. 10, n. 2, p. 128-135, ago./dez. 2012 Normal distributions ( X, Z ), where X is the Later, the configuration graphics appearance and Z is the characteristic were made utilizing R. Figure 1 presents flavor. The means of these four bivariate the configurations of the both assessors, Normal distributions are the expert scores. before and after Procrustes analysis. In The expert scores were arbitrarily Figure 1 one can observe how OPA established and it was considered a allowed a better view of the agreements positive correlation between the attributes and disagreements of such assessors. (appearance and characteristic flavor). We Looking at the right hand side of assumed the following covariance matrix Figure 1 one can observe that both 100 120  assessors agree more about brands 2 and 3 R =   120 225  than about brands 1 and 4. Considering Therefore, there was a positive first the appearance axis, for both correlation between attributes ( ρ=0,8) but assessors, the decreasing relation of brands not between cheese brands. is ordered like 4, 1, 2 and 3; but when one consider the characteristic flavor axis, the MAIN RESULTS decreasing relation of brands for expert is ordered like 4, 1, 2 and 3; and for the After data simulation, the trained assessor the relation of brands is established (A) and simulated matrices (B) ordered like 1, 4, 2 and 3. So, concerning (score matrices from expert and trained to characteristic flavor, they disagree in assessor, respectively) were: first and second positions. However, both

8082  7175  agree that brand 4 is the better whole     7072 and 6259 sample and brand 3 is the worst. A =   B =   3050  36 44      9095  7772 

131 Revista da Universidade Vale do Rio Verde, Três Corações, v. 10, n. 2, p. 128-135, ago./dez. 2012

(a) (b) FIGURE 1. Configurations of the both assessors: expert (1a, 2a, 3a, 4a) e trained assessor (1b, 2b, 3b, 4b), before (left) and after (right) Ordinary Procrustes Analysis.

One can compare the R and GenStat  In Table 1 one also can note that the both software numerical results are outputs analyzing Table 1. This table identical, what suggests that such R shows the element name given by each algorithm is the same algorithm software and the numerical result of the implemented in GenStat . Hence, the matrix, scaling factor, fixed configuration A, Procrustes fitted Shapes package OPA algorithm can be configuration B, and Ordinary Procrustes used in food sensory analysis without loss. residual sum of squares.

132 Revista da Universidade Vale do Rio Verde, Três Corações, v. 10, n. 2, p. 128-135, ago./dez. 2012 TABLE 1. Comparison between R and GenStat  outputs for Ordinary Procrustes Analysis. Software’s element names and numerical results of , scaling factor, fixed configuration A, Procrustes fitted configuration B, and Ordinary Procrustes residual sum of squares. R GenStat 

Name $R Orthogonal Rotation Rotation matrix 0.99980923 -0.01953201  0.99981 -0.01953      0.01953201 0.99980923  0.01953 0.99981  Name $s Least-squares Scaling factor Scaling factor 1.375420 1.3754

Name $Ahat Xout Fixed configuration A 12.5 7.25  12.50 7.25      2.5 -2.75 2.50 -2.75     -37.5-24.75  -37.50-24.75      22.5 20.25  22.50 20.25  Name $Bhat Yout Procrustes fitted 13.3998104 16.934260  13.40 16.93      configuration B 0.5935525 -4.826485 0.59 -4.83     -35.5635272-24.755373  -35.56-24.76      21.5701643 12.647598  21.57 12.65  Name $OSS Sum of squares Ordinary Procrustes 164.9520 Fitted Configuration 2992.7980 residual sum of Residual 164.9520 squares ------Fixed Configuration 157.7500

133 Revista da Universidade Vale do Rio Verde, Três Corações, v. 10, n. 2, p. 128-135, ago./dez. 2012 FINAL REMARKS University – UK) and the financial In the proposed dairy context agencies Capes and CNPq. situation, brands 2 and 3 were better- characterized, i.e. expert and trained REFERENCES assessor better agreed brands 2 and 3. They also agreed that the better brand is number DRYDEN, I. Shapes: Statistical shape analysis . R package version 1.1-3. 2009. 4 and the worst is brand 3. Concerning to http://CRAN.R- appearance, the panelists agreed about the project.org/package=shapes. order of the brands; and concerning to FERREIRA, E.B.; MAGALHÃES, F.A.R.; characteristic flavor, they didn’t agreed OLIVEIRA, M.S.; FERREIRA, D.F. Sensory profile of the gorgonzola type about two positions. cheese through bootstrap confidence R and GenStat  outputs were regions in generalized Procrustes analysis. Revista do Instituto de Laticínios numerically identical, but with different Cândido Tostes , v. 62, p. 535-540, 2007. element names (notations). Hence, R is a FERREIRA, E. B.; OLIVEIRA, M. S. useful tool to perform Ordinary Procrustes Sensometria : uma abordagem com ênfase em Procrustes . Santa Maria: UFSM, 2007. Analysis in food sensory analysis context, 71p. leading to the same conclusions that FERREIRA, E. B.; OLIVEIRA, M. S. de; GenStat  and also enabling graphics FERREIRA, D. F.; MAGALHÃES, F. A. R. Sensory Profile of Gorgonzola via building. Generalized Procrustes Analysis Using R. Boletim do Centro de Pesquisa e Those quantitative methods will Processamento de Alimentos , v. 26, p. allow managers of companies of products 151-159, 2008. highly dependent of human senses to GAINS, N.; THOMSON, D.M.H. Sensory objectively analyze their data, leading to profiling of canned lager beers using consumers in their own homes. Food more secure decisions, avoiding Quality and Preference , v. 2, p. 39-47. subjectivity of opinions, which can put 1990. under risk the goal of an management GENSTAT. Genstat for Windows , action in a competitive market culture. Release 7.1.0.198, 7th edn. Oxford, UK: VSN International Ltd. 2003.

ACKNOWLEDGMENTS GOWER, J.C. Generalized Procrustes Analysis. Psychometrika , v. 40, p. 33-51. We would like to thank Professor 1975. Ann Noble (University of California), professor John Clifford Gower (Open

134 Revista da Universidade Vale do Rio Verde, Três Corações, v. 10, n. 2, p. 128-135, ago./dez. 2012 GRAYBILL, F.A. Theory and Application of the Linear Model . Duxbury Press:Boston, Massachusetts. 1976. 703 p.

HURLEY, J.R.; CATTLE, R.B. The Procrustes program: producing direct rotation to test a hypothesized factor structure. Behavioral Science , v. 7, 258- 262. 1962.

MCEWAN, J.A.; SCHLICH, P. Correspondence Analysis in sensory evaluation. Food Quality and Preference . v. 3, p. 23-36. 1992.

R DEVELOPMENT CORE TEAM. R: A language and environment for statistical computing . R Foundation for Statistical Computing, Vienna, Austria. 2011. URL http://www.R-project.org.

SCHÖENEMANN, P.H. A generalized solution of the Orthogonal Procrustes Problem. Psychometrika , v. 31, n. 1, p. 1- 10. 1966.

SLACK, N.; CHAMBERS, C. H.; JOHNSTON, R. Administração da Produção. São Paulo: Atlas. 1997. 726 p.

135 Revista da Universidade Vale do Rio Verde, Três Corações, v. 10, n. 2, p. 128-135, ago./dez. 2012