The issue of best practices in science: how to express measurement uncertainties

Hervé This, vo Kientza

1 UMR Ingénierie Procédés Aliments, AgroParisTech, Inra, Université - Saclay, 91300 Massy, 2 Groupe de gastronomie moléculaire, Inra-AgroParisTech International Centre for Molecular Gastronomy, F-75005, Paris, France

Correspondence : herve.this@.fr

This document is a personal translation of this article, published in French in the N3AF Journal: La question des bonnes pratiques en sciences de la nature : comment exprimer des incertitudes de mesure Notes Académiques de l'Académie d'agriculture de France (N3AF) 2016, 4, 1-8

Abstract: The scientific community could usefully adopt best practices for the various steps of the scientific method. Here a numerical example is used to discuss such rules for the composition of uncertainties.

Keywords: uncertainties, best practices, GUM, composition, Monte Carlo

Introduction

In the Lettre de l'Académie d'agriculture de France (This, 2015), the issue of best practices for scientific research was discussed. For activities where an obligation of results is not pertinent, such as medicine, there is instead an obligation of means, and best practices have to be used (HAS, 2016). For the natural sciences, an obligation of results also seems difficult to impose, so that best practices should be given. However, the rules given by scientific institutions, such as academies or research institutes, deal primarily with the methodology of analysis, i.e. the quality, traceability and validity of research (Inra, 2016a), or with ethics (Inra, 2016b) and publication ethics (Wiley, 2016). The question of calculation is not discussed much. Yet all steps of scientific research could be concerned, i.e. the identification of phenomena, the quantitative characterization of the phenomena of interest, the grouping of these data into synthetic laws (equations), the induction of theories or models, including the proposal of mechanisms, the search for refutable theoretical predictions, and the experimental testing of these predictions (This, 2011).

Here we restrict the discussion to the issue of expressing the uncertainties of results, of measurements and calculations, in order to stress the procedures currently accepted by the scientific community. We do not intend to paraphrase the official documents (Eurachem/CITAC Guide CG4, 2012; Rivier and Balère, 2003); rather, one goal is to show, with one simple example, the value of creating a « Guide of best scientific practices » by a scientific institution, which could be Inra, CNRS, the French Academy of Agriculture, or the French Academy of Sciences.

Composing uncertainties

In particular, the quantitative characterization of phenomena, which is the second step of scientific research (Bacon, 1620), leads to measurement results that common sense and best practices should validate (ACS, 1980). The repetition of experiments and measurements ("validation") is almost universally accepted as an obligation, but the results produced (which generally differ between repetitions) then have to be compared, and this is why one has to calculate the dispersion of the values (This, 2013). For this calculation, various methods are possible (Saporta, 2006), depending on the experimental conditions. A first method consists of repeating the whole experiment (which can be complex, with a series of many elementary steps) and calculating an estimator s_{n-1} of the standard deviation of the value x given by the experiment (for an assay, for example):

$$ s_{n-1} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2} \qquad (1) $$

Here n is the number of repetitions of the experiment, x_i is the value found at the i-th repetition, and x̄ is the mean of the n values x_i, i = 1..n (Harris, 2007). However, this method cannot always be used, in particular when the experiment destroys the samples that are quantitatively characterized. In order to determine the uncertainty of the parameter of interest x, one then has to « propagate uncertainties » (or « compose uncertainties ») using a determination of the elementary uncertainties for each step of the experiment.

Let us consider, for example, the very elementary case of the preparation of a solution by dissolution of a mass m of solute (obtained by weighing) in a mass M of solvent (also determined by weighing), for which the desired parameter is the mass concentration c = m/M. (It should be noted that weighing solutes and solvents in order to determine mass concentrations is much better than measuring volumes, because the precision of balances is generally higher than the precision of micropipettes; evidence for this is that pipettes are checked using balances.) As the masses m and M are uncertain (the uncertainty can be estimated by the standard deviation of successive mass measurements, or by the precision of the balance when the dispersion of the mass measurements is lower than this precision), the mass concentration c is also uncertain, and the issue is to express the uncertainty Δc of the mass concentration c as a function of the uncertainties Δm and ΔM on m and M respectively.

Another example is common in chemical analysis: when the area A of a signal (for example a chromatographic or spectroscopic signal) is determined, a calibration curve of equation A = a · c + b makes it possible to find the concentration c using the equation c = (A − b)/a. Because the areas are known with some uncertainty, one has to express the concentration c along with an uncertainty that has to be determined from the uncertainties in the calibration curve (expressed by the coefficients a and b) and in the area A.

In all cases, the parameter of interest (mass concentration, concentration of a solution analyzed by spectroscopy...) is the value of a function of several variables, and the question of propagating uncertainties is the following: given the uncertainties of the variables, how can we determine the uncertainty of the value of the function?
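To make the calibration example concrete, the following minimal sketch inverts the calibration line A = a · c + b and combines the uncertainties of A, a and b in quadrature, which is the rule attributed to the JCGM later in this article. All numerical values and function names are hypothetical, chosen only for illustration. The sketch also assumes that A, a and b are uncorrelated; the slope and intercept of a fitted line are generally correlated, in which case a covariance term would have to be added.

```python
import math

def concentration_from_area(area, slope, intercept):
    """Invert the calibration line A = a*c + b to get c = (A - b)/a."""
    return (area - intercept) / slope

def concentration_uncertainty(area, slope, intercept,
                              u_area, u_slope, u_intercept):
    """Combine the three contributions in quadrature.

    Assumption: the three inputs are uncorrelated.
    """
    dc_dA = 1.0 / slope                       # partial derivative w.r.t. A
    dc_da = -(area - intercept) / slope ** 2  # partial derivative w.r.t. a
    dc_db = -1.0 / slope                      # partial derivative w.r.t. b
    return math.sqrt((dc_dA * u_area) ** 2
                     + (dc_da * u_slope) ** 2
                     + (dc_db * u_intercept) ** 2)

# Hypothetical values, for illustration only (not taken from the article).
A, a, b = 1250.0, 500.0, 50.0
u_A, u_a, u_b = 10.0, 5.0, 8.0

c = concentration_from_area(A, a, b)
u_c = concentration_uncertainty(A, a, b, u_A, u_a, u_b)
print(f"c = {c:.3f} +/- {u_c:.3f}")
```

With these hypothetical inputs, the three contributions are of comparable size, so none of them could be neglected.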

The answer from the scientific community and two faulty practices

There are many ways of characterizing variability, and each is legitimate if it is rational and explicit. In particular, for the question of propagating uncertainties, a collective answer was put forward by the Bureau international des poids et mesures (BIPM): the Joint Committee for Guides in Metrology (JCGM) of the BIPM published a document entitled Évaluation des données de mesure : Guide pour l'expression de l'incertitude de mesure (GUM) (JCGM, 2015), which describes in very general terms the best practices for the determination of measurement uncertainties and the composition of uncertainties. It is not difficult to follow such conventionally agreed rules, but the goal here is to show on a simple numerical example how they work and which mistakes have to be avoided, because teaching at university shows that many students understand better when examples are given before general cases (this idea, however, remains to be assessed quantitatively). The limits of application of the rules will also be given.

Let us consider, for example, the determination of the uncertainty of a mass concentration c from the uncertainties of the masses m and M of the solute and the solvent respectively. These uncertainties can be given by an experimental standard deviation or by the precision of the measurement tool, but in any case the mass concentration c is a continuous and differentiable function of the two variables m and M, so that:

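$$ \mathrm{d}c = \frac{\partial c}{\partial m}\,\mathrm{d}m + \frac{\partial c}{\partial M}\,\mathrm{d}M \qquad (2) $$

And here: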

$$ \mathrm{d}c = \frac{1}{M}\,\mathrm{d}m - \frac{m}{M^2}\,\mathrm{d}M \qquad (3) $$

In such analytical expressions, differentials are infinitely small quantities, unlike uncertainties, which are finite (Piskounov, 1980). Documents published before the GUM proposed moving from differentials to uncertainties using the expression:

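$$ \Delta c = \left|\frac{\partial c}{\partial m}\right|\Delta m + \left|\frac{\partial c}{\partial M}\right|\Delta M \qquad (4) $$

This expression corresponds to a « Manhattan distance » (Verley, 1997), and the absolute values take into account the fact that errors are randomly positive or negative. However, the JCGM opted instead for the expression using the Euclidean distance: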

$$ \Delta c = \sqrt{\left(\frac{\partial c}{\partial m}\right)^2 \Delta m^2 + \left(\frac{\partial c}{\partial M}\right)^2 \Delta M^2} \qquad (5) $$

It can be observed that these two practices are close, but use different distances. Moreover, the choice of the JCGM can be used only if the model of the measurement process does not show any significant non-linearity. Dispersions have to be small for each of the variables involved in the measurement process, of the same order of magnitude, and with symmetrical distributions. When these hypotheses are not verified, the JCGM proposes to implement a Monte Carlo method, as described in a supplement to the first document (JCGM, 2008; Robert and Casella, 2010). Here the principle is not to propagate the uncertainties using the model, but to use a probability density function for the input parameters. This function being known or assumed for each input value, the Markov formula can be used (Monte Carlo methods with Markov chains consist in generating a vector x_i solely from the values of the vector x_{i-1}; they are processes with no memory).

Conversely, one should avoid two practices which are sometimes observed. First, one should avoid the following:

- measure some values for each variable (for example, m_1, m_2, m_3, and M_1, M_2, M_3);

- then calculate the values of the function for pairs of values, such as c_1 = m_1/M_1, c_2 = m_2/M_2, c_3 = m_3/M_3;

- determine a standard deviation from these values.

Concerning this wrong practice, one can observe immediately that there is no particular reason to associate a particular m_i with the corresponding M_i rather than with another M_j (j ≠ i). This practice corresponds to the calculation of a value s'(c):

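$$ s'(c) = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{m_i}{M_i}-\bar{c}\right)^2}, \qquad \bar{c} = \frac{1}{n}\sum_{i=1}^{n}\frac{m_i}{M_i} \qquad (6) $$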

One should also avoid the following practice:

- measurement of many values for each variable (for example, m_1, m_2, m_3, and M_1, M_2, M_3);

- then determination of all the crossed values m_i/M_j;

- and calculation of the standard deviation from these crossed values.

A numerical comparison for education

In this paragraph, let us first indicate that French students at the « bachelor » level generally learn to compose uncertainties only for sums, differences, products and ratios: for sums and differences, they learn that the uncertainty is the sum of the uncertainties; for products and ratios, they learn to use the « logarithm rule ». However, when expressions mix sums, differences, products and ratios with other analytical functions, such techniques cannot be used. On the other hand, it can be shown that the two practices discussed above, which are not in line with the rules of the JCGM, under-estimate the uncertainties: in what follows, the numerical results obtained for the determination of a mass concentration are calculated in the various cases.

For this comparison, 3 values for the mass of a solute and 3 values for the mass of solvent were determined using a Mettler Toledo AG153 balance:

- for m: 1.0001; 1.0002; 1.0002

- for M: 10.0001; 10.0001; 9.9998.

For the two variables m and M, the average values (in g) are respectively moym = 1.00017 and moyM = 10.0, and the standard deviations are respectively ecartpm = 6 × 10⁻⁵ and ecartpM = 2 × 10⁻⁴. Calculations were performed using the software Maple 18 (Maplesoft, Waterloo Maple Inc, Ontario, Canada). Using the rule given in the GUM, the standard deviation for the mass concentration is:

(7)

For this estimation of the uncertainty, the numerical value is 6 × 10⁻⁵. Using a Manhattan distance (Verley, 1997), one would have instead:

(8)

for which the numerical value is also 6 × 10⁻⁵. One can observe that, with the rules for rounding the standard deviation (Taylor, 1997), the two values are equal.

Using the Monte Carlo method, an assumption has to be made explicitly about the distribution of the values of m and M, because one has to select values at random and calculate the corresponding values of c before calculating the standard deviation. The GUM indicates the specific conditions for the application of this technique, but one can observe that, for common measurements, the central limit theorem (Suquet, 2005) indicates that a noise resulting from many independent physical factors follows a normal distribution. This third method begins with the creation of an experimental Gaussian population of average 1.00017 and standard deviation 0.00006 for m. Then another experimental population of average 10.0 and standard deviation 0.0002 is created for M. A large number N of values of m and M are selected at random (according to the GUM, more than 10⁶ selections have to be made), so that an average concentration meanConc and a standard deviation sdConc are calculated.

On this example, it can be observed that the value calculated using this method (6 × 10⁻⁶) is lower by one order of magnitude than the values calculated previously. This is bad from two points of view. First, the calculated standard deviation does not give the right order of magnitude of the real standard deviation; second, if another distribution had been used, another value would have been obtained: for example, for a uniform distribution, the standard deviation is 1 × 10⁻⁵, and this is the reason why the JCGM recommends an entropic optimization step. This example also shows how careful one should be when using the Monte Carlo method, in particular about the choice of the distributions of values (Bédiat, 2006).
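The Gaussian sampling step described above can be sketched as follows. This is a minimal illustration in Python, not the Maple worksheet used for the article: the function name, the fixed seed and the reduced number of draws are assumptions made here, and no entropic optimization step is included. The means and standard deviations are those quoted in the text.

```python
import math
import random

def monte_carlo_concentration(mean_m, sd_m, mean_M, sd_M, n_draws, seed=0):
    """Draw m and M from independent Gaussians; return (mean, sd) of c = m/M."""
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    conc = [rng.gauss(mean_m, sd_m) / rng.gauss(mean_M, sd_M)
            for _ in range(n_draws)]
    mean_c = math.fsum(conc) / n_draws
    # Two-pass estimator with the n-1 denominator, as in equation (1).
    var_c = math.fsum((c - mean_c) ** 2 for c in conc) / (n_draws - 1)
    return mean_c, math.sqrt(var_c)

# Means and standard deviations quoted in the text. The number of draws is
# kept below the 10^6 mentioned in the text only to keep the run short.
meanConc, sdConc = monte_carlo_concentration(
    1.00017, 0.00006, 10.0, 0.0002, n_draws=200_000)
print(f"meanConc = {meanConc:.6f}, sdConc = {sdConc:.1e}")
```

Changing the call to `rng.gauss` for another sampler (for example `rng.uniform` with suitable bounds) is enough to explore how strongly the result depends on the assumed distribution.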
One should select a distribution that does not convey more information than is actually available: if the only information for a variable is a minimum and a maximum value, a uniform law has to be used.

Let us now compare the values obtained with the first faulty method. The standard deviation would be 7 × 10⁻⁶, lower than with the best practice. For the second faulty method, using the 9 values of the mass concentration from all pairs of values of m and M, the result (5 × 10⁻⁶) is again smaller by one order of magnitude than the value obtained using the best practice. One could consider that these two methods are no worse than the Monte Carlo method without entropic optimization, but the JCGM insists that such an optimization be made. Moreover, any method giving low values should be rejected, as the standard deviation is already only an estimation of the real dispersion.
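The two faulty practices can be reproduced from the six mass measurements listed earlier. The short sketch below (variable names are chosen here for illustration) computes the standard deviation of the three paired ratios m_i/M_i and of the nine crossed ratios m_i/M_j.

```python
import statistics

# Mass measurements (in g) given in the text.
m = [1.0001, 1.0002, 1.0002]
M = [10.0001, 10.0001, 9.9998]

# First faulty practice: pair each m_i with the M_i of the same rank.
paired = [mi / Mi for mi, Mi in zip(m, M)]
sd_paired = statistics.stdev(paired)

# Second faulty practice: form all nine crossed ratios m_i / M_j.
crossed = [mi / Mj for mi in m for Mj in M]
sd_crossed = statistics.stdev(crossed)

print(f"paired ratios:  sd = {sd_paired:.1e}")
print(f"crossed ratios: sd = {sd_crossed:.1e}")
```

After rounding to one significant figure, these are the values 7 × 10⁻⁶ and 5 × 10⁻⁶ quoted above.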

In order to show that the standard deviation is an estimator of the dispersion whose value gives only the order of magnitude (what follows can be mathematically demonstrated, see Verley, 1997), let us create a normal distribution around a value of 100, with a standard deviation equal to 1; then, 200 times, let us select 3 samples at random (a common laboratory practice for mass measurements) and calculate the standard deviation in each case. The plot of the 200 standard deviations is shown in Figure 1.

Figure 1. The standard deviation is only an estimation (order of magnitude) of the dispersion of data around the average value. Here the standard deviations for 200 triplets of samples are shown. This large dispersion is not unusual, as the population from which the samples were selected at random had a standard deviation equal to 1.

The dispersion is very large: whereas the real standard deviation of the population was chosen to be 1, the experimental standard deviations are distributed between 0 and 2. Students can be invited to take these values and check statistical laws. Obviously, determining the experimental standard deviations with more than 3 values gives better results. For example, in Figures 2a and 2b, the same kind of results are shown for 6 and 15 samples respectively.
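The simulation behind Figures 1 and 2 can be sketched in a few lines. The function name and the fixed seed are assumptions made here for reproducibility; the population (mean 100, standard deviation 1), the 200 repetitions and the sample sizes 3, 6 and 15 are those given in the text.

```python
import random
import statistics

def simulated_standard_deviations(sample_size, n_repeats=200,
                                  mean=100.0, sigma=1.0, seed=0):
    """Standard deviations of n_repeats random samples of a given size."""
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    return [
        statistics.stdev([rng.gauss(mean, sigma) for _ in range(sample_size)])
        for _ in range(n_repeats)
    ]

for size in (3, 6, 15):
    sds = simulated_standard_deviations(size)
    print(f"n = {size:2d}: min = {min(sds):.2f}, max = {max(sds):.2f}, "
          f"spread = {statistics.stdev(sds):.2f}")
```

The printed spread of the 200 estimated standard deviations shrinks as the sample size grows, which is the effect shown in Figure 2.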

Figure 2. Standard deviations calculated for repetitions of 6 (a) and 15 (b) measurements.

Why are bad practices sometimes used?

In the introduction of this article, it was noted that it is not easy to find on the Internet any centralized guide of best practices for science, and this could be part of the answer. Of course, one can easily find documents explaining good practices, such as the GUM or the Eurachem Guide, but such documents (50 pages for the GUM, 131 for the Guide CG4) are not easy to understand, being written for professionals. Applying a simple reality principle leads to proposing simplified documents for students at Bachelor level. Here only one particular activity of scientific work was discussed, i.e. the propagation of uncertainties, and only on one example, without theoretical discussion, but this can come at a later stage of scientific education. Together, examples and theoretical documents could usefully make up a comprehensive body of material for best practices in science.

References

ACS. 1980. ACS Guidelines for Data Acquisition and Data Quality Evaluation in Environmental Chemistry. Anal. Chem. 52, 2242-2249.

Bacon F. 1988. Novum Organum, P.F. Collier and son, New York, 1902. Cited in Bacon, inventer la science, Editions Belin, Paris, collection Un savant une époque.

Bédiat N. 2006. Méthode numérique de propagation des incertitudes de mesure (Méthode de Monte-Carlo), NTV 06/022, Note technique internet CETIAT.

Eurachem. 2012. Quantifying Uncertainty in Analytical Measurement. https://www.eurachem.org/index.php/publications/guides/quam#translations, last accessed 1 July 2016.

Harris D. C. 2007. Quantitative Chemical Analysis, W. H. Freeman and Company, New York, USA. 7th ed.

HAS. 2016. http://www.has-sante.fr/portail/jcms/c_1101438/fr/tableau-des-recommandations-debonne-pratique, last accessed 16 March 2016.

Inra Mission Qualité. 2016a. Référentiel Qualité Inra, Mission Qualité 2003-2005, https://www6.bordeaux-aquitaine.inra.fr/st_pee/.../referentiel-inra1.pdf, last accessed 16 March 2016.

Inra. 2016b. http://inra-dam-front-resources-cdn.brainsonic.com/ressources/afile/246617-c621bresource-charte-de-deontologie.html, last accessed 16 March 2016.

JCGM. 2015. www.bipm.org, last accessed 6 April 2015.

JCGM. 2008. Evaluation of measurement data. Supplement 1 to the "Guide to the expression of uncertainty in measurement". BIPM 101.

Piskounov N. 1980. Calcul différentiel et intégral Edition Mir, Moscou.

Rivier C. and Balère B. 2003. Guide méthodologique pour l'estimation des incertitudes en chimie analytique, Laboratoire national d'essais, LNE C370 X18, http://www.lne.fr/publications/metreau_guide_incertitudes.pdf, last accessed 1 July 2016.

Robert C. and Casella G. 2010. Monte-Carlo Statistical Methods. Springer-Verlag, coll. « Springer Texts in Statistics ».

Saporta G. 2006. Probabilités, Analyse des données et Statistiques. Éditions Technip, Paris.

Suquet C. 2005. http://math.univ-lille1.fr/~suquet/Polys/TLC.pdf, last accessed 16 March 2016.

Taylor J. R. 1997. Error analysis: The study of uncertainties in physical measurements. Sausalito, CA: University Science Books (2nd ed.).

This H. 2015. Aidez les enfants ! Lettre de l'Académie d'agriculture de France, N°27, 15 février 2015, pp. 8-9.

This H. 2011. Cours de gastronomie moléculaire N°1 : Science, technologie, technique (culinaires) : Quelles relations ? Editions Quae/Belin, Paris.

This, H. 2013. Cours de gastronomie moléculaire N°2 : Les précisions culinaires. Editions Quae/Belin, Paris.

Verley J.-L. 1997. Espaces métriques, in Dictionnaire des mathématiques ; algèbre, analyse, géométrie. Albin Michel, Paris, p. 652-653.

Wiley. 2016. http://onlinelibrary.wiley.com/doi/10.1111/j.1742-1241.2006.01230.x/full, last accessed 15 March 2016.

Edited by: Dominique Job, CNRS, Member of the Académie d'agriculture de France.

Reviewers: Jean-Marc Boussard, Member of the Académie d'agriculture de France; Jérôme Vial, Maître de conférences, ESPCI Paris

Section: This article was published in the « Enseignement » (Education) section of the Notes Académiques de l'Académie d'agriculture de France

Received: 8 February 2016 Accepted: 6 July 2016

Published: 6 July 2016

Citation: This, H. 2016. La question des bonnes pratiques en sciences de la nature : comment exprimer des incertitudes de mesure, Notes Académiques de l'Académie d'agriculture de France / Academic Notes from the French Academy of Agriculture (N3AF), 4, 1-8.

Hervé This is a physical chemist in the INRA Group of Molecular Gastronomy at AgroParisTech. He is a member of the Académie d'agriculture de France.