MGCI Physics Department
Mr. H. M. van Bemmel

Handling Uncertainty in Experiments

Handling of Experimental Data

1 Introduction

Aside from basic visual discoveries, scientists of any discipline typically must handle experimental data with statistics to determine what information it contains. It is incumbent on the young scientist to learn the principles in this document to begin properly the process of mastering this important aspect of scientific practice.

The subject of statistics is broad and varied, and if statistics are applied inappropriately they can suggest improper conclusions. The usefulness of data is always linked to how precise it appears to be. For example, it took almost 20 years to prove statistically that cigarette smoking was a health risk. It was not until this data had been properly analyzed that governments could legally force cigarette makers to put health warnings on the packages. Every piece of equipment that can affect human safety or pose an economic risk will have statistics in its design.

How probable is a failure? What are the consequences, both for safety and economics, if this happens? No process is completely safe, so we accept a certain amount of risk in any undertaking. Processes that occur very often must meet even greater safety standards. For example, every time you board an airliner you face about a 1 in a million chance of losing your life in an accident, however caused. These seem like good odds, but if you fly 100 times in your life, your chance of getting hurt approaches 1 in 10 000. Many people fly this often, and some business people fly 100 times more frequently still; even then their chance of getting hurt is only about 1 in a hundred. Yes, some people do lose their lives in plane crashes, but the risk is acceptable to most of us.

As a young scientist, you need to learn how data is handled and what information you can expect to determine from it. The mathematics you will require is nothing beyond grade 10, but you will have to understand what the numbers are telling us. This is done by consulting the theories we have studied to explain the processes we are observing; the theory always gives us a clue as to which statistical process would be most appropriate. In this document, you will be given some principles and some examples to allow you to understand how this can be done. Remember: we are never satisfied with ONE observation, and we NEVER guess at either the values or their precision. Data must be taken with an open but careful mind.

2 Collection of Measurements

When you make an observation of, say, the length of a table, it is not possible to measure it with infinite precision. Even with the best equipment available, the most precisely measured value in nature has only 14 significant digits. Using a metre stick or a ruler you will certainly not approach this level of precision, but how precise is a measurement of a table with a ruler?

2.1 Precision

Precision is the ability to repeat a given measurement. If we take a number of observations and they mostly agree, we can say that the data is quite precise. This is NOT the same as accuracy (see below).

2.2 Accuracy or Error

Your error is the distance from the TRUE value to the measured value. However, we rarely know the true value because it is hidden from us. For example, with the table mentioned above, we cannot go to the “Big Book of Tables” and check what the value should be. We are left with our trusty ruler, the table, and our ingenuity, and no way of knowing what the true value should be. Furthermore, as we become more precise we have to consider the imperfections and other issues that will unquestionably be found on the table’s surface. If we pursue our quest for precision far enough, these imperfections will require us to make many millions of measurements to account for the bumps and dents. We then get down to around 10⁻¹² m, which is about 12 significant figures, and we are now near atomic size. What then? With the Heisenberg uncertainty principle limiting our ability to determine the location of atomic particles, we will reach a point where, even with the best equipment mankind can buy or imagine, we will be left with a probability, not a certainty, for our measured value.

The value that we most often get in independent experiments is called the accepted value, not the “right” or “correct” one.

2.3 Reading Error

Regardless of the device you are using, there will be a limit to your ability to read its scale or display. If the readout is digital, then by convention the reading error is ± 1 in the least significant digit (LSD). If the display is analog, or the instrument is manual with a scale (such as a ruler or calliper), you will ultimately have to estimate the final decimal place. For example, if you are using a mm-scaled ruler and asked to find the length of your textbook, how precisely can you state the value? You could easily see how many mm, but what about fractions of a mm? Can you REALLY see 0.1 of a mm? I doubt it, but 1/5 of a mm has historically been reasonable. However, this is NOT your uncertainty for the value, because it has been chosen by you arbitrarily, and that is NEVER good science. Still, the reading error can be very helpful in choosing the proper instrument for a measurement. Remember that a 10 times increase in precision is typically accompanied by a 100 times increase in cost. Those extra significant digits cost a lot of money! We often hear of the cost of medical equipment or tools in precision trades and are amazed at the price, but it is the precision at which these devices are expected to operate that forces the care in their manufacture, demanding more time and expertise and hence a much higher cost. Your reading error should accompany your data at the top of a table to indicate how precisely you THOUGHT you could read the scale, but it should go no further!

2.4 Replication of Observations

Just as with people, we would like a second chance to impress. We would be disappointed if someone gave us only the briefest consideration; we want our friends and family to get to know us and understand the richness of our personalities, and this requires many “observations”. So it is with a measurement in science. If we are measuring the width of a table, as before, then by making only one observation we could be making a mistake. What we want are numerous observations that give us some confidence that the value we most commonly get is in fact probably close to the accepted one. This replication of observations also helps ensure that we are using the equipment properly.

It is important to state that the ENTIRE observation should be repeated each time. For each observation, you should re-emplace your ruler alongside the table in our example. It is NOT sufficient simply to read the ruler a number of times: if the ruler’s placement is never reset and there is a problem with how the ruler is being used, the problem will not show up in the repeated observations.

How many observations should we make? This depends on the time available and the situation. For data with an uncertainty of less than 1%, typically 100 or more observations have to be made, but this is unreasonable in classroom situations. The law of large numbers suggests that 30 observations establish the trend very well, but even this number is often too many for the time available to a given science class. Certainly, a well-organized lab group should be able to effect at least ten observations of a given parameter. Everyone should be working at something.
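The benefit of extra observations can be seen in a short simulation. The sketch below is my own illustration, not part of the handout: it invents a table of “true” length 2.545 m and a scatter of half a millimetre per reading (both figures are assumptions chosen for the example), then compares how far the average of 10 readings and the average of 100 readings typically land from the true value.

```python
# Illustrative sketch (assumed numbers, not from the handout): simulate
# repeated measurements of a table and watch the mean settle down as the
# number of observations grows.
import random

random.seed(1)  # fixed seed so the demonstration is repeatable

TRUE_LENGTH = 2.545   # hypothetical accepted value, metres
SPREAD = 0.0005       # hypothetical scatter of a single reading, metres

def mean_error(n_obs, trials=200):
    """Average distance of the sample mean from the true value,
    over many repeated 'lab sessions' of n_obs readings each."""
    total = 0.0
    for _ in range(trials):
        readings = [random.gauss(TRUE_LENGTH, SPREAD) for _ in range(n_obs)]
        total += abs(sum(readings) / n_obs - TRUE_LENGTH)
    return total / trials

err_10 = mean_error(10)    # a well-organized lab group
err_100 = mean_error(100)  # the regime needed for sub-1% work
```

With these assumptions, the 100-reading averages sit noticeably closer to the true length than the 10-reading averages, which is exactly the trend the law of large numbers predicts.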

2.5 Stating Uncertainties

It is simply good manners in any field to communicate in the accepted form. When quoting values with uncertainties, it is important to express them properly so that the reader is not confused by a jumble of numbers and symbols.

2.5.1 Without Scientific Notation

(1.234±.002) units

The decimal point need not fall after the first digit, as long as the rules of uncertainty handling are followed (see below):

(123.4±.2) units

is also acceptable.

2.5.2 With Scientific Notation

Humans tend to have ranges in which they find a number meaningful. For example, we tend to keep most numbers to three or four meaningful digits. We might buy a chocolate bar for $1.23, but if we buy a car we will say something like “It cost me 15 thousand bucks”. The actual price might have been $15,123.45, but who cares? You have communicated the essential aspects of the price; quoting a cost to 7-figure detail would sound silly to anyone except your lawyer or bank manager. Therefore, by moving up to the thousands we are really using scientific notation in our communication.

In science, we will use scientific notation when the number has more than 4 digits to the left of the decimal, or when it would need leading zeros to the right of the decimal point with only a zero to the left. We can also use scientific notation where it is the accepted format for a given branch of science.

First for those who enjoy using computers and keeping everything on one line.

(1.234±.002)E15 units

Or for the more traditional,

(1.234±.002) × 10¹⁵ units

I will address the two FAQs that arise with this arrangement.

1. Why the brackets?

A. They isolate the data and help the reader quickly see what belongs to what.

2. Why no leading zero on the uncertainty?

A. It is important to make it easy for your reader to understand what you are trying to say. Consider the following example, taken from above:

(1.234±.002) units AND (1.234 ± 0.002) units

For some reason the human eye sees the leading zero and gives equal “weight” to the uncertainty. That is not the intention: the uncertainty, while important, is NOT as important as the value. The method you have been asked to follow always keeps the uncertainty shorter than the mantissa and subordinate to it, which is the aim.
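The convention is easy to automate. The helper below is a minimal sketch of my own (the function name and its behaviour are assumptions, not part of the handout): it strips the leading zero from the uncertainty so that it stays visually subordinate to the mantissa.

```python
# Hypothetical helper (not from the handout): format a value and its
# uncertainty in the bracketed style described above.
def format_with_uncertainty(value, uncertainty, units):
    unc = f"{uncertainty:g}"
    if unc.startswith("0."):
        unc = unc[1:]  # "0.002" -> ".002": keep the uncertainty subordinate
    return f"({value}\u00b1{unc}) {units}"

line = format_with_uncertainty(1.234, 0.002, "units")  # -> "(1.234±.002) units"
```

The same call with 123.4 and 0.2 produces "(123.4±.2) units", matching the second acceptable form shown earlier.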

2.6 Computation of Uncertainties

You will encounter various methods of computing uncertainties in your travels as a young scientist, but the methods that will be shown here are in wide acceptance or will require only minor modifications.

If you make repeated measurements of the SAME parameter in the SAME manner without ANY modifications, then the collection of these values will form what is called a distribution around the mean, or average. This occurs because the estimation of the last digit in your observation is very nearly random. For example, in measuring your textbook with a ruler, you are estimating that last digit. If the measurement is done properly, the values you get will cluster around an average.

Consider the following data

2.54 2.56 2.54 2.53 2.55 2.54 2.54 2.53 2.52 2.54 2.53 2.54

2.55 2.56 2.54 2.55 2.57 2.55 2.54 2.53 2.55 2.54 2.54 2.54

2.55 2.53 2.52 2.53 2.54 2.55 2.56 2.55 2.54 2.54 2.55 2.55

It was collected using a metre stick and is the length of a table.

How precisely can we state the message this data contains about the length of this table? Since probabilities are inevitable, it falls to the subject of statistics to give a probabilistic sense to how we quote uncertainties. If we investigate the numbers above and count how many times each of them occurs, we find an arrangement such as this:

2.52 - 2

2.53 - 6

2.54 - 14

2.55 - 10

2.56 - 3

2.57 - 1

The most common value is 2.54, but the centre of the distribution probably lies somewhere between 2.54 and 2.55. We can estimate it with an average, but how many digits of the average should we keep? Is it possible to have more precision than we state in our data, which is typically based on the reading error (section 2.3)? This depends on the data: if the data has a large spread, the precision is lower, since the measurement is less repeatable.

We must also be careful about mindlessly employing averages. If the data were skewed, an average would not be appropriate, and we should probably revisit how we are collecting the data. Some processes are ‘exponential’, so the data on one side of the peak can lie markedly further away than the data on the other side. That situation requires an approach beyond this course; in most cases any exponential we deal with will be sufficiently gentle for us to use an average.

The measure of the spread of a data set that has been collected randomly is called the standard deviation. There is a lot more to this concept than what I state above, but for our purposes, we can consider this a measure of the spread of the data set. The standard deviation (s) is defined by the following equation,

s = √[ Σ (xₖ − x̄)² / (n − 1) ]    [1]

Where:

s = the standard deviation

x̄ = the arithmetic mean or average

n = the number of values

xₖ = the kth value of x, with the sum running over k = 1 to n

Fortunately, in our technological world, computational equipment will perform such a calculation. You DO NOT have to remember this equation, but you do have to know how to compute this value on your calculator, and in the computer, from a column of numbers. If you wish to program a computer to make this computation, there is another form of the equation that is easier to use.
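As an illustration of my own (not part of the handout), the sketch below computes s for the table data three ways: from the defining sum of squared deviations, from the one-pass “computational” form that needs only running totals of x and x², and with the standard library’s statistics.stdev as a cross-check. All three use the sample (n − 1) convention, matching what calculators’ s key reports.

```python
# Compute the standard deviation of the 36 table-length readings.
import math
import statistics

data = [2.54, 2.56, 2.54, 2.53, 2.55, 2.54, 2.54, 2.53, 2.52, 2.54, 2.53, 2.54,
        2.55, 2.56, 2.54, 2.55, 2.57, 2.55, 2.54, 2.53, 2.55, 2.54, 2.54, 2.54,
        2.55, 2.53, 2.52, 2.53, 2.54, 2.55, 2.56, 2.55, 2.54, 2.54, 2.55, 2.55]

n = len(data)
mean = sum(data) / n

# Defining form: sum of squared deviations from the mean
s_def = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# One-pass form: only the running sums of x and x**2 are needed,
# so a program can compute s without storing the whole data set
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)
s_onepass = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

# Library cross-check
s_builtin = statistics.stdev(data)
```

For this data set all three agree, giving s of roughly 0.011: the spread of the readings, in the same units as the data.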