MGCI Physics Department
Mr. H. M. van Bemmel

Handling Uncertainty in Experiments

Handling of Experimental Data

1 Introduction

Aside from basic visual discoveries, scientists of any discipline typically must handle experimental data with statistics to determine what information it contains. It is incumbent on the young scientist to learn the principles in this document to begin properly the process of mastering this important aspect of scientific practice.

The subject of statistics is broad and varied, and if statistics are applied inappropriately they can suggest improper conclusions. The usefulness of data is always linked to how precise it appears to be. For example, it took almost 20 years to prove statistically that cigarette smoking was a health risk. It was not until this data had been properly analyzed that governments could legally force cigarette makers to put health warnings on the packages. Every piece of equipment that can affect human safety or pose an economic risk will have statistics in its design.

How probable is a failure? What are the consequences, both for safety and economics, if this happens? No process is completely safe, so we accept a certain amount of risk in any undertaking. Processes that occur very often must meet even greater safety standards. For example, every time you board an airliner you face about a 1 in a million chance of losing your life in an accident, however caused. These seem like good odds, but if you fly 100 times in your life, your chance of getting hurt approaches 1 in 10 000. Many people fly this often, and some business people fly 100 times more frequently still; even then their chance of getting hurt is only about 1 in a hundred. Yes, some people do lose their lives in plane crashes, but the risk is acceptable to most of us.

As a young scientist, you need to learn how data is handled and what information you can expect to determine from it. The mathematics you will require is nothing beyond grade 10, but you will have to understand what the numbers are telling us. This is done by consulting the theories we have studied to explain the processes we are observing; the theory always gives us a clue as to which statistical process would be most appropriate. In this document, you will be given some principles and some examples to allow you to understand how this can be done. Remember: we are never satisfied with ONE observation, and we NEVER guess at either the values or their precision. Data must be taken with an open but careful mind.

2 Collection of Measurements

When you make an observation of, say, the length of a table, it is not possible to measure it with infinite precision. Even with the best equipment available, the most precisely measured value in nature has only 14 significant digits. Using a metre stick or a ruler you will certainly not approach this level of precision, but how precise is a measurement of a table with a ruler?

2.1 Precision

Precision is the ability to repeat a given measurement. If we take a number of observations and they mostly agree, we can say that the data is quite precise. This is NOT the same as accuracy (see below).

2.2 Accuracy or Error

Your error is the distance from the TRUE value to the measured value. However, we rarely know the true value because it is hidden from us. For example, with the table mentioned above, we cannot go to the “Big Book of Tables” and check what the value should be. We are left with our trusty ruler, the table, and our ingenuity, and no way of knowing what the true value should be. Furthermore, as we become more precise we have to consider the imperfections and other issues that will unquestionably be found on the table’s surface. If we pursue our quest for precision far enough, these imperfections will require us to make many millions of measurements to account for the bumps and dents. We then get down to around 10⁻¹² m, which is about 12 significant figures, and we are now near atomic size. What then? With the Heisenberg uncertainty principle limiting our ability to determine the location of atomic particles, we will reach a point where, even with the best equipment mankind can buy or imagine, we will be left with a probability, not a certainty, for our measured value.

The value that we most often get in independent experiments is called the accepted value, not the “right” or “correct” one.

2.3 Reading Error

Regardless of the device you are using, there will be a limit to your ability to read its scale or display. If the readout is digital, then by convention the reading error is ± 1 in the least significant digit (LSD). If the display is analog, or the instrument is manual with a scale (such as a ruler or calliper), you will ultimately have to estimate the final decimal place. For example, if you are using a mm-scaled ruler and asked to find the length of your textbook, how precisely can you state the value? You could easily see how many mm, but what about fractions of a mm? Can you REALLY see 0.1 of a mm? I doubt it, but 1/5 of a mm has historically been reasonable. However, this is NOT your uncertainty for the value, because it has been chosen by you arbitrarily, and that is NEVER good science. Still, the reading error can be very helpful in choosing the proper instrument for a measurement. Remember that a 10 times increase in precision is typically accompanied by a 100 times increase in cost. Those extra significant digits cost a lot of money! We often hear of the cost of medical equipment or tools in precision trades and are amazed at the price, but it is the precision at which these devices are expected to operate that forces the care in their manufacture, demanding more time and expertise and hence a much higher cost. Your reading error should accompany your data at the top of a table to indicate how precisely you THOUGHT you could read the scale, but it should go no further!

2.4 Replication of Observations

Just as with people, we would like a second chance to impress. We would be disappointed if someone gave us only the briefest consideration; we want our friends and family to get to know us and understand the richness of our personalities, and this requires many “observations”. So it is with a measurement in science. If we are measuring the width of a table, as before, then by making only one observation we could be making a mistake. What we want are numerous observations that give us some confidence that the value we most commonly get is in fact probably close to the accepted one. This replication of observations also helps ensure that we are using the equipment properly.

It is important to state that the ENTIRE observation should be repeated each time. For each observation, you should re-emplace your ruler alongside the table in our example. It is NOT sufficient simply to read the ruler a number of times: if the ruler’s placement is never reset and there is a problem with how the ruler is being used, the problem will not show up in the repeated observations.

How many observations should we make? This depends on the time available and the situation. For data with an uncertainty of less than 1%, typically 100 or more observations have to be made, but this is unreasonable in classroom situations. The law of large numbers suggests that 30 observations establish the trend very well, but even this number is often too many for the time available to a given science class. Certainly, a well-organized lab group should be able to effect at least ten observations of a given parameter. Everyone should be working at something.
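The benefit of extra observations can be seen in a short simulation. The sketch below is my own illustration, not part of the handout: it invents a table of “true” length 2.545 m and a scatter of half a millimetre per reading (both figures are assumptions chosen for the example), then compares how far the average of 10 readings and the average of 100 readings typically land from the true value.

```python
# Illustrative sketch (assumed numbers, not from the handout): simulate
# repeated measurements of a table and watch the mean settle down as the
# number of observations grows.
import random

random.seed(1)  # fixed seed so the demonstration is repeatable

TRUE_LENGTH = 2.545   # hypothetical accepted value, metres
SPREAD = 0.0005       # hypothetical scatter of a single reading, metres

def mean_error(n_obs, trials=200):
    """Average distance of the sample mean from the true value,
    over many repeated 'lab sessions' of n_obs readings each."""
    total = 0.0
    for _ in range(trials):
        readings = [random.gauss(TRUE_LENGTH, SPREAD) for _ in range(n_obs)]
        total += abs(sum(readings) / n_obs - TRUE_LENGTH)
    return total / trials

err_10 = mean_error(10)    # a well-organized lab group
err_100 = mean_error(100)  # the regime needed for sub-1% work
```

With these assumptions, the 100-reading averages sit noticeably closer to the true length than the 10-reading averages, which is exactly the trend the law of large numbers predicts.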

2.5 Stating Uncertainties

It is simply good manners in any field to communicate in the accepted form. When quoting values with uncertainties, it is important to express them properly so that the reader is not confused by a jumble of numbers and symbols.

2.5.1 Without Scientific Notation

(1.234±.002) units

The decimal point need not fall after the first digit, as long as the rules of uncertainty handling are followed (see below):

(123.4±.2) units

is also acceptable.

2.5.2 With Scientific Notation

Humans tend to have ranges in which they find a number meaningful. For example, we tend to keep most numbers to three or four meaningful digits. We might buy a chocolate bar for $1.23, but if we buy a car we will say something like “It cost me 15 thousand bucks”. The actual price might have been $15,123.45, but who cares? You have communicated the essential aspects of the price; quoting a cost to 7-figure detail would sound silly to anyone except your lawyer or bank manager. Therefore, by moving up to the thousands we are really using scientific notation in our communication.

In science, we will use scientific notation when the number has more than 4 digits to the left of the decimal, or when it would need leading zeros to the right of the decimal point with only a zero to the left. We can also use scientific notation where it is the accepted format for a given branch of science.

First for those who enjoy using computers and keeping everything on one line.

(1.234±.002)E15 units

Or for the more traditional,

(1.234±.002) × 10¹⁵ units

I will address the two FAQs that arise with this arrangement.

1. Why the brackets?

A. They isolate the data and help the reader quickly see what belongs to what.

2. Why no leading zero on the uncertainty?

A. It is important to make it easy for your reader to understand what you are trying to say. Consider the following example, taken from above:

(1.234±.002) units AND (1.234 ± 0.002) units

For some reason the human eye sees the leading zero and gives equal “weight” to the uncertainty. That is not the intention: the uncertainty, while important, is NOT as important as the value. The method you have been asked to follow always keeps the uncertainty shorter than the mantissa and subordinate to it, which is the aim.
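The convention is easy to automate. The helper below is a minimal sketch of my own (the function name and its behaviour are assumptions, not part of the handout): it strips the leading zero from the uncertainty so that it stays visually subordinate to the mantissa.

```python
# Hypothetical helper (not from the handout): format a value and its
# uncertainty in the bracketed style described above.
def format_with_uncertainty(value, uncertainty, units):
    unc = f"{uncertainty:g}"
    if unc.startswith("0."):
        unc = unc[1:]  # "0.002" -> ".002": keep the uncertainty subordinate
    return f"({value}\u00b1{unc}) {units}"

line = format_with_uncertainty(1.234, 0.002, "units")  # -> "(1.234±.002) units"
```

The same call with 123.4 and 0.2 produces "(123.4±.2) units", matching the second acceptable form shown earlier.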

2.6 Computation of Uncertainties

You will encounter various methods of computing uncertainties in your travels as a young scientist, but the methods that will be shown here are in wide acceptance or will require only minor modifications.

If you make repeated measurements of the SAME parameter in the SAME manner without ANY modifications, then the collection of these values will form what is called a distribution around the mean, or average. This occurs because the estimation of the last digit in your observation is very nearly random. For example, in measuring your textbook with a ruler, you are estimating that last digit. If the measurement is done properly, the values you get will cluster around an average.

Consider the following data

2.54 2.56 2.54 2.53 2.55 2.54 2.54 2.53 2.52 2.54 2.53 2.54

2.55 2.56 2.54 2.55 2.57 2.55 2.54 2.53 2.55 2.54 2.54 2.54

2.55 2.53 2.52 2.53 2.54 2.55 2.56 2.55 2.54 2.54 2.55 2.55

It was collected using a metre stick and is the length of a table.

How precisely can we state the message this data contains about the length of this table? Since probabilities are inevitable, it falls to the subject of statistics to give a probabilistic sense to how we quote uncertainties. If we investigate the numbers above and count how many times each of them occurs, we find an arrangement such as this:

2.52 - 2

2.53 - 6

2.54 - 14

2.55 - 10

2.56 - 3

2.57 - 1

The most common value is 2.54, but the centre of the distribution probably lies somewhere between 2.54 and 2.55. We can estimate it with an average, but how many digits of the average should we keep? Is it possible to have more precision than we state in our data, which is typically based on the reading error (section 2.3)? This depends on the data: if the data has a large spread, the precision is lower, since the measurement is less repeatable.

We must also be careful about mindlessly employing averages. If the data were skewed, an average would not be appropriate, and we should probably revisit how we are collecting the data. Some processes are ‘exponential’, so the data on one side of the peak can lie markedly further away than the data on the other side. That situation requires an approach beyond this course; in most cases any exponential we deal with will be sufficiently gentle for us to use an average.

The measure of the spread of a data set that has been collected randomly is called the standard deviation. There is a lot more to this concept than what I state above, but for our purposes, we can consider this a measure of the spread of the data set. The standard deviation (s) is defined by the following equation,

s = √[ Σ (xₖ − x̄)² / (n − 1) ]    [1]

Where:

s = the standard deviation

x̄ = the arithmetic mean or average

n = the number of values

xₖ = the kth value of x, with the sum running over k = 1 to n

Fortunately, in our technological world, computational equipment will perform such a calculation. You DO NOT have to remember this equation, but you do have to know how to compute this value on your calculator, and in the computer, from a column of numbers. If you wish to program a computer to make this computation, there is another form of the equation that is easier to use.
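As an illustration of my own (not part of the handout), the sketch below computes s for the table data three ways: from the defining sum of squared deviations, from the one-pass “computational” form that needs only running totals of x and x², and with the standard library’s statistics.stdev as a cross-check. All three use the sample (n − 1) convention, matching what calculators’ s key reports.

```python
# Compute the standard deviation of the 36 table-length readings.
import math
import statistics

data = [2.54, 2.56, 2.54, 2.53, 2.55, 2.54, 2.54, 2.53, 2.52, 2.54, 2.53, 2.54,
        2.55, 2.56, 2.54, 2.55, 2.57, 2.55, 2.54, 2.53, 2.55, 2.54, 2.54, 2.54,
        2.55, 2.53, 2.52, 2.53, 2.54, 2.55, 2.56, 2.55, 2.54, 2.54, 2.55, 2.55]

n = len(data)
mean = sum(data) / n

# Defining form: sum of squared deviations from the mean
s_def = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# One-pass form: only the running sums of x and x**2 are needed,
# so a program can compute s without storing the whole data set
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)
s_onepass = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

# Library cross-check
s_builtin = statistics.stdev(data)
```

For this data set all three agree, giving s of roughly 0.011: the spread of the readings, in the same units as the data.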