STATISTICAL ANALYSIS OF EXPERIMENTAL DATA
Gary L. Bertrand
University of Missouri-Rolla

BACKGROUND

The statistical approach recognizes that it is impossible to state precisely how accurate an observation might be, or even how accurate an observed average or mean might be.  A set of observations, which might range from a few measurements to an extremely large number of measurements, is represented by a single representative value and an indication of the reproducibility (uncertainty) of the observations.  The representative value is the average, or arithmetic mean.  The indication of reproducibility is based on a quantity called the standard deviation (s), which is related to the probability that a single observation will fall within a specified range about the mean.  The equations for calculating these probabilities assume an extremely large number of observations; they are then adapted to the smaller numbers of observations typical of real experiments.  This adaptation is based on the degrees of freedom (df).  For an arithmetic mean, the degrees of freedom is simply the number of observations (N) minus 1 (df = N - 1).

The square of the standard deviation (s^2) is defined as the sum of the squares of the deviations from the mean divided by the degrees of freedom.  The square of the standard deviation of the mean (s_mean^2) is defined as the square of the standard deviation divided by the number of observations (s_mean^2 = s^2/N).  These quantities are more easily understood by performing calculations on a set of observations:

We consider a set of 10 observations:  106,111,108,105,109,115,110,112,114,101 .

First we sum the values, then divide by the number of observations to calculate the mean.

observation      deviation from mean       (deviation)^2
    106          106 - 109.1 = -3.1             9.61
    111          111 - 109.1 =  1.9             3.61
    108          108 - 109.1 = -1.1             1.21
    105          105 - 109.1 = -4.1            16.81
    109          109 - 109.1 = -0.1             0.01
    115          115 - 109.1 =  5.9            34.81
    110          110 - 109.1 =  0.9             0.81
    112          112 - 109.1 =  2.9             8.41
    114          114 - 109.1 =  4.9            24.01
    101          101 - 109.1 = -8.1            65.61
  ------                       ----           ------
sum: 1091                       0.0           164.90

number of observations:  N = 10     degrees of freedom:  df = N - 1 = 9

mean = sum/N = 1091/10 = 109.1

Then we subtract the mean from each of the observed values.
These values are squared and summed to calculate the standard deviation and the standard deviation of the mean.

s^2 = (sum of squares)/df = 164.9/9 = 18.3

s = 4.3

s_mean^2 = s^2/N = 18.3/10 = 1.83

s_mean = 1.4

We have carried extra significant figures through these calculations.  In practice, there is little justification for stating an uncertainty to more than one significant figure.
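The hand calculation above can be reproduced with a short script. This is a minimal sketch rather than a prescribed procedure; the function name summarize and the variable names are illustrative choices, not part of the original worked example.

```python
import math
import statistics

def summarize(observations):
    """Return mean, s, and s_mean, following the steps in the text."""
    n = len(observations)
    df = n - 1                                   # degrees of freedom
    mean = sum(observations) / n
    sum_sq = sum((x - mean) ** 2 for x in observations)
    s_squared = sum_sq / df                      # s^2 = (sum of squares)/df
    s_mean_squared = s_squared / n               # s_mean^2 = s^2/N
    return {
        "n": n,
        "df": df,
        "mean": mean,
        "sum_sq": sum_sq,
        "s": math.sqrt(s_squared),
        "s_mean": math.sqrt(s_mean_squared),
    }

observations = [106, 111, 108, 105, 109, 115, 110, 112, 114, 101]
result = summarize(observations)

# Cross-check against the standard library, which also divides by N - 1.
library_s = statistics.stdev(observations)

print(f"mean   = {result['mean']:.1f}")
print(f"s      = {result['s']:.1f}")
print(f"s_mean = {result['s_mean']:.1f}")
```

The standard-library function statistics.stdev uses the same N - 1 divisor, so it serves as an independent check on the step-by-step calculation.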

These statistical calculations assume that a large number of observations have been used to calculate the mean and the standard deviation.  When smaller numbers of observations are used, there is less confidence that the calculated values are really representative of the statistical probabilities.  An additional factor is therefore introduced: the Student t-factor (W. S. Gosset published his work on this factor under the pseudonym "Student").  The standard deviation of the mean is multiplied by the t-factor, which depends on the degrees of freedom and the desired confidence level, to obtain the Confidence Interval (d) at the specified percentage (the level of confidence).  An abbreviated table of t-factors is given below.  For the set of data above, with mean = 109.1, s_mean = 1.4, and df = 9:
d(50%) = 1.4 x 0.703 = 1.0  ,  d(90%) = 1.4 x 1.83 = 2.6  ,  d(95%) = 1.4 x 2.26 = 3.2  .

These quantities are stated as "the uncertainty is ± 1 at the 50% level of confidence" or "the mean value is 109 ± 3 at the 95% level of confidence".  This is interpreted to mean that if you repeated the set of measurements a large number of times, half of those means would fall in the range 108 - 110, and 95% of them would fall in the range 106 - 112.
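The multiplication by the t-factor can be sketched in a few lines. The three t-factors below are the df = 9 values quoted in the text, and s_mean is the rounded value 1.4 used there; the dictionary and function names are illustrative assumptions. Carrying the unrounded standard deviation of the mean instead would shift the second digit of each result.

```python
# Student t-factors for df = 9, copied from the worked example in the text.
T_FACTORS_DF9 = {50: 0.703, 90: 1.83, 95: 2.26}

def confidence_interval(s_mean, t_factor):
    """Half-width d of the confidence interval: d = s_mean * t."""
    return s_mean * t_factor

mean = 109.1
s_mean = 1.4  # rounded value from the text

intervals = {
    level: confidence_interval(s_mean, t)
    for level, t in T_FACTORS_DF9.items()
}

for level, d in intervals.items():
    print(f"{level}% level: {mean:.1f} +/- {d:.1f}")
```

For other sample sizes, the t-factor must be looked up for the matching degrees of freedom; the values above apply only to df = 9.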

COMBINING UNCERTAINTIES IN CALCULATIONS - PROPAGATION OF UNCERTAINTY

Scientific measurements are often combined to calculate other quantities that may be used in subsequent calculations.  Each of these quantities will have an uncertainty, and the calculated value must be assigned an uncertainty.  Suppose quantities W, X, and Y are measured with uncertainties dW, dX, and dY.  These will be combined in some equation to calculate some other quantity Z:

Z = Z(W,X,Y) .

There are rigorous rules for combining the uncertainties to calculate the uncertainty in Z (dZ).  Fortunately, most of the situations that you will encounter may be reduced to two simple relationships.

For Multiplication and Division:  Z = WX/Y
(dZ/Z)^2 = (dW/W)^2 + (dX/X)^2 + (dY/Y)^2

For Addition and Subtraction:  Z = W - X + Y
dZ^2 = dW^2 + dX^2 + dY^2

Those rules also cover the following cases, but we will include them here for convenience:

If there are powers involved in the equation using multiplication and/or division:

Z = W^2 X^3 / Y^(1/2) ;     (dZ/Z)^2 = 4(dW/W)^2 + 9(dX/X)^2 + (1/4)(dY/Y)^2 .

If there is multiplication by scalars in the equation using addition and/or subtraction:

Z = 2W - 3X + Y/2 ;     dZ^2 = 4 dW^2 + 9 dX^2 + (1/4) dY^2 .
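The factors of 4, 9, and 1/4 in both cases can be traced to the standard first-order propagation formula, which this handout does not state explicitly. The sketch below assumes that the uncertainties in W, X, and Y are independent and small; the exponents and coefficients a, b, c are illustrative symbols introduced here.

```latex
% General first-order rule (assumes independent uncertainties in W, X, Y)
\[
(dZ)^2 = \left(\frac{\partial Z}{\partial W}\right)^2 (dW)^2
       + \left(\frac{\partial Z}{\partial X}\right)^2 (dX)^2
       + \left(\frac{\partial Z}{\partial Y}\right)^2 (dY)^2
\]

% Powers: Z = W^a X^b Y^c, so \partial Z / \partial W = aZ/W, and so on.
\[
\left(\frac{dZ}{Z}\right)^2
  = a^2 \left(\frac{dW}{W}\right)^2
  + b^2 \left(\frac{dX}{X}\right)^2
  + c^2 \left(\frac{dY}{Y}\right)^2
\]
% With a = 2, b = 3, c = -1/2 the coefficients are 4, 9, and 1/4.

% Scalars: Z = aW + bX + cY, so \partial Z / \partial W = a, and so on.
\[
(dZ)^2 = a^2 (dW)^2 + b^2 (dX)^2 + c^2 (dY)^2
\]
% With a = 2, b = -3, c = 1/2 the coefficients are again 4, 9, and 1/4.
```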

In applying these rules, it is important to have the uncertainties of all of the independent variables (W, X, and Y) at the same level of confidence.  If the uncertainties are at the 50% level, the uncertainty in Z will be at the 50% level.
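The two rules, including powers and scalar multipliers, can be sketched as a pair of helper functions. The function names and the numerical values of W, X, and Y below are hypothetical and chosen only for illustration; the uncertainties are assumed to be independent and quoted at the same level of confidence.

```python
import math

def power_product(terms):
    """Z = product of x**p; terms is a list of (x, dx, p).

    Returns (Z, dZ) using (dZ/Z)^2 = sum of (p * dx / x)^2.
    """
    z = 1.0
    rel_sq = 0.0
    for x, dx, p in terms:
        z *= x ** p
        rel_sq += (p * dx / x) ** 2
    return z, abs(z) * math.sqrt(rel_sq)

def linear_combination(terms):
    """Z = sum of c * x; terms is a list of (c, x, dx).

    Returns (Z, dZ) using dZ^2 = sum of (c * dx)^2.
    """
    z = sum(c * x for c, x, dx in terms)
    dz = math.sqrt(sum((c * dx) ** 2 for c, x, dx in terms))
    return z, dz

# Hypothetical measurements, all at the same level of confidence.
W, dW = 2.0, 0.1
X, dX = 3.0, 0.1
Y, dY = 4.0, 0.2

# Z = W^2 X^3 / Y^(1/2): division by Y^(1/2) is a power of -1/2.
z_prod, dz_prod = power_product([(W, dW, 2), (X, dX, 3), (Y, dY, -0.5)])

# Z = 2W - 3X + Y/2
z_lin, dz_lin = linear_combination([(2, W, dW), (-3, X, dX), (0.5, Y, dY)])

print(f"product form: Z = {z_prod:.1f}, dZ = {dz_prod:.1f}")
print(f"linear form:  Z = {z_lin:.1f}, dZ = {dz_lin:.1f}")
```

Signs of the exponents and coefficients do not matter for the uncertainty, because each term is squared before the terms are summed.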

EXAMPLE

Let's assume that you have made several measurements of the width and height of a rectangle, and have calculated the mean values, the standard deviations, the standard deviations of the means, and the uncertainties at the 95% level of confidence.

Values:  W = 102   dW(95%) = 3                     H = 113   dH(95%) = 4

The area is calculated:  Area = W x H = 102 x 113 = 11526.

For the multiplication process:  (dArea/Area)^2 = (dW/W)^2 + (dH/H)^2

Therefore:  (dArea/11526)^2 = (3/102)^2 + (4/113)^2 = .00087 + .00125 = .00212 ,

and:  dArea = 11526 x (.00212)^(1/2) = 11526 x .046 = 530 .

The area may be reported as:  Area = 11,500 ± 500 at the 95% level of confidence .

The perimeter is calculated:  Perimeter = 2W + 2H = 2(102) + 2(113) = 430 .

For the addition process (with scalars):  dPerimeter^2 = 4 dW^2 + 4 dH^2 = 4(3)^2 + 4(4)^2 = 100 ,

and:  dPerimeter = 10 .

The perimeter may be reported as: Perimeter = 430 ± 10 at the 95% level of confidence.
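The rectangle example can be checked numerically. This is a small sketch using the values given above; the helper round_to_sig is an illustrative addition that rounds an uncertainty to one significant figure, in keeping with the earlier remark about significant figures.

```python
import math

def round_to_sig(x, sig=1):
    """Round x to the given number of significant figures."""
    if x == 0:
        return 0.0
    return round(x, sig - 1 - math.floor(math.log10(abs(x))))

# Mean values and 95% uncertainties from the example.
w, dw = 102, 3
h, dh = 113, 4

# Multiplication: relative uncertainties add in quadrature.
area = w * h
d_area = area * math.sqrt((dw / w) ** 2 + (dh / h) ** 2)

# Addition with scalars: each uncertainty is scaled by its coefficient.
perimeter = 2 * w + 2 * h
d_perimeter = math.sqrt((2 * dw) ** 2 + (2 * dh) ** 2)

print(f"Area      = {area} +/- {d_area:.0f}")
print(f"Perimeter = {perimeter} +/- {d_perimeter:.0f}")
print(f"Area uncertainty to one significant figure: {round_to_sig(d_area):.0f}")
```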
