The t-Distribution


Many engineers need to perform experiments that consist of measuring the value of an output response y of a process whose output is normally distributed, for a given setting of an input variable x. In such experiments, we know from experience that even if we can precisely set the input variable x to the same value, we'll probably get a different value for the output response y every time we measure it. This is because of measurement error, a reality of engineering life that we all need to contend with.


If several measurements of the output y are taken for a given setting of x, we can compute the mean Y and the standard deviation s of these sample measurements. These sample values, however, generally differ from the actual mean μ and standard deviation σ of the entire normal distribution of y values that can occur at that setting of x. Thus, Y and s are simply sample estimates of the actual population mean μ and standard deviation σ of the process output y, respectively.
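A minimal Python sketch of this step, assuming five hypothetical readings of y taken at one fixed setting of x. The standard library's `statistics.stdev` divides by n - 1, which is the sample standard deviation s used in the t formula; the sample variance is simply s squared.

```python
import statistics

# Hypothetical repeated readings of y at one fixed setting of x
readings = [4.4, 4.5, 4.6, 4.5, 4.5]

n = len(readings)
y_bar = statistics.mean(readings)   # sample mean Y
s = statistics.stdev(readings)      # sample standard deviation s (n - 1 divisor)
variance = s ** 2                   # sample variance; equals statistics.variance(readings)

print(f"n = {n}, Y = {y_bar:.3f}, s = {s:.4f}, s^2 = {variance:.4f}")
```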


In most real-life problems, it is not possible to know the actual values of μ and σ with absolute certainty, so as engineers we have nothing to work with except Y and s, which we obtain from the finite number of measurements we can make. The question, then, is: how close is Y to μ?


To answer this question, let us define a quantity t as follows:

$$t = \frac{Y - \mu}{s / \sqrt{n}}$$

where n is the number of random measurements taken. A population of t-values forms a t-distribution, which is bell-shaped and symmetric like a normal distribution. It is, however, somewhat wider, with heavier tails, because it uses the sample estimate s (which itself varies from sample to sample) rather than the true σ to define its dispersion.
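The definition of t translates directly into a small Python function. The name `t_statistic` and the trial value of μ below are illustrative assumptions; since μ is unknown in practice, a value such as a nominal target would be supplied for comparison.

```python
import math

def t_statistic(y_bar: float, mu: float, s: float, n: int) -> float:
    """t = (Y - mu) / (s / sqrt(n)) for a sample of n measurements."""
    if n < 2:
        raise ValueError("at least two measurements are needed to estimate s")
    standard_error = s / math.sqrt(n)   # estimated standard deviation of Y
    return (y_bar - mu) / standard_error

# Hypothetical trial value mu = 4.6, with Y = 4.5, s = 0.1 and n = 5
t = t_statistic(y_bar=4.5, mu=4.6, s=0.1, n=5)
print(f"t = {t:.3f}")  # → t = -2.236
```

Here Y sits about 2.2 estimated standard errors below the trial value of μ.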


As the number of degrees of freedom of the t-distribution increases (the degrees of freedom are defined as n - 1, where n is the number of independent measurements taken), the t-distribution becomes more and more like the normal distribution. At a high enough number of degrees of freedom (say, more than 100), the two are practically indistinguishable.
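One way to see this convergence, sketched in Python with the standard library only: evaluate the t density (written out from its textbook formula as a hypothetical helper `t_pdf`) next to the standard normal density from `statistics.NormalDist`. As d.f. grows, the gap between the two peaks shrinks, and the extra density the t-distribution carries in its tail at x = 3 approaches that of the normal curve.

```python
import math
from statistics import NormalDist

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    # log of the normalising constant Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)),
    # computed with lgamma so that large df does not overflow
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

standard_normal = NormalDist()  # mean 0, standard deviation 1

for df in (1, 4, 10, 30, 100, 1000):
    peak_gap = standard_normal.pdf(0) - t_pdf(0, df)        # t peak is lower
    tail_ratio = t_pdf(3, df) / standard_normal.pdf(3)      # t tail is heavier
    print(f"d.f. = {df:4d}: peak gap = {peak_gap:.5f}, "
          f"density ratio at x = 3: {tail_ratio:.3f}")
```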


The t-distribution can therefore be used to analyze normal distributions in cases where the true value of σ cannot be obtained, such as when the sample size is limited. Taking more measurements of y increases the degrees of freedom available for estimating σ, and tends to bring Y and s closer to the μ and σ of the entire normal population, respectively.


One use of the t-distribution is to determine the two values between which the actual population mean μ of a group of measurements will fall with a given probability, provided that the experiment was performed in a randomized manner. This is done by calculating the sample mean Y and the sample standard deviation s from measurements taken in an experiment with a number of runs. The following equation is then applied:


The values between which μ will fall with a given probability (1 - a) are

$$Y \pm \rho(1-a,\ \mathrm{d.f.})\,\frac{s}{\sqrt{n}}$$

where d.f. = n - 1 and the value of ρ(1-a, d.f.) comes from the t-table (Table 1).
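A sketch of this equation in Python, assuming raw measurements as input and only the 95% probability level. The dictionary holds standard two-tailed critical values for a = 0.05 at d.f. = 1 to 10, the kind of entry Table 1 provides; the function name `mean_interval_95` and the readings are hypothetical.

```python
import math
import statistics

# Standard two-tailed t critical values for a = 0.05 (95% level), d.f. = 1..10.
# Only a small excerpt; other d.f. and a values come from the full t-table.
T_095 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def mean_interval_95(measurements):
    """Return (low, high) between which mu falls at the 95% probability level."""
    n = len(measurements)
    df = n - 1
    if df not in T_095:
        raise ValueError(f"no table entry for d.f. = {df}")
    y_bar = statistics.mean(measurements)
    s = statistics.stdev(measurements)           # sample standard deviation
    half_width = T_095[df] * s / math.sqrt(n)    # rho * s / sqrt(n)
    return y_bar - half_width, y_bar + half_width

low, high = mean_interval_95([4.4, 4.5, 4.6, 4.5, 4.5])  # hypothetical readings
print(f"{low:.3f} < mu < {high:.3f}")
```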


Note that Table 1 lists a values rather than probability levels, so each a value must be subtracted from 1 to obtain the corresponding probability level; e.g., the 95% probability level corresponds to the column where a = 0.05.
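If a table is not at hand, an entry can be approximated numerically. The sketch below is a stand-in for Table 1, not the method used to build it: it integrates the t density with Simpson's rule and uses bisection to find the value ρ for which the central area between -ρ and +ρ equals 1 - a. It assumes, as the value 2.776 quoted for d.f. = 4 and a = 0.05 implies, that a is the total area in both tails. The names `t_pdf`, `central_prob` and `t_critical` are hypothetical.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Student's t density with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def central_prob(x: float, df: int, steps: int = 2000) -> float:
    """P(-x < T < x): Simpson's rule on [0, x], doubled by symmetry."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3

def t_critical(a: float, df: int) -> float:
    """Two-tailed critical value: central area 1 - a, area a/2 in each tail."""
    lo, hi = 0.0, 100.0  # assumed search range; enough for a >= 0.01, df >= 1
    for _ in range(60):  # bisection works because central_prob increases with x
        mid = (lo + hi) / 2
        if central_prob(mid, df) < 1 - a:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.05, 4), 3))  # → 2.776
```

If SciPy is installed, `scipy.stats.t.ppf(1 - a / 2, df)` returns the same two-tailed value directly.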


As an example, suppose that you obtained 5 independent measurements of the output of your process for the same set of inputs, from which you calculated a sample mean Y = 4.5 and a sample standard deviation s = 0.1. If you're interested in the range of values within which there's a 95% probability that the real population mean μ lies, then (referring to the t-table) you'd use a value of 2.776 for ρ (for d.f. = 4 and a = 0.05). Since sqrt(5) = 2.236, given your data there's a 95% probability that μ lies between 4.5 - [2.776 (0.1/2.236)] and 4.5 + [2.776 (0.1/2.236)], i.e., 4.376 < μ < 4.624.
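The arithmetic of this example can be checked with a few lines of Python. The inputs are the values given above, and 2.776 is taken from the t-table rather than computed.

```python
import math

n = 5              # number of independent measurements
y_bar = 4.5        # sample mean Y
s = 0.1            # sample standard deviation
t_value = 2.776    # t-table entry for d.f. = n - 1 = 4, a = 0.05

half_width = t_value * s / math.sqrt(n)
low, high = y_bar - half_width, y_bar + half_width
print(f"{low:.3f} < mu < {high:.3f}")  # → 4.376 < mu < 4.624
```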


See also: t-Table





Copyright 2005 All Rights Reserved.