 The t-Distribution

Many engineers need to perform experiments that consist of measuring the value of an output response y of a normal process for a given setting of an input variable x.  In such experiments, we know from experience that even if we can set the input variable x precisely to the same value, we'll probably get a different value for the output response y every time we measure it.  This is due to measurement error, a reality of engineering life that we all need to contend with.

If several measurements of the value of output y for a given setting of x are taken, we can compute the mean Y and the standard deviation s of these sample measurements. These sample values, however, differ from the actual mean µ and standard deviation σ of the entire normal distribution of y values that can be obtained for that setting of x.  Thus, Y and s are simply sample estimates of the actual population mean µ and standard deviation σ of the process output y, respectively.

In most real-life problems, it is not possible to know the actual values of µ and σ with absolute certainty, so as engineers we have nothing to work with except Y and s, which we can compute from the finite number of measurements that we can make. The question now is: how close is Y to µ?

To answer this question, let us define a quantity, t, as follows: t = (Y - µ) / (s / sqrt(n)), where n is the number of random measurements taken. A population of t-values forms a t-distribution, which resembles a normal distribution.  It is, however, wider than a normal distribution, with heavier tails, because it uses the sample estimate s (rather than the true σ) to define its dispersion.
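The definition of t above can be sketched directly in Python using only the standard library; the sample values below are hypothetical, chosen only to illustrate the computation.

```python
import math

def t_statistic(samples, mu):
    """Compute t = (Y - mu) / (s / sqrt(n)) for a list of measurements."""
    n = len(samples)
    y_bar = sum(samples) / n  # sample mean Y
    # sample standard deviation s, using the n-1 (degrees of freedom) denominator
    s = math.sqrt(sum((y - y_bar) ** 2 for y in samples) / (n - 1))
    return (y_bar - mu) / (s / math.sqrt(n))

# Hypothetical measurements of y for one setting of x, with an assumed mu of 4.5
print(t_statistic([4.4, 4.5, 4.6], 4.5))  # sample mean equals mu, so t = 0.0
```

Note that when the sample mean Y happens to equal µ exactly, t is zero; the farther Y drifts from µ (relative to s/sqrt(n)), the larger |t| becomes.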

As the number of degrees of freedom (defined as n-1, where n is the number of independent measurements taken) of the t-distribution increases, it becomes increasingly similar to the normal distribution.  In fact, at a high enough number of degrees of freedom (say, > 100), the t-distribution becomes practically indistinguishable from the normal distribution.
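This convergence can be seen by comparing two-tailed 95% critical values from a standard t-table against the corresponding normal value. The t-table entries below are hardcoded from published tables (a = 0.05, two-tailed) rather than computed, so the sketch needs only the standard library:

```python
from statistics import NormalDist

# Two-tailed 95% critical values taken from a standard t-table (a = 0.05)
t_crit = {1: 12.706, 4: 2.776, 10: 2.228, 30: 2.042, 100: 1.984}

# The corresponding critical value for the normal distribution (~1.960)
z_crit = NormalDist().inv_cdf(0.975)

for df, tc in t_crit.items():
    print(f"d.f. = {df:3d}: t = {tc:6.3f}   (normal: {z_crit:.3f})")
```

At d.f. = 1 the t critical value is more than six times the normal one, but by d.f. = 100 the two differ by only about 0.02, illustrating the practical indistinguishability mentioned above.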

The t-distribution can therefore be used to analyze normal distributions in cases where the true value of σ cannot be obtained, such as when the sample size is limited. Increasing the number of measurements taken for y increases the degrees of freedom used to estimate σ, bringing the values of Y and s closer to the µ and σ of the entire normal population, respectively.

One of the uses of the t-distribution is in determining the probability that the actual population mean µ of a group of measurements will fall between two values, provided that the experiment was performed in a randomized manner.  This is done by calculating the sample mean Y and standard deviation s from measurements taken from an experiment with a number of runs.  The following equation is then applied:

Range of values between which µ will fall with probability (1-a) = Y +/- [t(1-a, d.f.)] [s / sqrt(n)], with the value of t(1-a, d.f.) coming from the t-table (Table 1).

Note that Table 1 lists the a values instead of the probability values, so these have to be subtracted from 1 in order to obtain the corresponding probability levels; e.g., the 95% probability level corresponds to the column where a = 0.05.

As an example, suppose that you obtained 5 independent measurements of the output of your process for the same set of inputs, from which you calculated a sample mean Y = 4.5 and a sample standard deviation s = 0.1.  If you're interested in the range of values between which there's a 95% probability that the real population mean µ will lie, then (referring to the t-table) you'd have to use a value of 2.776 for t (for d.f. = 4 and a = 0.05).  Thus, given your data, there's a 95% probability that µ will lie between 4.5 - [2.776 (0.1/2.236)] and 4.5 + [2.776 (0.1/2.236)], i.e., 4.376 < µ < 4.624.
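The arithmetic in this example can be checked with a short Python sketch that applies the confidence-interval equation directly, using the same numbers as above (n = 5, Y = 4.5, s = 0.1, and the t-table value 2.776):

```python
import math

n = 5
y_bar = 4.5      # sample mean Y
s = 0.1          # sample standard deviation s
t_crit = 2.776   # from the t-table: d.f. = n - 1 = 4, a = 0.05 (95% level)

# Half-width of the interval: t * s / sqrt(n)
half_width = t_crit * s / math.sqrt(n)

lower = y_bar - half_width
upper = y_bar + half_width
print(f"{lower:.3f} < mu < {upper:.3f}")  # prints "4.376 < mu < 4.624"
```

Note that sqrt(5) = 2.236, which is where the divisor in the worked example comes from.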