The t-Distribution
Many engineers need to perform experiments that consist of measuring the value of an output response y, assumed to be normally distributed, for a given setting of an input variable x. In such experiments, we know from experience that even if we can precisely set the input variable x to the same value, we'll probably get a different value for the output response y every time we measure it. This is due to measurement errors, a reality of engineering life that we all need to contend with.
If several measurements of the value of output y for a given setting of x are taken, we can compute the mean Ȳ and the variance s² of these sample measurements. These sample mean and variance values, however, differ from the actual mean µ and variance σ² of the entire normal distribution of y values that can be obtained at the set value of x. Thus, Ȳ and s² are simply sample estimates of the actual population mean µ and variance σ² of the process output y, respectively.
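As a minimal sketch (the readings below are hypothetical, not from the source), Python's standard `statistics` module computes both sample quantities. Note that `statistics.variance` divides by n − 1, which gives the sample variance s² used as the estimate of σ².

```python
import statistics

def sample_stats(measurements):
    """Return (sample mean, sample variance s^2, sample standard deviation s)."""
    y_bar = statistics.mean(measurements)
    # statistics.variance and statistics.stdev divide by n - 1, not by n
    s2 = statistics.variance(measurements)
    s = statistics.stdev(measurements)
    return y_bar, s2, s

# Hypothetical repeated readings of y at one fixed setting of x
readings = [4.41, 4.52, 4.48, 4.60, 4.49]
y_bar, s2, s = sample_stats(readings)
print(f"mean = {y_bar:.3f}, variance = {s2:.5f}, std dev = {s:.4f}")
```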
In most real-life problems, it is not possible to know the actual values of µ and σ² with absolute certainty, so as engineers we have nothing to work with except Ȳ and s², which we get from the finite number of measurements we can make.
The question now is, how close is Ȳ to µ? To answer this question, let us define a quantity, t, as follows:

$$t = \frac{\bar{Y} - \mu}{s / \sqrt{n}}$$

where n is the number of random measurements taken and s = √s² is the sample standard deviation.
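A minimal Python sketch of this definition, assuming s is the sample standard deviation (n − 1 divisor); the function name `t_statistic`, the readings, and the trial value of µ are hypothetical, chosen only for illustration.

```python
import math
import statistics

def t_statistic(measurements, mu):
    """t = (Ybar - mu) / (s / sqrt(n)) for a hypothesized population mean mu."""
    n = len(measurements)
    if n < 2:
        raise ValueError("need at least two measurements to estimate s")
    y_bar = statistics.mean(measurements)
    s = statistics.stdev(measurements)  # sample standard deviation (n - 1 divisor)
    return (y_bar - mu) / (s / math.sqrt(n))

readings = [4.41, 4.52, 4.48, 4.60, 4.49]  # hypothetical data
print(t_statistic(readings, mu=4.45))
```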
A population of t-values forms a t-distribution, which also looks like a normal distribution. It is, however, somewhat wider, with heavier tails, than a normal distribution because it uses the sample estimate s, rather than the true σ, to define its dispersion.
As the number of degrees of freedom (defined as n − 1 if n is the number of independent measurements taken) of the t-distribution increases, it approaches the normal distribution more and more closely. In fact, at a high enough number of degrees of freedom (say, > 100), the t-distribution becomes practically indistinguishable from the normal distribution.
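To see this convergence numerically, the sketch below evaluates the standard t-distribution density (using `math.lgamma` to avoid overflow) against the standard normal density. The helper names `t_pdf` and `normal_pdf` are illustrative, not from the source. The t curve has a lower peak and heavier tails, and the gap shrinks as d.f. grows.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # Log-gamma keeps the normalizing constant finite for large df
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

for df in (2, 4, 10, 30, 100):
    # Peak height (x = 0) and a tail value (x = 3) versus the normal curve
    print(f"d.f.={df:>3}: peak {t_pdf(0, df):.4f} vs {normal_pdf(0):.4f}, "
          f"tail {t_pdf(3, df):.5f} vs {normal_pdf(3):.5f}")
```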
The t-distribution can therefore be used to analyze normal distributions in cases where the true value of σ cannot be obtained, such as when the sample size is limited. Increasing the number of measurements taken for y increases the degrees of freedom used to estimate σ, bringing the values of Ȳ and s closer to the µ and σ of the entire normal population, respectively.
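A small seeded simulation can illustrate this, assuming a hypothetical process with µ = 10.0 and σ = 2.0 (values chosen for illustration only): as n grows, Ȳ and s settle toward µ and σ.

```python
import random
import statistics

# Hypothetical process parameters (illustration only)
MU, SIGMA = 10.0, 2.0
rng = random.Random(42)  # fixed seed so the run is repeatable

def estimates(n):
    """Draw n simulated measurements and return (Ybar, s)."""
    data = [rng.gauss(MU, SIGMA) for _ in range(n)]
    return statistics.mean(data), statistics.stdev(data)

for n in (5, 50, 500, 5000):
    y_bar, s = estimates(n)
    print(f"n={n:>5} (d.f.={n - 1:>4}): Ybar={y_bar:.3f}, s={s:.3f}")
```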
One use of the t-distribution is in determining the probability that the actual population mean µ of a group of measurements falls between two values, provided that the experiment was performed in a randomized manner. This is done by calculating the sample mean Ȳ and sample standard deviation s from the measurements of an experiment with a number of runs. The following equation is then applied:
The values between which µ will fall at a given probability (1 − α) are

$$\bar{Y} \pm \rho(1-\alpha,\ \mathrm{d.f.}) \, \frac{s}{\sqrt{n}}$$

with the value of ρ(1 − α, d.f.) coming from the t-table (Table 1).
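A sketch of this equation in Python. The `RHO_95` dictionary is a hypothetical excerpt standing in for Table 1: it holds the standard two-tailed t critical values at α = 0.05 for d.f. 1 to 10, including the 2.776 used in the example for d.f. = 4. The function name `mean_interval` is likewise illustrative.

```python
import math

# Two-tailed t critical values at alpha = 0.05 (95% level), d.f. 1..10.
# Use the full t-table for other alpha levels or degrees of freedom.
RHO_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
          6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def mean_interval(y_bar, s, n, rho):
    """Return (low, high) = Ybar -/+ rho * s / sqrt(n)."""
    half_width = rho * s / math.sqrt(n)
    return y_bar - half_width, y_bar + half_width

# Hypothetical: 8 measurements (d.f. = 7), Ybar = 12.0, s = 0.4
low, high = mean_interval(12.0, 0.4, 8, RHO_95[8 - 1])
print(f"{low:.3f} < mu < {high:.3f}")
```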
Note that Table 1 shows the α values instead of the probability values, so these have to be subtracted from 1 to obtain their corresponding probability levels; e.g., the 95% probability level corresponds to the column where α = 0.05.
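The table entries can also be reproduced numerically. This sketch is an illustration, not how the table was produced: it integrates the t density with Simpson's rule and uses bisection to find the value ρ for which P(−ρ < t < ρ) = 1 − α. It assumes, consistent with the 2.776 entry for d.f. = 4 and α = 0.05, that the table lists two-tailed α values. The function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def central_prob(t, df, steps=1000):
    """P(-t < T < t) via composite Simpson's rule on [0, t]; steps must be even."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric, so double the area over [0, t]
    return 2 * total * h / 3

def rho(alpha, df, upper=100.0):
    """Two-tailed critical value: solve central_prob(rho, df) = 1 - alpha by bisection."""
    lower = 0.0
    for _ in range(50):
        mid = (lower + upper) / 2
        if central_prob(mid, df) < 1 - alpha:
            lower = mid  # too little central area: the root lies further out
        else:
            upper = mid
    return (lower + upper) / 2

print(f"{rho(0.05, 4):.3f}")  # 2.776, matching the table entry
```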
As an example, suppose that you obtained 5 independent measurements of the output of your process for the same set of inputs, from which you calculated a sample mean Ȳ = 4.5 and a sample standard deviation s = 0.1. If you're interested in the range of values between which there's a 95% probability that the real population mean µ will lie, then (referring to the t-Table) you'd have to use a value of 2.776 for ρ (for d.f. = 4 and α = 0.05). Thus, given your data (and √5 ≈ 2.236), there's a 95% probability that µ will lie between 4.5 - [2.776 (0.1/2.236)] and 4.5 + [2.776 (0.1/2.236)], i.e., 4.376 < µ < 4.624.
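The arithmetic of this example, as a short Python check (treating s = 0.1 as the sample standard deviation, as the formula requires):

```python
import math

# Values from the worked example
n = 5          # independent measurements
y_bar = 4.5    # sample mean
s = 0.1        # sample standard deviation
rho = 2.776    # t-table value for d.f. = n - 1 = 4, alpha = 0.05

half_width = rho * s / math.sqrt(n)
low, high = y_bar - half_width, y_bar + half_width
print(f"{low:.3f} < mu < {high:.3f}")  # 4.376 < mu < 4.624
```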
See Also: t-Table
Copyright © 2005 SiliconFarEast.com. All Rights Reserved.
