# Inferences about One Population Standard Deviation

Copyright © 2007–2017 by Stan Brown

**Summary:**
In class we learn to estimate population means and test
hypotheses about them. It can also be important to estimate or test
**variability** — standard deviation or variance of a
population.
This page shows you how. All operations can be done
in the accompanying
Excel workbook
(43 KB), or in
the downloadable TI-83/84 program MATH200B part 5.

The tests on standard deviation or variance of a population require that
**the underlying population be normal**.
They are not robust, meaning that even moderate departures from
normality can invalidate your analysis.
See MATH200A Program part 4
for procedures to test whether a
population is normal by testing the sample.

**Outliers are also unacceptable** and must be ruled out.
See MATH200A Program part 2
for an easy way to test for outliers.

You already know how to test the mean of a population with a t test, or estimate a population mean using a t interval. Why would you want to do that for the standard deviation of a population?

The standard deviation measures **variability**. In many
situations not just the average is important, but
also the variability. Another way to look at it is that
**consistency** is important: the variability must not be too
great.

For example, suppose you are thinking about investing in one of two mutual funds. Both show an average annual growth of 3.8% in the past 20 years, but one has a standard deviation of 8.6% and the other has a standard deviation of 1.2%. Obviously you prefer the second one, because with the first one there’s quite a good chance that you’d have to take a loss if you need money suddenly.

Industrial processes, too, are monitored not only for average output but for variability within a specified tolerance. If the diameter of ball bearings produced varies too much, many of them won’t fit in their intended application. On the other hand, it costs more money to reduce variability, so you may want to make sure that the variability is not too low either.

Write your hypotheses in the usual way. For H_{0}, you
compare (=, ≤, or ≥) the population standard deviation
σ to the claimed value σ_{o}. H_{1} contains ≠,
>, or < as usual.

The test statistic is

**χ² = (n−1) s² / σ _{o}²**

You perform a one-tailed test by computing the cumulative probability from 0 to χ² (left tail) or from χ² to 10^99 or ∞ (right tail). For a two-tailed test, compute whichever of those two one-tailed probabilities is smaller and double it.

**Example 1**: A machine packs cereal into
boxes, and you don’t want too much variation from box to box.
You decide that a standard deviation of no more than five
grams (about 1/6 ounce) is acceptable. To determine whether the machine is
operating within specification, you randomly select 45 boxes. Here are
the weights of the boxes, in grams:

386 | 388 | 381 | 395 | 392 | 383 | 389 | 383 | 370 |

379 | 382 | 388 | 390 | 386 | 393 | 374 | 381 | 386 |

391 | 384 | 390 | 374 | 386 | 393 | 384 | 381 | 386 |

386 | 374 | 393 | 385 | 388 | 384 | 385 | 388 | 392 |

400 | 377 | 378 | 392 | 380 | 380 | 395 | 393 | 387 |

**Solution**: First, use `1-Var Stats`
to find the sample standard deviation,
which is 6.42 g. Obviously this is greater than the target
standard deviation of 5 g, but is it enough greater that you can
say the machine is not operating correctly, or could it have come from
a population with standard deviation no more than 5 g?
Your hypotheses are

H_{0}: σ = 5, the machine is within spec
(some books would say H_{0}: σ ≤ 5)

H_{1}: σ > 5, the machine is not working right

No α was specified, but for an industrial process with no possibility of human injury α = 0.05 seems appropriate.

Next, check the requirements: is the sample
**normally distributed and free of outliers**?
You can do it with a TI calculator or with a statistics package.
(Excel users, please see
Normality Check and Finding Outliers in Excel.)
On my TI-84, I used
MATH200A part 4 to make a quantile plot to check for normality, and
MATH200A part 2 to make a box-whisker plot to check for outliers. The
results are shown at right, and demonstrate that the sample of box
weights is normally distributed with no outliers.

`χ²cdf` keystrokes | |
---|---|
TI-83 | [`2nd` `VARS` makes `DISTR`] [`7`] |
TI-84 | [`2nd` `VARS` makes `DISTR`] [`8`] |
TI-89 | Stats/List Editor: [`F5`] [`8`] |

Now compute the test statistic:

χ² =
(n−1) s² / σ_{o}² =
44×6.42²/5² = 72.54

Next, compute the p-value. Use either the
accompanying Excel workbook or your TI
calculator’s `χ²cdf` function; see
keystrokes at right for your model. (If you don’t have Excel or
a suitable calculator, you can probably find a table of the χ²
distribution at the back of your textbook.)
Use n−1 = 44 degrees
of freedom.

p-value = χ²cdf(Ans, 10^99, 44) = 0.0043

Conclusion: p-value < α;
**reject H_{0} and accept H_{1}.**
The machine’s output is too variable: at the 0.05 level of
significance the standard deviation is greater than 5 g.
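
If you would rather check Example 1 outside Excel or a TI calculator, here is a minimal Python sketch. It is not part of the article or its workbook: the helper name `chi2_cdf` is made up for illustration, and it gets the left-tail χ² area from a standard series for the incomplete gamma function, using only Python's standard library. It works from the unrounded sample standard deviation, so χ² comes out near 72.59 rather than the 72.54 you get from s rounded to 6.42.

```python
import math
import statistics

def chi2_cdf(x, df):
    """Left-tail area P(X <= x) for a chi-square variable with df degrees
    of freedom (hypothetical helper; series for the incomplete gamma)."""
    if x <= 0:
        return 0.0
    a, h = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 1
    while term > total * 1e-15:      # stop once new terms no longer matter
        term *= h / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(h) - h - math.lgamma(a))

# Box weights in grams, copied from the table in Example 1
weights = [
    386, 388, 381, 395, 392, 383, 389, 383, 370,
    379, 382, 388, 390, 386, 393, 374, 381, 386,
    391, 384, 390, 374, 386, 393, 384, 381, 386,
    386, 374, 393, 385, 388, 384, 385, 388, 392,
    400, 377, 378, 392, 380, 380, 395, 393, 387,
]

n = len(weights)
s = statistics.stdev(weights)                  # sample SD (divides by n-1)
sigma0 = 5.0                                   # claimed population SD
chi2_stat = (n - 1) * s**2 / sigma0**2         # test statistic
p_value = 1.0 - chi2_cdf(chi2_stat, n - 1)     # right tail, since H1 is sigma > 5

print(f"n = {n}, s = {s:.2f}, chi-square = {chi2_stat:.2f}, p = {p_value:.4f}")
```

The p-value should land close to the 0.0043 found above, so the conclusion is the same: reject H_{0} at α = 0.05.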

**Example 2**:
You have a random sample of size 20, with a standard deviation of 125. You
have good reason to believe that the underlying population is normal.
Is the population standard deviation different from 100, at the 0.05
significance level?

**Solution**:
n = 20, s = 125, σ_{o} = 100,
α = 0.05. Your hypotheses are

H_{0}: σ = 100

H_{1}: σ ≠ 100

This is a two-tailed test.

Compute the test statistic:

χ² =
(n−1) s² / σ_{o}² =
19×125²/100² = 29.6875

Compute the p-value.
Since this is a two-tailed test, find the one-tailed p and double it.
(If the one-tailed p-value is >0.5, subtract from 1 and then
double.)
Either use the accompanying
Excel workbook or use your TI
calculator’s `χ²cdf` function with degrees of
freedom n−1 = 19:

p = 2 * χ²cdf(Ans, 10^99, 19) = 0.1118

p-value > α; you fail to reject H_{0} and
cannot reach a conclusion. The population standard deviation may be
different from 100, or it may not.
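
The tail rules used in Examples 1 and 2 can be sketched as one small Python function. This is only an illustration under stated assumptions: `chi2_cdf` and `sd_test_p_value` are hypothetical names, not functions from the workbook or the MATH200B program, and the χ² area comes from a standard incomplete-gamma series rather than from `χ²cdf`.

```python
import math

def chi2_cdf(x, df):
    """Left-tail area P(X <= x) for chi-square with df degrees of freedom
    (hypothetical helper; series for the incomplete gamma function)."""
    if x <= 0:
        return 0.0
    a, h = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 1
    while term > total * 1e-15:
        term *= h / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(h) - h - math.lgamma(a))

def sd_test_p_value(n, s, sigma0, tail):
    """Return (chi-square statistic, p-value) for a test of sigma against
    sigma0. tail is 'left', 'right', or 'two' (matching <, >, or != in H1)."""
    chi2_stat = (n - 1) * s**2 / sigma0**2
    left = chi2_cdf(chi2_stat, n - 1)
    right = 1.0 - left
    if tail == "left":
        p = left
    elif tail == "right":
        p = right
    elif tail == "two":
        p = 2.0 * min(left, right)   # double the smaller one-tailed area
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return chi2_stat, p

# Example 2: n = 20, s = 125, sigma0 = 100, two-tailed
chi2_stat, p = sd_test_p_value(20, 125, 100, "two")
print(f"chi-square = {chi2_stat:.4f}, p = {p:.4f}")
```

Taking the smaller of the two tail areas before doubling is the same as the "subtract from 1 and then double" rule above, so you never end up with a p-value greater than 1.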

You may have noticed that the test for σ actually
uses the sample variance s² and the hypothetical population
variance σ_{o}². Therefore, to make a test for variance
you follow exactly the same procedure except that you already have the
variance and you don’t square it to obtain the test statistic.

To estimate the standard deviation σ of a population at confidence level 1−α, the bounds are

√[ (n−1) s² / χ²(*df*, α/2) ] < σ < √[ (n−1) s² / χ²(*df*, 1−α/2) ]

In the formula, *df* = n−1.
χ²(*df,rtail*)
is the χ² value that divides the curve with
area *rtail* to the right and 1−*rtail* to
the left. It’s an inverse χ²
function, analogous to inverse t or inverse normal.

**Caution:**
For standard deviation of a population,
**the confidence interval is not symmetric** and
**the point estimate is not in the middle** of the confidence
interval. Therefore the confidence interval can be expressed
**only in endpoint form**, not in s±E form.

**Example 3**: Of several thousand students
who took the same exam, 40 papers were selected randomly and
statistics were computed. The standard deviation of the sample was 17
points. Estimate the standard deviation of the population, with 95%
confidence. (Recall that test scores are normally distributed.)

**Solution**: 1−α = 0.95, so
α = 0.05, α/2 = 0.025, and
1−α/2 = 0.975. df =
n−1 = 39. The confidence interval is

√[ 39 × 17² / χ²(39,0.025) ] < σ < √[ 39 × 17² / χ²(39,0.975) ]

How do we find the two required inverse χ²
values? There are several methods, laid out in
Finding χ²(*df,rtail*), below. For now
let’s just use the values:
χ²(39,0.025) = 58.12006 and χ²(39,0.975) = 23.65432.
Continuing with the calculation,

√[39×17²/58.12006] < σ < √[39×17²/23.65432]

13.9 < σ < 21.8 with 95% confidence

**Remark**: The middle of the confidence interval is
(13.9+21.8)/2 ≈ 17.9, but the point estimate was
17. The confidence interval is not symmetric:
it extends 3.1 below and 4.8 above the point estimate.
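
As a cross-check on Example 3, here is a rough Python sketch of the whole interval calculation. Everything in it is an assumption made for illustration: the names `chi2_cdf`, `chi2_inv_right`, and `sd_confidence_interval` are hypothetical, and inverse χ² is found by plain bisection on the cumulative probability, which is not necessarily how Excel or the TI program does it.

```python
import math

def chi2_cdf(x, df):
    """Left-tail area P(X <= x) for chi-square with df degrees of freedom
    (hypothetical helper; series for the incomplete gamma function)."""
    if x <= 0:
        return 0.0
    a, h = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 1
    while term > total * 1e-15:
        term *= h / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(h) - h - math.lgamma(a))

def chi2_inv_right(df, rtail):
    """Chi-square value with area rtail to its right, found by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, df) > rtail:    # widen until hi is past the answer
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, df) > rtail:  # still too much area to the right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def sd_confidence_interval(n, s, confidence):
    """Endpoints of the confidence interval for sigma (normal population)."""
    alpha = 1.0 - confidence
    df = n - 1
    lower = math.sqrt(df * s**2 / chi2_inv_right(df, alpha / 2))
    upper = math.sqrt(df * s**2 / chi2_inv_right(df, 1 - alpha / 2))
    return lower, upper

# Example 3: n = 40, s = 17, 95% confidence
lower, upper = sd_confidence_interval(40, 17, 0.95)
print(f"{lower:.1f} < sigma < {upper:.1f}")
```

Notice that the larger χ² value (right-tail area α/2) produces the lower endpoint, because it sits in the denominator.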

Inverse χ² is not easy to compute,
but you’re not necessarily reduced to
**looking up tables in a book**. Here are several methods using
various forms of technology:

- The TI-89 has a function to compute inverse χ².
  There’s also a TI-83/84 program;
  please see MATH200B Program part 5. That program
  also does *all* the calculations for hypothesis tests and confidence intervals.
- **Excel** has the CHIINV(*df,rtail*) function. *rtail* in the CHIINV() function is the area of the right-hand tail. The accompanying workbook uses this function to compute inverse χ² and the confidence intervals.
- A **normal approximation for χ² when *df* ≥ 30** is found in Spiegel 1999 [full citation in “References”, below], “Confidence Intervals for χ²”, page 245. When *df* ≥ 30, “√(2χ²) − √(2*df*−1) is very nearly normally distributed with mean 0 and standard deviation 1”. Therefore,

  χ²(*df,rtail*) ≈ ½ [z(*rtail*)+√(2*df*−1)]²

  Example: Using the normal approximation,

  χ²(39,0.025) ≈ ½ [z(0.025)+√(2×39−1)]²

  z(*rtail*) is easily found on your calculator by invNorm(1−*rtail*). In this case, z(0.025) = invNorm(1−0.025) = 1.9600, so

  χ²(39,0.025) ≈ [1.9600+√77]² / 2

  χ²(39,0.025) ≈ 57.61934

  That agrees quite well with the true value of 58.12006 given above. Using the same technique, χ²(39,0.975) ≈ 23.22212 and the 95% confidence interval on σ is 14.0 to 22.0, close to the interval we found above.

- The National Institute of Standards and Technology’s statistics Web site offers Critical Values of the Chi-Square Distribution for 1 through 100 degrees of freedom. This is part of the useful NIST/SEMATECH e-Handbook of Statistical Methods.
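
The normal approximation in the list above is easy to try in Python, where `statistics.NormalDist().inv_cdf` plays the role of invNorm. This is a small sketch with a hypothetical function name, `chi2_inv_normal_approx`, and it applies the approximation to the Example 3 numbers (n = 40, s = 17).

```python
import math
from statistics import NormalDist

def chi2_inv_normal_approx(df, rtail):
    """Approximate chi-square value with area rtail to its right,
    using the normal approximation quoted for df >= 30."""
    z = NormalDist().inv_cdf(1.0 - rtail)    # same role as invNorm(1 - rtail)
    return 0.5 * (z + math.sqrt(2 * df - 1)) ** 2

df, s = 39, 17
crit_hi = chi2_inv_normal_approx(df, 0.025)  # larger critical value
crit_lo = chi2_inv_normal_approx(df, 0.975)  # smaller critical value

# Approximate 95% confidence interval for sigma
ci_lower = math.sqrt(df * s**2 / crit_hi)
ci_upper = math.sqrt(df * s**2 / crit_lo)

print(f"chi2(39, 0.025) ~ {crit_hi:.5f}, chi2(39, 0.975) ~ {crit_lo:.5f}")
print(f"{ci_lower:.1f} < sigma < {ci_upper:.1f}")
```

Because the approximation is meant for *df* ≥ 30, treat results for smaller samples with suspicion and use an exact inverse χ² instead.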

Variance is the square of standard deviation, so the confidence interval procedure is the same except that you don’t take square roots:

(n−1) s² / χ²(*df*, α/2) < σ² < (n−1) s² / χ²(*df*, 1−α/2)

**Example 4**:
Heights of US males aged 18–25 are normally distributed. You
take a random sample of 100 from that population and find a
variance of 7.3 in². (Remember that
the units of variance are the square of the units of the original
measurement.)
Estimate the variance of the height of US males
aged 18–25, with 95% confidence.

**Solution**: For a 95% confidence interval,
1−α = 0.95 and
α/2 = 0.025.
From the accompanying workbook we
find

χ²(df,α/2) = χ²(99,0.025) = 128.4219887

χ²(df,1−α/2) = χ²(99,0.975) = 73.36108022

The endpoints of the interval are therefore

99 * 7.3 / 128.42199 < σ² < 99 * 7.3 / 73.36108

5.6275... < σ² < 9.8512...

We’re 95% confident that the variance in heights of US males aged 18–25 is between 5.6 and 9.9 in².
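
The arithmetic in Example 4 can be replayed in a few lines of Python. This sketch simply reuses the two critical values quoted from the workbook above rather than computing inverse χ² itself; the variable names are made up for illustration. It also takes square roots at the end to show the matching interval for σ.

```python
import math

n = 100
sample_variance = 7.3            # in square inches
df = n - 1

# Critical values quoted in the solution above (from the workbook)
chi2_right_025 = 128.4219887     # chi2(99, 0.025)
chi2_right_975 = 73.36108022     # chi2(99, 0.975)

# Variance interval: no square roots
var_lower = df * sample_variance / chi2_right_025
var_upper = df * sample_variance / chi2_right_975

# Corresponding interval for the standard deviation, in inches
sd_lower = math.sqrt(var_lower)
sd_upper = math.sqrt(var_upper)

print(f"{var_lower:.4f} < variance < {var_upper:.4f}")
print(f"{sd_lower:.2f} < sigma < {sd_upper:.2f}")
```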

- Spiegel, Murray R., and Larry J. Stephens. 1999. *Theory and Problems of Statistics*. 3d ed. McGraw-Hill.

- **15 May 2016**: Move citation to the new References section.
- **3 Jan 2016**: Convert the Excel workbook to Excel 2007–2016 format. In the workbook, update the links that were pointing to Oak Road Systems or TC3 so that they now point to BrownMath.com.
- **30 Dec 2015**: Update screen shot for normality check; cross reference to my Excel workbook that checks for normality.
- (intervening changes suppressed)
- **19 Aug 2007**: New article and workbook.

Updates and new info: http://BrownMath.com/stat/