Updated 10 Aug 2018

Confidence Intervals for Goodness of Fit

Copyright © 2011–2020 by Stan Brown

Summary: You've done a multinomial experiment, and you have counts for each of the k categories or possible responses. You want to compute a confidence interval. Is that possible?

Yes, it is. The multinomial confidence interval is a more general form of the binomial. In the binomial you have two possible outcomes, success and failure; in the multinomial you have three or more possible outcomes. You could compute k independent confidence intervals for the k possible outcomes using an adjusted confidence level. But that treats the outcomes as independent, and they're not: an M&M that is red can't also be yellow. A second approach, and a better one, I think, is to treat the model as a whole. Then you compute which models could be consistent with your sample data.

This page explains both methods, with examples.

See also: Two Excel workbooks are provided to help with calculations: the M&Ms example (35 KB), and the general case for up to 11 categories (36 KB).


CI for M&Ms: Separate Proportions

Plain M&Ms
Color     Model %   Observed count   Observed %
Blue        24%          127             20%
Brown       13%           63             10%
Green       16%          122             19%
Orange      20%          147             23%
Red         13%           93             15%
Yellow      14%           76             12%
Total      100%          628            100%

My Spring 2011 evening class counted the colors of plain M&Ms and compared the color distribution to the model on the company's Web site. Their data are in the table above. They computed χ² = 19.44 and p-value = 0.0016, so they rejected the model. But what models would be possible from their data, with 95% confidence?
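As a check on the class's numbers, here is a minimal Python sketch that recomputes χ² and the p-value from the table using only the standard library. The helper name chi2_sf is hypothetical; it uses the closed-form upper-tail probability of the χ² distribution, which exists whenever the degrees of freedom are a whole number, so no statistics package is assumed.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-squared variable with whole-number df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^j / j! for j = 0 .. df/2 - 1
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * (1 + x/3 + x^2/(3*5) + ...),
    # where the series has (df - 1)/2 terms
    result = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        result += term
        term *= x / (2 * j + 1)
    return result

model = {"Blue": 0.24, "Brown": 0.13, "Green": 0.16,
         "Orange": 0.20, "Red": 0.13, "Yellow": 0.14}
observed = {"Blue": 127, "Brown": 63, "Green": 122,
            "Orange": 147, "Red": 93, "Yellow": 76}

n = sum(observed.values())  # 628 candies
# Goodness-of-fit statistic: sum of (observed - expected)^2 / expected
chi2 = sum((observed[c] - n * model[c]) ** 2 / (n * model[c]) for c in model)
p_value = chi2_sf(chi2, df=len(model) - 1)  # k - 1 = 5 degrees of freedom
print(f"chi-squared = {chi2:.2f}, p-value = {p_value:.4f}")
```

For the class's data this prints chi-squared = 19.44 and p-value = 0.0016, matching their results.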

Caution: As always, to apply the analysis techniques you need a simple random sample, and the class didn't have that. The Fun Size M&Ms packs that they analyzed were bought from the same store on the same day and almost certainly came from one small part of one production run. So they didn't actually prove that the company's model is wrong, but for the sake of illustrating the methods I'm going to proceed as if they had.

At first, you might think a 95% CI for the model is a piece of cake: just do a 95% CI for the proportion of blues, then a 95% CI for the proportion of browns, and so on. But the problem with that is that you're doing multiple confidence intervals from the same sample.

Why is that bad? Well, remember what a 95% confidence interval means. Ninety-five percent of the time, the CI does contain the true value of the population parameter, but 5% of the time it does not. If you do six confidence intervals from the same sample, can you say we're 95% confident in all of them? No. If there's a 95% chance that any one of them is right, then clearly the chance that all six are right is less than 95%. They're not independent, and so the simple multiplication rule doesn't apply exactly, but the approximate probability of all six being right is 0.95⁶ ≈ 0.74.

If you want to take six confidence intervals from the same data, and have 95% confidence that they're all correct, then the confidence level for each of the six CIs must be the sixth root of 95%, about 99.1%. (Another common calculation is to use 1 − α/k as the confidence level for each of the k categories; with α = 0.05 and k = 6 that is 1 − 0.05/6 ≈ 99.2%, a similar number.)
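The arithmetic in the last two paragraphs is easy to check. This short Python sketch computes the naive "all six right" probability and both per-interval confidence levels. The two adjustments are usually called the Šidák correction (the k-th root) and the Bonferroni correction (1 − α/k); those names are standard but are not used elsewhere on this page.

```python
alpha = 0.05   # overall (aggregate) error rate wanted
k = 6          # number of intervals computed from the same sample

all_right_naive = (1 - alpha) ** k        # six 95% intervals treated as independent
each_sixth_root = (1 - alpha) ** (1 / k)  # the "sixth root of 95%" (Sidak)
each_one_minus = 1 - alpha / k            # the 1 - alpha/k rule (Bonferroni)

print(f"all six right: {all_right_naive:.3f}")
print(f"sixth root:    {each_sixth_root:.4f}")
print(f"1 - alpha/k:   {each_one_minus:.4f}")
```

This prints 0.735, 0.9915, and 0.9917. The two adjustments differ in their guarantee: the k-th root gives exactly 95% overall only if the intervals are independent, while 1 − α/k keeps the overall confidence at 95% or more whatever the dependence, at the cost of being slightly more conservative.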

Confidence Intervals for Six Proportions
(99.1% individual, 95% aggregate)
Color     Model %   Observed count   Observed %   CI for p
Blue        24%          127             20%      16 to 24%
Brown       13%           63             10%       7 to 13%
Green       16%          122             19%      15 to 24%
Orange      20%          147             23%      19 to 28%
Red         13%           93             15%      11 to 19%
Yellow      14%           76             12%       9 to 16%

You can see the calculations in the second section of the accompanying Excel workbook, where you can even change the confidence level if you wish. If you have a TI-83, 84, 89, or 92 calculator, you can do a 1-PropZInt with the confidence level .95^(1/6).
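Outside Excel or a TI calculator, the same six intervals can be sketched in Python with the standard library's statistics.NormalDist. The function name wald_interval is hypothetical; it computes the Wald (normal-approximation) interval p̂ ± z·√(p̂(1 − p̂)/n), which is the kind of interval 1-PropZInt gives.

```python
import math
from statistics import NormalDist

observed = {"Blue": 127, "Brown": 63, "Green": 122,
            "Orange": 147, "Red": 93, "Yellow": 76}
n = sum(observed.values())
k = len(observed)

conf_each = 0.95 ** (1 / k)                        # about 99.1% per interval
z = NormalDist().inv_cdf(1 - (1 - conf_each) / 2)  # two-sided critical z, about 2.63

def wald_interval(count: int, n: int, z: float) -> tuple[float, float]:
    """Wald interval for one proportion: p_hat plus or minus z standard errors."""
    p_hat = count / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

for color, count in observed.items():
    lo, hi = wald_interval(count, n, z)
    print(f"{color:7s} {100 * lo:5.1f} to {100 * hi:5.1f}%")
```

To one decimal place this reproduces the intervals in the table above (Blue 16.0 to 24.4%, Brown 6.9 to 13.2%, and so on).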

CI for M&Ms: Overall Model

The previous solution is easy enough to compute, but it's a little unsatisfying. Aside from the issue with choosing a substitute confidence level, there's a philosophical problem. After all, the hypothesis test wasn't about one color at a time but about the model considered as a whole. Shouldn't the confidence interval also consider the model as a whole?

Well, yes, that's certainly possible. It's hard to think about, though. The (unknown) possible proportions for the six colors represent a region in six-dimensional space. And even if you can picture 6-D space, how would you express the results? So as a practical matter you still have to come up with lower and upper bounds for each of the six colors.

How can you do that? Well, remember that a confidence interval is just the flip side of a hypothesis test. If the HT fails to reject H₀, then the parameter from H₀ is within the confidence interval. (See Confidence Interval and Hypothesis Test if you need a refresher in that concept.) So looking for a CI for goodness of fit is the same as looking for the possible values of the six model percentages that would not be rejected in a HT with the data observed in the sample.
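Here is a minimal Python sketch of that flip-side idea for the M&Ms data. A candidate model is inside the joint 95% confidence region exactly when its χ² statistic against the observed counts does not exceed the critical value for 5 degrees of freedom at α = 0.05, about 11.07 (taken from a standard χ² table and hard-coded here as an assumption). The function names chi2_stat and in_confidence_region are hypothetical, and every model proportion is assumed to be greater than zero.

```python
observed = [127, 63, 122, 147, 93, 76]   # Blue, Brown, Green, Orange, Red, Yellow
CRITICAL_5DF_05 = 11.0705                 # chi-squared critical value, df = 5, alpha = 0.05

def chi2_stat(model: list[float], observed: list[int]) -> float:
    """Goodness-of-fit statistic for model proportions (all > 0) against observed counts."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for p, o in zip(model, observed))

def in_confidence_region(model: list[float], observed: list[int],
                         critical: float = CRITICAL_5DF_05) -> bool:
    """True if the chi-squared test would NOT reject this model at the 0.05 level."""
    return chi2_stat(model, observed) <= critical

company = [0.24, 0.13, 0.16, 0.20, 0.13, 0.14]
sample = [o / sum(observed) for o in observed]
print(in_confidence_region(company, observed))  # False: chi-squared = 19.44 > 11.07
print(in_confidence_region(sample, observed))   # True: chi-squared = 0
```

The company's model falls outside the region, which is just the hypothesis test's rejection restated; the sample proportions themselves always fall inside.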

Unfortunately, that's not easy to compute. In fact, you have to run twelve optimizations using Excel Solver or a similar tool: one each for the minimum and maximum of each of the six colors' proportions. In these optimizations, you ask "what's the lowest proportion of blue such that, by adjusting the other five proportions, it's still possible to come up with a model that is consistent with the observed sample at the 0.05 significance level?" Then you ask that same question about the highest possible proportion of blue, and ask those two questions about the other colors.

In theory this should be doable with Excel Solver, but in practice I found that Excel 2010 Solver violated the constraints on several of those optimizations and gave negative percentages, for example when finding the lowest possible proportion of brown. So I used Evolver from Palisade Corporation to do the optimizations.

The third section of the accompanying Excel workbook is set up to do an optimization with either Solver or Evolver. Here are the results of the twelve optimizations with Evolver; Solver got similar results on the ones where it didn't crash:

95% Confidence Intervals for the Model
Color     Model %   Observed count   Observed %   CI for p
Blue        24%          127             20%      15 to 26%
Brown       13%           63             10%       7 to 15%
Green       16%          122             19%      15 to 25%
Orange      20%          147             23%      18 to 29%
Red         13%           93             15%      11 to 20%
Yellow      14%           76             12%       8 to 17%
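For this particular problem the twelve optimizations can also be done without a numerical optimizer. The following is an added cross-check, not the workbook's method. Fix one color's proportion at t, with x its observed count and n the total. The χ² statistic is then smallest when the other colors share the remaining 1 − t in proportion to their own observed counts (a Cauchy–Schwarz argument), and that smallest value works out to (x − nt)² / (n·t·(1 − t)). Setting it equal to the critical value A ≈ 11.07 gives the quadratic (n + A)t² − (2x + A)t + x²/n = 0, whose two roots are the lowest and highest possible proportions. If I'm not mistaken, this is the same formula as the Quesenberry–Hurst simultaneous intervals in the statistics literature. The function name model_bounds and the hard-coded critical value are assumptions of this sketch.

```python
import math

CRITICAL_5DF_05 = 11.0705   # chi-squared critical value, df = 5, alpha = 0.05

def model_bounds(count: int, n: int, critical: float) -> tuple[float, float]:
    """Lowest and highest proportion for one category such that SOME model
    with that proportion is not rejected by the chi-squared test.

    Solves (count - n t)^2 = critical * n t (1 - t), i.e. the quadratic
    (n + A) t^2 - (2 count + A) t + count^2 / n = 0 with A = critical.
    """
    a = critical
    center = 2 * count + a
    spread = math.sqrt(a * (a + 4 * count * (n - count) / n))
    denom = 2 * (n + a)
    return (center - spread) / denom, (center + spread) / denom

observed = {"Blue": 127, "Brown": 63, "Green": 122,
            "Orange": 147, "Red": 93, "Yellow": 76}
n = sum(observed.values())
for color, count in observed.items():
    lo, hi = model_bounds(count, n, CRITICAL_5DF_05)
    print(f"{color:7s} {100 * lo:5.1f} to {100 * hi:5.1f}%")
```

For the class's data this gives 15.4 to 26.0% for blue, 6.7 to 14.7% for brown, and so on, agreeing with the Evolver results to one decimal place. Because it needs no optimizer, it also sidesteps the constraint violations described above.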

What's the difference between the previous method and this one? Well, first let's get them next to each other so that you can compare them more easily. I'm also showing one more decimal place:

95% Confidence Intervals Compared
Color     Model %   Observed count   Observed %   CI for p (first method)   CI for p (second method)
Blue        24%          127             20%         16.0 to 24.4%             15.4 to 26.0%
Brown       13%           63             10%          6.9 to 13.2%              6.7 to 14.7%
Green       16%          122             19%         15.3 to 23.6%             14.7 to 25.2%
Orange      20%          147             23%         19.0 to 27.9%             18.3 to 29.5%
Red         13%           93             15%         11.1 to 18.5%             10.7 to 20.1%
Yellow      14%           76             12%          8.7 to 15.5%              8.4 to 17.1%

Now you can see that the first set of intervals, computed as six separate proportions, is narrower and symmetric around p̂, the sample proportion for each color. The second set, computed using the flip side of a hypothesis test, is a bit wider, and the sample proportions are not at the centers of the intervals. This is typical with intervals where χ² is involved; see Inferences about One Population Standard Deviation for other examples.

But which one is right? Well, you pays your money and you takes your choice. Computationally they're both right; philosophically it depends on how you want to think about a confidence interval.

CI for Any Goodness of Fit

After the M&Ms example, it's not hard to generalize to any number of categories. In the accompanying workbook, I chose to stop at 11 categories (10 degrees of freedom), because goodness-of-fit tests with more categories are rare. If you have more than 11 categories, you can unprotect the worksheet and insert rows where needed.

But most likely you have fewer than 11 categories. In that case, proceed as follows:

  1. In the first section, clear the unneeded categories at the bottom. For example, if you have eight categories, highlight cells A14:C16 and press your Delete key. Don't try to delete rows 14 through 16: the worksheet won't let you, and it's set up to ignore empty rows.
  2. Enter your category names, model percentages, and observed counts in columns A through C.
  3. You can now read off the χ² statistic and the p-value for a hypothesis test.
  4. In the second section, specify your desired confidence level in cell B23. You can then read off the confidence intervals computed by the first method (individual binomial proportions).
  5. In the third section, set your desired confidence level in B42. The workbook will automatically transfer your model and observed counts from the first section to the third section.
  6. For each category, run one Solver or Evolver optimization to find a minimum and one to find a maximum. That's one to seek a minimum for B46, one to seek a maximum for B46, one for minimum and one for maximum of B47, and so on. Before the first one, set the adjustable cells to just the part of column B that actually has numbers. For example, if you have eight categories then the adjustable cells would be B46:B53 and you would run sixteen optimizations.

    Evolver users: If your p-value is <α, Evolver will tell you before the first optimization that a constraint was not met, and will ask to run Constraint Solver. Answer Yes. Constraint Solver will run briefly, and pop up an Optimization Complete window. Click OK, then start the Evolver optimization again. This will happen only before the first optimization, if it happens at all.

    Solver users: The worksheet is protected, to keep you from accidentally deleting or changing a formula. Solver can't cope with a protected sheet, so right-click on the Sheet1 tab at the bottom of the window, and select Unprotect Sheet.
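If you'd rather script the general case than use the workbook, here is a minimal Python sketch for any number of categories k ≥ 2 and any confidence level, using only the standard library. It is an alternative to the workbook, not a copy of it, and all function names are hypothetical. The first method uses separate Wald intervals, each at the k-th root of the overall confidence level. For the second method, the sketch relies on a shortcut instead of Solver: with one category's proportion fixed at t, χ² is minimized by spreading the rest in proportion to the other observed counts, so each pair of optimizations reduces to the roots of (x − nt)² = A·n·t·(1 − t), where A is the χ² critical value with k − 1 degrees of freedom. The critical value is found by bisection on a closed-form χ² tail probability.

```python
import math
from statistics import NormalDist

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of the chi-squared distribution, whole-number df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    result = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        result += term
        term *= x / (2 * j + 1)
    return result

def chi2_critical(alpha: float, df: int) -> float:
    """Value c with P(X > c) = alpha, found by bisection (chi2_sf decreases in x)."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains c
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def goodness_of_fit_cis(counts: list[int], conf: float = 0.95):
    """Return (first_method, second_method) as lists of (low, high) proportions.

    first_method: separate Wald intervals, each at conf ** (1/k).
    second_method: range of each proportion over all models NOT rejected by
    the chi-squared test at alpha = 1 - conf (roots of a quadratic in t).
    """
    n, k = sum(counts), len(counts)
    z = NormalDist().inv_cdf(1 - (1 - conf ** (1 / k)) / 2)
    a = chi2_critical(1 - conf, k - 1)
    first, second = [], []
    for x in counts:
        p = x / n
        m = z * math.sqrt(p * (1 - p) / n)
        first.append((p - m, p + m))
        spread = math.sqrt(a * (a + 4 * x * (n - x) / n))
        second.append(((2 * x + a - spread) / (2 * (n + a)),
                       (2 * x + a + spread) / (2 * (n + a))))
    return first, second

first, second = goodness_of_fit_cis([127, 63, 122, 147, 93, 76], conf=0.95)
for (l1, h1), (l2, h2) in zip(first, second):
    print(f"{100 * l1:5.1f} to {100 * h1:5.1f}%   {100 * l2:5.1f} to {100 * h2:5.1f}%")
```

Run on the M&Ms counts, it should reproduce both columns of the comparison table earlier on this page; for your own data, replace the list of counts and the confidence level.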

Acknowledgment

This article grew out of Ray Koopman's and Rich Ulrich's responses in the newsgroup sci.stat.edu to my query, "Confidence interval after Chi-squared test?" in December 2010. The thread is archived here. One issue raised by Ray Koopman: the binomial confidence intervals in this article are done by the Wald method, which is what the TI-83/84 does. However, there is a lot to be said for using a Wilson or adjusted Wald calculation instead, which would give slightly different intervals, pulled a little toward 50%.
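To see what Koopman's suggestion would do to the first method, here is a minimal Python sketch of the Wilson score interval, using the textbook formula at the same 99.1% per-interval level. The function name wilson_interval is hypothetical, and this is not what the workbook computes.

```python
import math
from statistics import NormalDist

def wilson_interval(count: int, n: int, conf: float) -> tuple[float, float]:
    """Wilson score interval for a single proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = count / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom  # shifted from p_hat toward 0.5
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

observed = {"Blue": 127, "Brown": 63, "Green": 122,
            "Orange": 147, "Red": 93, "Yellow": 76}
n = sum(observed.values())
conf_each = 0.95 ** (1 / len(observed))  # same 99.1% per-interval level as the first method
for color, count in observed.items():
    lo, hi = wilson_interval(count, n, conf_each)
    print(f"{color:7s} {100 * lo:5.1f} to {100 * hi:5.1f}%")
```

Putting this output next to the Wald intervals shows how much the choice matters for a sample of this size.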

Updates and new info: https://BrownMath.com/stat/
