
# Stats without Tears: Solutions for Chapter 12

Updated 6 May 2015
Copyright © 2013–2024 by Stan Brown, BrownMath.com

Because this textbook helps you,
please donate at
BrownMath.com/donate.
1 There is no difference. What matters in a model is the relative sizes of the predictions for the categories. 40% is 1.6 times 25%, just as 40 is 1.6 times 25.
2 This is attribute data, one population, more than two possible responses: Case 6, goodness-of-fit, in Inferential Statistics: Basic Cases. There are 6 categories, therefore 5 degrees of freedom.
(1) H0: The 25:25:20:15:8:7 model for ice cream preference is good.
H1: The 25:25:20:15:8:7 model for ice cream preference is bad.

(2) α = 0.05

(3–4) Use MATH200A part 6. df = 5, χ² = 9.68, p-value = 0.0849. [Calculator screens: input and output data.] (If you have MATH200A V6, you’ll see the p-value, degrees of freedom, and χ² test statistic on the same screen as the graph.)

Common mistake: When a model is given in percentages, some students like to convert the observed numbers to percentages. Never do this! The observed numbers are always actual counts and their total is always the actual sample size.

Remark: You could give the model as decimals, .25, .20, .15 and so on. But for the model, all that matters is the relative size of each category to the others, so it’s simpler to use whole-number ratios.

Common mistake: If you do convert the percentages to decimals, remember that 8% and 7% are 0.08 and 0.07, not 0.8 and 0.7.

(RC) L3 shows the expected counts, and the lowest is 70, so all are ≥5. The problem says that the 1000 people were a random sample. There are millions of ice cream lovers, so the sample of 1000 is less than 10% of the population.

(5) p > α. Fail to reject H0.

(6) At the 0.05 level of significance, you can’t say whether the model is good or bad. Or, It’s impossible to determine from this sample whether the model is good or bad (p = 0.0849).

Remark: For Case 6 only, you could write your non-conclusion as something like “the model is not inconsistent with the data” or “the data don’t disprove the model.”

Remark: The χ² test keeps you from jumping to false conclusions. Eyeballing the observed and expected numbers (L2 and L3), you might think they’re fairly far off and the model must be wrong. Yet the test gives a largish p-value.

Remark: If it had gone the other way (if p were less than α), you would say something like “At the .05 level of significance, the model is inconsistent with the data” or “the data disprove the model” or simply “the model is wrong”.
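If you want to see why the form of the model doesn’t matter, here is a minimal Python sketch (not part of the calculator procedure). It scales the 25:25:20:15:8:7 model to the sample of 1000 and checks the expected-count requirement. The function name `expected_counts` is just an illustrative choice.

```python
import math

def expected_counts(model, n):
    """Scale a model (any positive weights) to expected counts for sample size n."""
    total = sum(model)
    return [n * w / total for w in model]

n = 1000  # sample size stated in the problem
ratios = [25, 25, 20, 15, 8, 7]                   # model as whole-number ratios
decimals = [0.25, 0.25, 0.20, 0.15, 0.08, 0.07]   # same model as decimals

from_ratios = expected_counts(ratios, n)
from_decimals = expected_counts(decimals, n)

# Only relative sizes matter, so both forms give the same expected counts.
same = all(math.isclose(a, b) for a, b in zip(from_ratios, from_decimals))

# Requirement check: every expected count must be at least 5.
smallest = min(from_ratios)
print(from_ratios, smallest, same)
```

The smallest expected count comes out as 70, matching the lowest value in L3.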
3 Use Case 7, 2-way table, in Inferential Statistics: Basic Cases.

(1) H0: Gun opinion is independent of party.
H1: Gun opinion depends on party.

(2) α = .05

(3–4) Put the two rows and three columns in matrix A. (Don’t enter the totals.) Select χ²-Test from the menu. Outputs are χ² = 26.13, df = 2, p=2.118098E-6 → p = 0.000 002 or <.0001.

(RC) The problem states that the sample was random. With millions of party members, the samples are under 10% of the population. Check the B matrix and find that the lowest expected count is 106.45. Therefore, all expected counts are above the minimum of 5.

Alternative: use MATH200A part 7 for steps 3–4 and RC.

(5) p < α; reject H0 and accept H1.

(6) At the .05 level of significance, gun opinion depends on party. Or, Gun opinion depends on party (p<0.0001).

Remark: “Depends on” does not mean that’s the only factor. But if you don’t like “depends on”, you could say “is not independent of”. Or you could say, “party affiliation is a factor in a person’s opinion on gun control.”
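As a cross-check on the calculator output, here is a small Python sketch that gets the degrees of freedom from the table shape and the p-value from the rounded χ². It relies on a closed form for the right-tail χ² area that works only when df is even. The function names are illustrative, not anything from the TI-83 or MATH200A.

```python
import math

def df_two_way(rows, cols):
    """Degrees of freedom for a two-way table (totals excluded)."""
    return (rows - 1) * (cols - 1)

def chi2_pvalue_even_df(chi2, df):
    """Right-tail chi-square area; this closed form is valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles even, positive df only")
    half = chi2 / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i      # builds (chi2/2)^i / i!
        total += term
    return math.exp(-half) * total

df = df_two_way(2, 3)                 # two rows, three columns
p = chi2_pvalue_even_df(26.13, df)    # chi-square as reported above, rounded
print(df, p)
```

Because the input is the rounded χ² = 26.13, the result agrees with the calculator’s 2.118098E-6 only to about four significant digits.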
4 This is goodness of fit to a model, Case 6 in Inferential Statistics: Basic Cases. Your H0 model is 1:1:1:1:1, or any model with five equal numbers.
(1) H0: Preferences among all first graders are equal.
H1: First graders prefer the five occupations unequally.

(2) α = 0.05

(3–4) MATH200A part 6 with {1,1,1,1,1} or similar in L1 and the observed data in L2. χ²=12.9412 → χ² = 12.94, df = 4, p=.011567 → p = 0.0116.

(RC) Random sample: given. There are many, many first graders, far more than 10×425 = 4250. All L3’s (expected counts) are 85, so all are ≥5.

(5) p < α. Reject H0 and accept H1.

(6) At the 0.05 significance level, first graders in general have unequal preferences among the five occupations. Or, First graders in general have unequal preferences among the five occupations (p = 0.0116).
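The numbers in the requirements check and the p-value can be reproduced with a few lines of Python. This is only a sketch: the sample size 425 is taken from the 10×425 in the text, and the p-value line uses a closed form that holds for df = 4 specifically.

```python
import math

n = 425                     # sample size, from 10×425 in the requirements check
model = [1, 1, 1, 1, 1]     # H0: five equal preferences
expected = [n * w / sum(model) for w in model]

meets_count_rule = all(e >= 5 for e in expected)   # every expected count >= 5
min_population = 10 * n                            # population must exceed this

chi2 = 12.9412              # test statistic reported above
df = len(model) - 1
# For df = 4 only, the right-tail area is e^(-x/2) * (1 + x/2).
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)
print(expected, min_population, df, round(p_value, 4))
```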
5 This is Case 7 in Inferential Statistics: Basic Cases.
(1) H0: Egg consumption and age at menarche are independent.
H1: Egg consumption and age at menarche are not independent.

(2) α = 0.01

(3–4) 3×3 in A. Use MATH200A part 7 or `χ²-Test`. Results: χ² = 3.13, df = 4, p=.535967 → p = 0.5360

(RC) Random sample: given. At a glance, it looks like the sample size is around 100. But it’s obviously less than 10% of the number of women. Expected values (Matrix B) show one value 4.8148, which is below 5. You can say that it’s just barely below 5, and it’s the only one, so the requirement is effectively met. That’s true, but it’s also a moot point because of the high p-value.

(5) p > α. Fail to reject H0.

(6) At the 0.01 level of significance, we can’t determine whether egg consumption and age at menarche are independent or not. Or, We can’t determine whether egg consumption and age at menarche are independent or not (p = 0.5360).

Remark: The large p-value makes it really tempting to declare that the two variables are independent. But that would be accepting H0, which we must never do. It’s always possible that there is a connection and we were just unlucky enough that this particular sample didn’t show it.

Some researchers would say “There is insufficient evidence to reject the hypothesis of independence.” Strictly speaking, that’s the same error. However, when the audience is researchers, rather than the non-technical public, it may be understood that they’re not really accepting H0, only failing to reject it pending the outcome of a further study.
6 This is a goodness-of-fit problem, Case 6 in Inferential Statistics: Basic Cases.
(1) H0: Age distribution of grand jurors matches age distribution of county.
H1: Age distribution of grand jurors does not match age distribution of county.

(2) α = 0.05

(3–4) The county percentages are the model and go in L1. The numbers of jurors (not percentages) go in L2. Reminder: don’t include the total row. Results: χ²=61.2656 → χ² = 61.27, df = 3, p-value = 3.2×10⁻¹³ or p < 0.0001

(RC) Because you’re not generalizing, the random-sample rule and the under-10% rule don’t matter. You need only check that all expected counts are ≥ 5, and since the lowest is 10.56, the requirements are met.

(5) p < α. Reject H0 and accept H1.

(6) At the 0.05 significance level, the age distribution of grand jurors is different from the age distribution in the county. Or, The age distribution of grand jurors is different from the age distribution in the county (p < 0.0001).

Remark: There are a lot of reasons for this. Judges tend to be older and tend to prefer jurors closer to their own age. Also, older candidates are more likely to be retired, which means they are less likely to be exempt by reason of their occupation.
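A p-value this tiny is easy to misread on the calculator, so here is a short Python sketch that recomputes it from the reported χ². It uses a closed form for the right-tail χ² area that applies to df = 3 only. The function name is an illustrative choice.

```python
import math

def chi2_pvalue_df3(chi2):
    """Right-tail chi-square area for exactly 3 degrees of freedom."""
    tail_normal = math.erfc(math.sqrt(chi2 / 2))
    correction = math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)
    return tail_normal + correction

chi2 = 61.2656          # test statistic reported above
p = chi2_pvalue_df3(chi2)
alpha = 0.05
print(f"{p:.1e}", p < alpha)
```

The result is on the order of 10⁻¹³, consistent with reporting p < 0.0001.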
7 This is a 2-way table, specifically a test of independence. Use Case 7 in Inferential Statistics: Basic Cases.
(1) H0: Population size of chosen residence town is independent of population size of town raised in.
H1: Population size of chosen residence town depends on population size of town raised in.

(2) α = 0.05

(3–4) Enter the 3×3 array in Matrix A. (Never enter the totals in a 2-way table hypothesis test.) Use MATH200A part 7 or the calculator’s `χ²-Test` menu selection. Results: df = 4, χ² = 35.74, p-value=3.271956E-7 → p-value = 0.000 000 3 or p-value < 0.0001

(RC) Simple random sample: given. 500 men is obviously far below 10% of the total number. All expected counts (Matrix B) are 14.364 or greater, ≥5.

(5) p < α. Reject H0 and accept H1.

(6) At the 0.05 significance level, there is an association between the size of town men choose to live in and the size of town they grew up in. Or, There is an association between the size of town men choose to live in and the size of town they grew up in (p < 0.0001).
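To see what the calculator does with Matrix A, here is a rough Python sketch of how expected counts (the analogue of Matrix B) and χ² come from the observed counts alone, which is why you never enter the totals. The 2×2 table below is made-up illustration data, not the data from this problem, and the function names are hypothetical.

```python
def expected_table(observed):
    """Expected counts under independence: row total × column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi2_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_table(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical counts for illustration only (totals are NOT entered).
hypothetical = [[10, 20],
                [30, 40]]

expected = expected_table(hypothetical)
chi2 = chi2_statistic(hypothetical)
df = (len(hypothetical) - 1) * (len(hypothetical[0]) - 1)
print(expected, round(chi2, 4), df)
```

The totals are computed inside the function, so entering them as an extra row or column would distort both the expected counts and the degrees of freedom.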
8 This is a 2-way table, specifically a test of homogeneity. You have seven populations, represented by the seven treatments in the experiment: seven ways to pre-treat and treat a cold. If Echinacea is effective, the proportions of infection from the various treatments should be significantly different. Use Case 7 in Inferential Statistics: Basic Cases.
(1) H0: The tested treatments with Echinacea make no difference to the proportion who catch cold.
H1: The treatments do make a difference. …

(2) α = 0.01

(3–4) There were seven treatments and two outcomes, so enter your 7×2 matrix and run a `χ²-Test` or MATH200A part 7. Results: χ² = 4.74, df = 6, p-value = 0.5769

Common mistake: Never enter the totals in a two-way test.

(RC)

• Random sample? Yes, randomized experimental design. ✔
• Sample less than 10% of population? Yes, the population of people exposed to the common cold is indefinitely large. ✔
• All expected values ≥5? Yes, matrix B shows all values at least 5.6. ✔

(5) p > α. Fail to reject H0.

(6) At the 0.01 significance level, we can’t determine whether Echinacea is effective against the common cold or not. Or, We can’t determine whether Echinacea is effective against the common cold or not (p = 0.5769).

Remark: Researchers might write something like “Echinacea made no significant difference to infection rates in our study” with the p-value or significance level. It’s understood that this does not prove Echinacea ineffective; this particular study fails to reach a conclusion. But as additional studies continue to find p > α, our confidence in the null hypothesis increases.

Remark: If you used MATH200A part 7, there’s some interesting information in matrix C. The top left 7 rows and 2 columns are the χ² contributions for each of the seven treatments and two outcomes. All are quite low, in light of the rule of thumb that only numbers above 4 or so are significant, even at the less stringent 0.05 level.

The last two rows are the total numbers and percentages of people who did and didn’t catch cold: 349 (87.5%) and 50 (12.5%). If Echinacea is ineffective, you’d expect to see about that same infection rate for each of the seven treatments. Sure enough, compute the rates from the rows of the data table, and you’ll find that they vary between 81% and 92%.

The third column is the total subjects in each of the seven treatments, and the overall total. Of course you were given those in the data table, but it’s always a good idea to use this information to check your data entry.

The fourth column is the percentage of subjects who were assigned to each of the seven treatments, totaling 100% of course.
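The overall percentages quoted from matrix C are easy to confirm by hand or with a couple of lines of Python. This sketch uses only the two totals given above; the variable names are illustrative.

```python
caught, not_caught = 349, 50      # overall totals quoted from matrix C
total = caught + not_caught       # all subjects in the experiment

pct_caught = round(100 * caught / total, 1)
pct_not_caught = round(100 * not_caught / total, 1)
print(total, pct_caught, pct_not_caught)
```

If your own matrix C shows different totals, that’s a sign of a data-entry error in matrix A.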

## What’s New?

• 6 May 2015: Correct a typo, thanks to Jessica Smith.
• (intervening changes suppressed)
• 6 Apr 2013: New document.

Updates and new info: https://BrownMath.com/swt/