Stats without Tears
Solutions for Chapter 12
Updated 6 May 2015
(What’s New?)
Copyright © 2013–2017 by Stan Brown
(1) 
H_{0}: The 25:25:20:15:8:7 model for ice cream preference is good.
H_{1}: The 25:25:20:15:8:7 model for ice cream preference is bad. 

(2)  α = 0.05 
(3–4)  Use MATH200A part 6.
df = 5, χ² = 9.68, p-value = 0.0849
Here are the input and output data screens:
(If you have MATH200A V6, you’ll see the p-value, degrees of freedom, and χ² test statistic on the same screen as the graph.)
Common mistake: When a model is given in percentages, some students like to convert the observed numbers to percentages. Never do this! The observed numbers are always actual counts, and their total is always the actual sample size.
Remark: You could give the model as decimals, .25, .25, .20, .15, and so on. But for the model, all that matters is the relative size of each category to the others, so it’s simpler to use whole-number ratios.
Common mistake: If you do convert the percentages to decimals, remember that 8% and 7% are 0.08 and 0.07, not 0.8 and 0.7.
(RC) 

(5)  p > α. Fail to reject H_{0}. 
(6)  At the 0.05 level of significance,
you can’t say whether the model is good or bad.
Or, It’s impossible to determine from this sample whether the model is good or bad (p = 0.0849).
Remark: For Case 6 only, you could write your non-conclusion as something like “the model is not inconsistent with the data” or “the data don’t disprove the model.”
Remark: The χ² test keeps you from jumping to false conclusions. Eyeballing the observed and expected numbers (L2 and L3), you might think they’re fairly far off and the model must be wrong. Yet the test gives a largish p-value.
Remark: If it had gone the other way (if p had been less than α), you would say something like “At the .05 level of significance, the model is inconsistent with the data” or “the data disprove the model” or simply “the model is wrong”.
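If you’re curious what MATH200A part 6 is doing with L1 and L2, here is a minimal Python sketch of the goodness-of-fit arithmetic. It is not the program’s actual code, and the function names and the observed counts in it are hypothetical, made up for illustration; they are not this problem’s data. It shows why only the ratios in the model matter: each expected count is the sample size times that category’s share of the model total, so 25:25:20:15:8:7 and .25:.25:.20:.15:.08:.07 produce the same expected counts.

```python
def expected_counts(model, observed):
    """Expected count for each category: sample size times that category's
    share of the model total. Only the relative sizes in `model` matter."""
    n = sum(observed)              # observed values are actual counts, so this is n
    total_weight = sum(model)
    return [n * weight / total_weight for weight in model]


def gof_chi_square(model, observed):
    """Chi-square statistic and degrees of freedom for a goodness-of-fit test."""
    expected = expected_counts(model, observed)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1


if __name__ == "__main__":
    ratios = [25, 25, 20, 15, 8, 7]
    decimals = [0.25, 0.25, 0.20, 0.15, 0.08, 0.07]
    observed = [52, 45, 44, 28, 19, 12]   # hypothetical counts, NOT the ice cream data
    print(expected_counts(ratios, observed))
    print(expected_counts(decimals, observed))   # same, up to floating-point rounding
    print(gof_chi_square(ratios, observed))
```

Notice that `observed` holds raw counts; converting them to percentages would change `n` and therefore every expected count.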
Solution: Use Case 7, 2-way table, in Inferential Statistics: Basic Cases.
(1)  H_{0}: Gun opinion is independent of party
H_{1}: Gun opinion depends on party 

(2)  α = .05 
(3–4)  Put the two rows and three columns in matrix A. (Don’t enter the totals.) Select χ²Test from the menu. Outputs are χ² = 26.13, df = 2, p = 2.118098E-6 → p = 0.000 002 or <.0001. 
(RC) 
Alternative: use MATH200A part 7 for steps 3–4 and RC. 
(5)  p < α; reject H_{0} and accept H_{1}. 
(6)  At the .05 level of significance,
gun opinion depends on party.
Or, Gun opinion depends on party (p < 0.0001).
Remark: “Depends on” does not mean that’s the only factor. But if you don’t like “depends on”, you could say “is not independent of”. Or you could say, “party affiliation is a factor in a person’s opinion on gun control.”
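For a 2-way table, the χ²Test in step 3–4 computes each cell’s expected count as (row total × column total) / grand total, and df = (rows - 1)(columns - 1). Here is a minimal Python sketch of that arithmetic; the function name and the table in it are hypothetical, not the gun-opinion data. With two rows and three columns, df = (2 - 1)(3 - 1) = 2, which matches the df reported above. It also shows one reason not to enter the totals: they get treated as one more row and column of observations, and in this example df jumps from 2 to 6.

```python
def two_way_chi_square(table):
    """Chi-square statistic and df for a test of independence.

    `table` is a list of rows of observed counts, entered WITHOUT
    the totals row or column.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            # Expected count if the two variables are independent
            expected = row_total * col_total / grand_total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


if __name__ == "__main__":
    # Hypothetical 2x3 table of counts (NOT the gun-opinion data)
    counts = [[20, 30, 50],
              [30, 30, 40]]
    print(two_way_chi_square(counts))
```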
(1) 
H_{0}: Preferences among all first graders are equal.
H_{1}: First graders prefer the five occupations unequally. 

(2)  α = 0.05 
(3/4)  MATH200A part 6 with {1,1,1,1,1} or similar in L1 and the observed data in L2. χ²=12.9412 → χ² = 12.94, df = 4, p=.011567 → p = 0.0116. 
(RC) 

(5)  p < α. Reject H_{0} and accept H_{1}. 
(6)  At the 0.05 significance level, first graders in general have
unequal preferences among the five occupations.
Or, First graders in general have unequal preferences among the five occupations (p = 0.0116). 
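The p-value in step 3/4 is the area to the right of the χ² test statistic under a χ² curve with df degrees of freedom. As a cross-check on the calculator, here is a Python sketch that computes that right-tail area for whole-number df using the standard closed-form expressions for even and odd df. That technique is swapped in for illustration; it is not necessarily how the TI-83/84 or MATH200A computes it, and the function names are hypothetical. Run on the χ² and df values in these solutions, it agrees with the reported p-values to within the rounding of χ² to two decimal places, and then it applies the step 5 rule.

```python
import math


def chi_square_p_value(chi2, df):
    """Right-tail area of the chi-square distribution for a whole-number df.

    Uses the closed-form expressions for even and odd degrees of freedom,
    so no statistics library is needed.
    """
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive whole number")
    half = chi2 / 2
    if df % 2 == 0:
        # Even df: e^(-x/2) * [1 + (x/2) + (x/2)^2/2! + ... + (x/2)^(df/2 - 1)/(df/2 - 1)!]
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * e^(-x/2) * [sqrt(x) + sqrt(x)^3/3 + sqrt(x)^5/(3*5) + ...]
    # with (df - 1)/2 terms in the brackets
    root = math.sqrt(chi2)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= chi2 / (2 * r + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total


def decide(p, alpha):
    """Step 5 of the hypothesis test."""
    return "Reject H0 and accept H1" if p < alpha else "Fail to reject H0"


if __name__ == "__main__":
    p = chi_square_p_value(12.9412, 4)   # first-grader problem: chi-square = 12.9412, df = 4
    print(round(p, 4), decide(p, 0.05))
```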
(1) 
H_{0}: Egg consumption and age at menarche are independent.
H_{1}: Egg consumption and age at menarche are not independent. 

(2)  α = 0.01 
(3/4)  3×3 in A. Use MATH200A part 7 or χ²Test.
results: χ² = 3.13, df = 4, p = .535967 → p = 0.5360 
(RC) 

(5)  p > α. Fail to reject H_{0}. 
(6) 
At the 0.01 level of significance, we can’t determine whether
egg consumption and age at menarche are independent or not.
Or, We can’t determine whether egg consumption and age at menarche are independent or not (p = 0.5360).
Remark: The large p-value makes it really tempting to declare that the two variables are independent. But that would be accepting H_{0}, which we must never do. It’s always possible that there is a connection and we were just unlucky enough that this particular sample didn’t show it.
Some researchers would say “There is insufficient evidence to reject the hypothesis of independence.” Strictly speaking, that’s the same error. However, when the audience is researchers, rather than the non-technical public, it may be understood that they’re not really accepting H_{0}, only failing to reject it pending the outcome of a further study.
(1) 
H_{0}: Age distribution of grand jurors matches age distribution of county.
H_{1}: Age distribution of grand jurors does not match age distribution of county. 

(2)  α = 0.05 
(3/4)  The county percentages are the model and go in L1. The
numbers of jurors (not percentages) go in L2. Reminder: don’t
include the total row.
results: χ²=61.2656 → χ² = 61.27, df = 3, p-value = 3.2×10^{-13} or p < 0.0001 
(RC)  Because you’re not generalizing, the random-sample rule and the under-10% rule don’t matter. You need only check that all expected counts are ≥ 5, and since the lowest is 10.56, the requirements are met. 
(5)  p < α. Reject H_{0} and accept H_{1}. 
(6) 
At the 0.05 significance level, the age distribution of grand jurors is
different from the age distribution in the county.
Or, The age distribution of grand jurors is different from the age distribution in the county (p < 0.0001).
Remark: There are a lot of reasons for this. Judges tend to be older and tend to prefer jurors closer to their own age. Also, older candidates are more likely to be retired, which means they are less likely to be exempt by reason of their occupation.
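The requirements check in the grand-jury problem depends only on the model and the sample size, not on the observed counts: each expected count is n times that category’s share of the model. Here is a minimal Python sketch of that check, assuming the "all expected counts ≥ 5" rule used above; the function name, the model, and the sample sizes are hypothetical, not the grand-jury data.

```python
def expected_counts_ok(model, n, minimum=5):
    """Check the goodness-of-fit requirement that every expected count
    is at least `minimum`. Expected count = n * (category's share of model)."""
    total_weight = sum(model)
    expected = [n * weight / total_weight for weight in model]
    smallest = min(expected)
    return smallest >= minimum, smallest


if __name__ == "__main__":
    model = [50, 30, 15, 5]                 # hypothetical model percentages
    print(expected_counts_ok(model, 60))    # smallest expected count is 3
    print(expected_counts_ok(model, 200))   # smallest expected count is 10
```

Because only the smallest model share matters, you can often tell at a glance whether a sample is big enough.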
(1) 
H_{0}: Population size of chosen residence town is independent of
population size of town raised in.
H_{1}: Population size of chosen residence town depends on population size of town raised in. 

(2)  α = 0.05 
(3/4)  Enter the 3×3 array in Matrix A. (Never enter the
totals in a 2-way table hypothesis test.) Use MATH200A part 7 or the
calculator’s χ²Test menu selection.
results: df = 4, χ² = 35.74, p-value = 3.271956E-7 → p-value = 0.000 000 3 or p-value < 0.0001 
(RC) 

(5)  p < α. Reject H_{0} and accept H_{1}. 
(6) 
At the 0.05 significance level, there is an association between the
size of town men choose to live in and the size of town they
grew up in.
Or, There is an association between the size of town men choose to live in and the size of town they grew up in (p < 0.0001). 
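The calculator’s E notation means "times ten to the power": a p-value shown as 3.271956E-7 is 3.271956×10^{-7}. Here is a small Python helper for turning a raw p-value into the form these conclusions use. The helper’s name and its rule (four decimal places, or p < 0.0001 for anything smaller) are inferred from the examples in this chapter, not a rule stated in the text.

```python
def report_p(p):
    """Format a p-value as in these conclusions: four decimal places,
    or 'p < 0.0001' for anything smaller than 0.0001."""
    if p < 0.0001:
        return "p < 0.0001"
    return f"p = {p:.4f}"


if __name__ == "__main__":
    # Python reads the calculator's E notation directly: "E-7" means x 10^-7
    raw = float("3.271956E-7")
    print(report_p(raw))
    print(report_p(0.535967))
```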
(1)  H_{0}: The tested treatments with Echinacea make no
difference to the proportion who catch cold.
H_{1}: The treatments do make a difference. … 

(2)  α = 0.01 
(3/4) 
There were seven treatments and two outcomes, so enter your 7×2
matrix and run a χ²Test or MATH200A part 7.
Results: χ² = 4.74, df = 6, p-value = 0.5769
Common mistake: Never enter the totals in a two-way test. 
(RC) 

(5)  p > α. Fail to reject H_{0}. 
(6)  At the 0.01 significance level, we can’t determine
whether Echinacea is effective against the common cold or not.
Or, We can’t determine whether Echinacea is effective against the common cold or not (p = 0.5769).
Remark: Researchers might write something like “Echinacea made no significant difference to infection rates in our study” with the p-value or significance level. It’s understood that this does not prove Echinacea ineffective; this particular study fails to reach a conclusion. But as additional studies continue to find p > α, our confidence in the null hypothesis increases.
Remark: If you used MATH200A part 7, there’s some interesting information in matrix C. The top left 7 rows and 2 columns are the χ² contributions for each of the seven treatments and two outcomes. All are quite low, in light of the rule of thumb that only numbers above 4 or so are significant, even at the less stringent 0.05 level.
The last two rows are the total numbers and percentages of people who did and didn’t catch cold: 349 (87.5%) and 50 (12.5%). If Echinacea is ineffective, you’d expect to see about that same infection rate for each of the seven treatments. Sure enough, compute the rates from the rows of the data table, and you’ll find that they vary between 81% and 92%.
The third column is the total subjects in each of the seven treatments, and the overall total. Of course you were given those in the data table, but it’s always a good idea to use this information to check your data entry.
The fourth column is the percentage of subjects who were assigned to each of the seven treatments, totaling 100% of course.
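The χ² contributions in matrix C are (O - E)²/E for each cell, where O is the observed count and E is the expected count; they add up to the χ² test statistic. Here is a minimal Python sketch that computes those contributions, flags any above the rule-of-thumb value of 4 mentioned above, and computes each row’s rate for one outcome column. The function names are illustrative, MATH200A’s matrix C layout is not reproduced, and the table is hypothetical (three made-up treatments, not the Echinacea data).

```python
def cell_contributions(table):
    """Per-cell chi-square contributions (O - E)^2 / E for a two-way
    table of observed counts (no totals row or column)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    contributions = []
    for row, row_total in zip(table, row_totals):
        cells = []
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            cells.append((observed - expected) ** 2 / expected)
        contributions.append(cells)
    return contributions


def row_rates(table, column):
    """Percentage of each row's subjects that fall in the given column."""
    return [100 * row[column] / sum(row) for row in table]


if __name__ == "__main__":
    # Hypothetical data: three made-up treatments; column 0 = caught cold,
    # column 1 = didn't catch cold (NOT the Echinacea study's numbers)
    counts = [[45, 5], [40, 10], [44, 6]]
    contrib = cell_contributions(counts)
    total = sum(sum(row) for row in contrib)         # equals the chi-square statistic
    large = [(i, j) for i, row in enumerate(contrib)
             for j, c in enumerate(row) if c > 4]    # rule of thumb from the remark
    print("chi-square:", round(total, 2))
    print("cells above 4:", large)
    print("cold rate by treatment (%):", row_rates(counts, 0))
```

Comparing the row rates with the overall rate is the same check the remark describes: if the treatments made no difference, each row’s rate should sit near the overall one.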
Updates and new info: http://BrownMath.com/swt/