Stats without Tears
Solutions for Chapter 10
Updated 1 Jan 2016
(What’s New?)
Copyright © 2013–2023 by Stan Brown, BrownMath.com
1. Hypotheses. 2. Significance level. RC. Requirements check. 3–4. Test statistic and p-value. 5. Decision rule (or, conclusion in statistics language). 6. Conclusion (in English).
It keeps you honest. If you could select a significance level after computing the p-value, you could always get the result you want, regardless of the evidence.
Answers will vary here. But you should get across the key idea that if H0 is true, the p-value is the chance of getting the sample you got, or a sample even further from H0, purely by random chance. For more correct statements, and common incorrect statements, see What Does the p-Value Mean?
(a) It’s too wishy-washy. When p<α, you can reach a conclusion. Correction: The accelerant makes a difference, at the 0.05 significance level.
(b) You can never prove the null hypothesis of “no difference”. You can’t even say “The accelerant may make no difference,” because that’s only part of the truth: it equally well may make a difference. You must say something like, “At the 0.05 significance level it’s impossible to say whether the accelerant makes a difference or not.”
(a) A Type I error is rejecting the null hypothesis when it’s actually true. In this case, a Type I error would be concluding “the accelerant makes paint dry faster” when actually it makes no difference. This would lead you to launch the product and expose yourself to a lot of warranty claims.
(b) A Type II error is failing to reject the null hypothesis when it’s actually false. In this case, a Type II error would be concluding “the accelerant doesn’t make paint dry faster” when actually it does. This would lead you to keep the product off the market even though it could add to your sales and would perform as promised.
They are not necessarily mistakes. Type I and II errors are an unavoidable part of sample variability. Nothing can prevent them entirely. The only way to make them both less likely at the same time is to use a larger sample size.
That said, if you make mistakes in data collection or analysis you definitely make Type I or Type II errors (or both of them) more likely.
Make your significance level α smaller. The side effect is making a Type II error more likely.
Your own words will vary from mine, but the main difference is that when p > α you can’t reach a conclusion. Accepting H0 is wrong because it reaches the conclusion that H0 is true. Failing to reject H0 is correct because it leaves both possibilities open.
It’s like a jury verdict of “not guilty beyond a reasonable doubt.” The jury is not saying the defendant didn’t do it. They are saying that either he didn’t do it, or he did it but the prosecution didn’t present enough evidence to convince them.
A hypothesis test can end up rejecting H0 or failing to reject it, but the result can never be to accept H0.
H0: μ = 500
H1: μ ≠ 500
Remark: It must be ≠, not > or <, because the claim is that the mean is 500 minutes, and a difference in either direction would destroy the claim.
(a) p > α; fail to reject H0. At the 0.01 significance level, we can’t determine whether the directors are stealing from the company or not.
(b) p < α; reject H0 and accept H1. At the 0.01 level of significance, we find that the directors are stealing from the company.
α is the probability of a Type I error that you can tolerate. A Type I error in this case is determining that the defendant is guilty (calling H0 false) when actually he’s innocent (H0 is really true), and the consequence would be putting an innocent man to death. You specify a low α to make it less likely this will happen. Of the given choices, 0.001 is best.
This is binomial data, a Case 2 test of proportion in Inferential Statistics: Basic Cases.
(1) H0: p = .1, 10% of TC3 students driving alcohol impaired
    H1: p > .1, more than 10% of TC3 students driving alcohol impaired
(2) α = 0.05
(RC)
(3/4) 1-PropZTest: .1, 18, 120, >po
    results: z=1.825741858 → z = 1.83, p=.0339445194 → p = 0.0339, p̂ = .15
(5) p < α. Reject H0 and accept H1.
(6) At the 0.05 significance level, more than 10% of TC3 students were alcohol impaired on the most recent Friday or Saturday night when they drove.
    Or, More than 10% of TC3 students were alcohol impaired on the most recent Friday or Saturday night when they drove (p = 0.0339).
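If you’d like to check the calculator another way, here is a minimal Python sketch (not part of the original solution, and assuming scipy is available) that reproduces the 1-PropZTest by hand with the usual normal approximation:

    # Reproduce 1-PropZTest: .1, 18, 120, >po
    from math import sqrt
    from scipy.stats import norm

    p0, x, n = 0.1, 18, 120           # null proportion, successes, sample size
    p_hat = x / n                     # sample proportion, 0.15
    se = sqrt(p0 * (1 - p0) / n)      # standard error uses p0, not p-hat
    z = (p_hat - p0) / se             # test statistic, about 1.83
    p_value = norm.sf(z)              # right-tailed area, about 0.0339
    print(round(z, 2), round(p_value, 4))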
This is binomial data (against or not against): a Case 2 test of population proportion in Inferential Statistics: Basic Cases.
Requirements check: Random sample? NO, this is a self-selected sample, consisting only of those who returned the poll. (That could be overcome by following up on those who did not return the poll, but nobody did that.)
The 10n≤N requirement also fails. 10n = 10×380 = 3800, much larger than the 1366 population size.
Answer: No, you cannot do any inferential procedure because the requirements are not met.
(b) The population is all persons who do the primary grocery shopping in their households. We don’t know the precise number, but it is surely in the millions since there are millions of households. We can say that it is indefinitely large.
(c) The number 182 is x, the number of successes in the sample.
(d) She wanted to know whether the true proportion is greater than 40%, so her alternative hypothesis is H1: p > 0.4 and po is 0.4.
(e) No. The researcher is interested in the habits of the primary grocery shoppers in households; therefore she must sample only people who are primary grocery shoppers in their households. If you even thought about saying Yes, please go back to Chapter 1 and review what bias actually means.
(a) This is inference about the proportion in one population, Case 2 in Inferential Statistics: Basic Cases.
(1) H0: p = 2/3, the chance of winning is 2/3 if you switch doors
    H1: p ≠ 2/3, the chance of winning is different from 2/3 if you switch doors
    Remark: You need to test for ≠, not <. You’re asked whether the claim of 2/3 is correct, and if it’s wrong it could be wrong in either direction. It doesn’t matter that the sample data happen to show a smaller proportion than 2/3.
(2) α = 0.05
(RC)
(3/4) 1-PropZTest: 2/3, 18, 30, ≠
    results: z = −.77, p-value = 0.4386, p̂ = 0.6
(5) p > α. Fail to reject H0.
(6) We can’t determine whether the claim “switching doors gives a 2/3 chance of winning” is true or false (p = 0.4386).
    Or, At the 0.05 significance level, we can’t determine whether the probability of winning after switching doors is equal to 2/3 or different from 2/3.
    Remark: It’s true that you can’t disprove the claim, but it’s also true that you can’t prove it. This is where a confidence interval gives useful information.
(b) Requirements have already been checked.
1-PropZInt: 18, 30, .95. Results: (.4247, .7753), p̂ = .6.
We’re 95% confident that the true probability of winning if you switch doors is between 42.5% and 77.5%.
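As a cross-check, here is a minimal Python sketch (not part of the original solution, assuming scipy) of the 1-PropZInt computation; it is a Wald interval whose standard error uses the sample proportion, and its lower bound is the 42.5% used in part (c):

    # Reproduce 1-PropZInt: 18, 30, .95
    from math import sqrt
    from scipy.stats import norm

    x, n, conf = 18, 30, 0.95
    p_hat = x / n                          # 0.6
    z_star = norm.ppf(1 - (1 - conf) / 2)  # critical value, about 1.96
    me = z_star * sqrt(p_hat * (1 - p_hat) / n)        # margin of error
    print(round(p_hat - me, 4), round(p_hat + me, 4))  # about (0.4247, 0.7753)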
(c) It’s possible that the true probability of winning if you switch doors is 1/3 (33.3%) or even worse, but it’s very unlikely. Why? You’re 95% confident that it’s at least 42.5%. Therefore you’re better than 95% confident that the true probability if you switch is better than the 1/3 probability if you don’t switch doors. Switching is extremely likely to be the good strategy.
(a) A Type I error is rejecting the null hypothesis when it’s actually true. Here, a Type I error means deciding a piece of mail is spam when it’s actually not, so if Heather’s spam filter makes a Type I error then it will delete a piece of real mail. A Type II error is failing to reject H0 when it’s actually false, treating a piece of spam as real mail, so a Type II error would let a piece of spam mail into Heather’s in-box.
(b) Most people would rather see a piece of spam (Type II) than miss a piece of real mail (Type I), so a Type I error is more serious in this situation. Lower significance levels make Type I errors less likely (and Type II errors more likely), so a lower α is appropriate here.
(1) H0: p = .304
    H1: p < .304, less than 30.4% of Ithaca households own cats
(2) α = 0.05
(RC)
(3/4) 1-PropZTest: .304, 54, 215, <
    results: z = −1.68, p-value = 0.0461, p̂ = 0.2512
(5) p < α. Reject H0 and accept H1.
(6) At the 0.05 significance level, fewer than 30.4% of Ithaca households own cats.
    Or, Fewer than 30.4% of Ithaca households own cats (p = 0.0461).
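If you have Python handy, here is a minimal sketch (not part of the original solution, assuming the statsmodels library) of the same left-tailed test; setting prop_var to the null proportion makes the standard error match the calculator’s 1-PropZTest:

    from statsmodels.stats.proportion import proportions_ztest

    # 54 cat-owning households out of 215; H0: p = .304, H1: p < .304
    z, p_value = proportions_ztest(count=54, nobs=215, value=0.304,
                                   alternative='smaller', prop_var=0.304)
    print(round(z, 2), round(p_value, 4))   # about -1.68 and 0.0461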
(a) The population parameter is missing. It should be either μ or p, but since a proportion can’t be greater than 1 it must be μ. Correction: H0: μ = 14.2; H1: μ > 14.2
(b) H0 must have an = sign. Correction: H0: μ = 25; H1: μ > 25
(c) You used sample data in your hypotheses. Correction: H0: μ = 750; H1: μ > 750
(d) You were supposed to test “makes a difference”, not “is faster than”. Never do a one-tailed test (> or <) unless the other direction is impossible or of no interest at all. It’s possible that your “accelerant” could actually increase drying time, and if it does you’d definitely want to know. Correction: H0: μ = 4.3 hr; H1: μ ≠ 4.3 hr
This is numeric data, and you don’t know the standard deviation (SD) of the population. In Inferential Statistics: Basic Cases this is Case 1, a test of population mean.
(1) H0: μ = 3.8, the mean pollution this year is no different from last year
    H1: μ < 3.8, the mean pollution this year is lower than last year
(2) α = 0.01
(RC)
(3/4) T-Test: 3.8, L1, 1, <μo
    results: t=−4.749218419 → t = −4.75, p=5.2266779E−4 → p = 0.0005, x̅ = 3.21, s=.3928528138 → s = 0.39, n = 10
    Common mistake: Don’t write “p = 5.2267” or anything equally silly. A p-value is a probability, and probabilities are never greater than 1.
(5) p < α. Reject H0 and accept H1.
(6) At the 0.01 level of significance, the mean pollution is lower this year than last year.
    Or, The mean pollution this year is lower than last year (p = 0.0005).
This is numeric data with unknown SD of the population, Case 1 (test of population mean) in Inferential Statistics: Basic Cases.
(1) H0: μ = 32.0, quarts are being properly filled
    H1: μ < 32.0, Dairylea is shorting the public
    Remark: Your H1 uses <, not ≠, because the problem asks if Dairylea has a legal problem. Yes, they might be overfilling, but that would not be a legal problem.
(2) α = 0.05. This is just a business situation, not a matter of life and death. (You could justify a lower α if you can show serious consequences from making a mistake, such as a multimillion libel suit brought by the company against the investigator.)
(RC)
(3/4) T-Test: 32, 31.8, .6, 10, <μo
    results: t=−1.054092553 → t = −1.05, p=.159657788 → p = 0.1597
(5) p > α. Fail to reject H0.
(6) At the 0.05 level of significance, we can’t determine whether Dairylea is giving short volume or not.
    Or, We can’t determine from this sample whether Dairylea is giving short volume or not (p = 0.1597).
    Remark: You never accept the null hypothesis. But in many cases you may proceed as though it’s true. Here, since you can’t prove a case against the dairy, you don’t file charges, make a press release, organize a boycott, etc. You behave exactly as you would behave if you had proof the dairy was honest. But you don’t conclude that Dairylea is giving full measure, either. All your hypothesis test tells you is that it could go either way.
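Because the problem gives only summary statistics, here is a minimal Python sketch (not part of the original solution, assuming scipy) of the same T-Test computed by hand; scipy’s ttest_1samp expects raw data, so the t statistic is built from the summary numbers directly:

    # Reproduce T-Test: 32, 31.8, .6, 10, <μo
    from math import sqrt
    from scipy.stats import t

    mu0, xbar, s, n = 32, 31.8, 0.6, 10
    t_stat = (xbar - mu0) / (s / sqrt(n))   # about -1.05
    p_value = t.cdf(t_stat, df=n - 1)       # left-tailed area, about 0.1597
    print(round(t_stat, 2), round(p_value, 4))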
This is numeric data with unknown SD of population. You’re testing a population mean, Case 1 in Inferential Statistics: Basic Cases.
(1) H0: μ = 870, no difference in strength
    H1: μ ≠ 870, new glue’s average strength is different
    Remark: You’re testing different here, not better. It’s possible that the new glue bonds more poorly, and that would be interesting information, either guiding further research or perhaps leading to a new product (think Post-It Notes).
(2) α = 0.05
(RC)
(3/4) T-Test: 870, 892.2, 56.0, 30, μ≠μo
    results: t=2.17132871 → t = 2.17, p=.038229895 → p = 0.0382
(5) p < α. Reject H0 and accept H1.
(6) At the 0.05 level of significance, new glue has a different mean strength from the company’s best seller. In fact, it is stronger.
    Or, New glue has a different mean strength from the company’s best seller (p = 0.0382). In fact, it is stronger.
    Remark: When you are testing ≠, and p < α, you give the two-tailed interpretation “different from”, and then continue with a one-tailed interpretation. See p < α in Two-Tailed Test: What Does It Tell You?
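For the two-tailed case, here is a minimal Python sketch (not part of the original solution, assuming scipy); the p-value is twice the area in one tail beyond |t|:

    # Reproduce T-Test: 870, 892.2, 56.0, 30, μ≠μo
    from math import sqrt
    from scipy.stats import t

    mu0, xbar, s, n = 870, 892.2, 56.0, 30
    t_stat = (xbar - mu0) / (s / sqrt(n))        # about 2.17
    p_value = 2 * t.sf(abs(t_stat), df=n - 1)    # both tails, about 0.0382
    print(round(t_stat, 2), round(p_value, 4))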
This is binomial data (each person either has a bachelor’s or doesn’t) for a one-population test of proportion: Case 2 in Inferential Statistics: Basic Cases.
(a) Requirements:
1-PropZInt: x=52, n=120, C-Level=.95
Results: (.34467, .52199); p̂=.4333333333 → p̂ = .4333
We’re 95% confident that 34.5 to 52.2% of Tompkins County residents aged 25+ have at least a bachelor’s degree.
(b) Requirements have already been checked. A two-tailed test at the 0.05 level is equivalent to a confidence interval at the 95% level. The statewide proportion of 32.8% is outside the 95% CI for Tompkins County, and therefore at the 0.05 significance level, the proportion of bachelor’s degrees among Tompkins County residents aged 25+ is different from the statewide proportion of 32.8%. In fact, Tompkins County’s proportion is higher.
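Here is a minimal Python sketch (not part of the original solution, assuming scipy) of the 1-PropZInt for part (a), followed by the part (b) check that the statewide 32.8% lies outside the interval:

    # Reproduce 1-PropZInt: x=52, n=120, C-Level=.95
    from math import sqrt
    from scipy.stats import norm

    x, n, conf = 52, 120, 0.95
    p_hat = x / n
    z_star = norm.ppf(1 - (1 - conf) / 2)
    me = z_star * sqrt(p_hat * (1 - p_hat) / n)
    lo, hi = p_hat - me, p_hat + me
    print(round(lo, 4), round(hi, 4))   # about (0.3447, 0.5220)
    print(lo <= 0.328 <= hi)            # False: 32.8% falls outside the CI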
This is numeric data, with population SD unknown: test of population mean, Case 1 in Inferential Statistics: Basic Cases.
(1) H0: μ = 625, no difference in strength
    H1: μ > 625, Whizzo stronger than Stretchie
    Remark: Here you test for >, not ≠. Even though Whizzo might be less strong, you don’t care unless it’s stronger.
(2) α = 0.01
(RC)
(3/4) T-Test: 625, L1, 1, >μo
    results: t=3.232782217 → t = 3.23, p=.0071980854 → p = 0.0072, x̅ = 675, s=43.74602023 → s = 43.7, n = 8
(5) p < α. Reject H0 and accept H1.
(6) At the 0.01 level of significance, Whizzo is stronger on average than Stretchie.
    Or, Whizzo is stronger on average than Stretchie (p = 0.0072).
This is numeric data, with σ unknown: test of population mean, Case 1 in Inferential Statistics: Basic Cases.
(1) H0: μ = 6
    H1: μ > 6
(2) α = 0.05
(RC)
(3/4) T-Test: 6, 6.75, 3.3, 100, >μo
    results: t=2.272727273 → t = 2.27, p=.0126021499 → p = 0.0126
(5) p < α. Reject H0 and accept H1.
(6) TC3 students do average more than six hours a week in volunteer work, at the 0.05 level of significance.
    Or, TC3 students do average more than six hours a week in volunteer work (p = 0.0126).
Binomial data (head or tail) implies Case 2, test of population proportion, in Inferential Statistics: Basic Cases. A fair coin comes up heads 50% of the time, so p = 0.5.
(1) H0: p = 0.5, the coin is fair
    H1: p ≠ 0.5, the coin is biased
    Common mistake: You must test ≠, not >. An unfair coin would produce more or less than 50% heads, not necessarily more than 50%. Yes, this time he got more than 50% heads, but your hypotheses are never based on your sample data.
(2) α = 0.05
(RC)
(3/4) 1-PropZTest: .5, 5067, 10000, prop≠po
    results: z = 1.34, p = .1802454677 → p-value = 0.1802, p̂ = .5067
(5) p > α. Fail to reject H0.
(6) At the 0.05 level of significance, we can’t tell whether the coin is fair or biased.
    Or, We can’t determine from this experiment whether the coin is fair or biased (p = 0.1802).
    Common mistake: You can’t say that the coin is fair, because that would be accepting H0. You can’t say “there is insufficient evidence to show that the coin is biased”, because there is also insufficient evidence to show that it’s fair.
    Remark: “Fail to reject H0” situations are often emotionally unsatisfying. You want to reach some sort of conclusion, but when p > α you can’t. What you can do is compute a confidence interval: 1-PropZInt: 5067, 10000, .95; results: (.4969, .5165). You’re 95% confident that the true proportion of heads for this coin (in the infinity of all possible flips) is 49.69% to 51.65%. So if the coin is biased at all, it’s not biased by much.
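Here is a minimal Python sketch (not part of the original solution, assuming scipy) of the two-tailed 1-PropZTest on the coin, followed by the 1-PropZInt mentioned in the remark:

    # Reproduce 1-PropZTest: .5, 5067, 10000, two-tailed, and 1-PropZInt: 5067, 10000, .95
    from math import sqrt
    from scipy.stats import norm

    p0, x, n = 0.5, 5067, 10000
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)       # about 1.34
    p_value = 2 * norm.sf(abs(z))                    # two-tailed, about 0.1802
    me = norm.ppf(0.975) * sqrt(p_hat * (1 - p_hat) / n)
    print(round(z, 2), round(p_value, 4))
    print(round(p_hat - me, 4), round(p_hat + me, 4))  # about (0.4969, 0.5165)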
You have numeric data, and you don’t know the SD of the population, so this is a Case 1 test of population mean in Inferential Statistics: Basic Cases.
(a) Check requirements: random sample, n = 45 > 30, and there are more than 10×45 = 450 people with headaches.
TInterval: x̅=18, s=8, n=45, C-Level=.95
Results: (15.597, 20.403)
We’re 95% confident that the average time to relief for all headache sufferers using PainX is 15.6 to 20.4 minutes.
(b) Requirements have already been checked. A two-tailed test (a test for “different”) at the 0.05 level is equivalent to a confidence interval at the 1−0.05 = .95 = 95% confidence level. Since the 95% CI includes 20, the mean time for aspirin, we cannot determine, at the 0.05 significance level, whether PainX offers headache relief to the average person in a different time than aspirin or not.
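Here is a minimal Python sketch (not part of the original solution, assuming scipy) of the TInterval from the summary statistics, plus the part (b) check that 20 minutes lies inside the interval:

    # Reproduce TInterval: xbar=18, s=8, n=45, C-Level=.95
    from math import sqrt
    from scipy.stats import t

    xbar, s, n, conf = 18, 8, 45, 0.95
    t_star = t.ppf(1 - (1 - conf) / 2, df=n - 1)   # critical value, about 2.02
    me = t_star * s / sqrt(n)                      # margin of error
    lo, hi = xbar - me, xbar + me
    print(round(lo, 3), round(hi, 3))   # about (15.597, 20.403)
    print(lo <= 20 <= hi)               # True: cannot distinguish PainX from aspirin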
Updates and new info: https://BrownMath.com/swt/