Stats without Tears
Solutions to All Exercises
Updated 1 Jan 2016
Copyright © 2012–2017 by Stan Brown
Nothing can eliminate sampling error, but you can reduce it by increasing your sample size. (Most nonsampling errors can be avoided by proper experimental design and technique.)
Although “numeric” or “quantitative” is correct, it’s not an adequate answer because it is not as specific as possible. Discrete and continuous data are treated differently in descriptive statistics, so it matters which type you have.
Students are sometimes fooled by the decimal. Always ask yourself what the original question was, or what measurement was taken from each member of the sample.
Common mistake: Students sometimes answer “80” for population size, but this is not correct. You took data from 80 people, so those 80 people are your sample and 80 is your sample size.
What can be done to reduce response bias? Interviewers should be trained to be absolutely neutral in voice and facial expression, which is how the Kinsey team gathered data on sexual behavior. Or the question can be asked on a written questionnaire, so that the subject isn’t looking another person in the face when answering. The question can also be made less threatening: “Have you ever left an infant alone in the house, even for just a minute?”
Use randInt(1,2000) 50 times, not counting duplicates, and interview the students who came up in those positions. To select the first person to survey, use randInt(1,20). Remember that a systematic survey begins with a randomly selected person from 1 to k, not 1 to 50 (sample size) or 1 to 2000 (population size).
Notice that I didn’t suggest a time frame. What do you think would be a good time to do this?
An alternative procedure might be to walk through the dorms (assuming you can get in) and interview the students in every 20th room. You may get better coverage that way than if you wait for them to come to you.
Best balance? Probably the cluster sample. The true random sample is a lot of work for a sample of 50, because after selecting the names you have to track the students down. The systematic sample, no matter how you do it, is going to miss a lot of students, and you have that time-period problem. With the cluster sample, you can time it for when students are likely to be home, and you can go back to follow up on those you missed.
But nothing is perfect, in this life where we are born to trouble as the sparks fly upward. The cluster sample works if the students were randomly assigned to rooms. When students pick their own roommates, they tend to pick people with similar attitudes, interests, and activities. That means two roommates are more similar to each other than two randomly chosen students would be, and there’s no way you can treat that cluster sample as a random sample. The cluster would probably be safe for freshmen, where the great majority would be randomly assigned, but less so for students in later years.
Students often answer questions like this with handwaving arguments, either coming up with reasons why it’s a plausible conclusion or coming up with reasons why it isn’t. This is statistics, and we have to follow the facts. Whatever you may think about Fox News, the fact is that observational studies can’t prove causation.
Alternative: the more specific answer binomial data, which you may have heard in the lecture though it’s not in the book till Chapter 6.
(b) This is descriptive statistics because it’s reporting data actually measured: 42% of the sample. If it said “42% of Americans”, then it would be inferential because you know not every American was asked, so the investigators must have extrapolated from a sample to the population.
(c) It is a statistic because it is a number that summarizes data from a sample.
All of these are nonsampling errors.
To fix it, round to one decimal place: 1.9. (Don’t make the common mistake of “rounding” to 1.8.)
There’s no scale to interpret the quantities. And if one fruit in each row is supposed to represent a given quantity, then banana and apple have the same frequency, yet banana looks like its frequency is much greater.
90% of 15 is 13.5, 80% is 12, 70% is 10.5, and 60% is 9.
Score  Grade  Frequency

13.5–15  A  2
12–13.4  B  1
10.5–11.9  C  5
9–10.4  D  3
0–8.9  F  4
Alternatives: Instead of a title below the category axis, you
could have a title above the graph. You could order the grades from
worst to best (F through A) instead of alphabetically as I did here.
And you could list the class boundaries as 13.5–15,
12–13.5, 10.5–12, and so on, with the understanding
that a score of 12 goes into the 12–13.5 class, not the
10.5–12 class. (Data points “on the cusp” always go
into the higher class.)
(a) The variable is discrete, “number of deaths in a
corps in a given year”.
(b)
Alternatives: Some authors would draw a histogram (bars
touching) or even a pie chart. Those are okay but not the best choice.
Commuting Distance (km)
0 | 5 9 8 1
1 | 5 2 2 1 9 6 2 8 7 6 5 7
2 | 3 2 6 1 6 4 0
3 | 1
4 | 5
Key: 2 | 3 = 23 km
Relative frequency is f/n. f = 25, and n = 35+10+25+45+20 = 135. Dividing 25/135 gives 0.185185... ≈ 0.19 or 19%
skewed right
(a) See the histogram at right. Important features:
(b) 480.0−470.0 = 10.0 or just plain “10”.
Don’t make the common mistake of subtracting 479.9−470.0. Subtract consecutive lower bounds, always.
(c) skewed left
Alternative solution: In a normal distribution, the mean is halfway between the given extremes: μ = (4.50+8.50)/2 = 6.50. Then the distance from the mean to 8.50 must be three SD: 8.50−6.50 = 2.00 = 3σ; σ = 0.67 ounces.
Ages  Midpoint (L1)  Frequency (L2) 

20 – 29  25  34 
30 – 39  35  58 
40 – 49  45  76 
50 – 59  55  187 
60 – 69  65  254 
70 – 79  75  241 
80 – 89  85  147 
Caution! The midpoints are not midway between lower and upper bounds, such as (20+29)/2 = 24.5. They are midway between successive lower bounds, such as (20+30)/2 = 25.
1VarStats L1,L2 (Check n first!)
x̅ = 63.85656971 → x̅ = 63.86
s = 15.43533244 → s = 15.44
n = 997
Common mistake: People tend to run 1VarStats L1, leaving off the L2, which just gives statistics of the seven numbers 25, 35, …, 85. Always check n first. If you check n and see that n = 7, you realize that can’t possibly be right since the frequencies obviously add up to more than 7. You fix your mistake and all is well.
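Remark: If you’d like to check the calculator’s grouped-data statistics off-line, here is a short Python sketch (assuming the NumPy library) that reproduces 1VarStats L1,L2 from the table above:

    import numpy as np

    mid  = np.array([25, 35, 45, 55, 65, 75, 85])      # class midpoints (L1)
    freq = np.array([34, 58, 76, 187, 254, 241, 147])  # frequencies (L2)

    n = freq.sum()                  # 997: check this first!
    xbar = (mid * freq).sum() / n   # 63.86
    # sample SD: frequency-weighted squared deviations, divided by n-1
    s = ((freq * (mid - xbar)**2).sum() / (n - 1))**0.5   # 15.44
    print(n, round(xbar, 2), round(s, 2))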
(b) You need the original data to make a boxplot, and here you have only the grouped data. A boxplot of a grouped distribution doesn’t show the shape of the data set accurately, because only class midpoints are taken into account. The class midpoints are good enough for approximating the mean and SD of the data, but not the five-number summary that is pictured in the boxplot.
Course  Credits (L2)  Grade  Quality Points (L1) 

Statistics  3  A  4.0 
Calculus  4  B+  3.3 
Microsoft Word  1  C−  1.7 
Microbiology  3  B−  2.7 
English Comp  3  C  2.0 
1VarStats L1,L2
n = 14 (This is the number of credits attempted. If you get 5, you forgot to include L2 in the command.)
x̅ = 2.93
(a)
Commute Distance, km  Frequency

0–9  4
10–19  12
20–29  7
30–39  1
40–49  1
Total  25
(b) The class width is 10 (not 9). The class
midpoints are 5, 15, 25, 35, 45 (not 4.5, 14.5, etc.).
(c) Class midpoints in one list such as L2
and frequencies in another list such as L3. This is a sample, so
symbols are x̅, s, n, not μ, σ, N.

(d) Data in a list such as L1. 1VarStats L1 gives x̅ = 17.6 km, Median = 17, s = 9.0 km, n = 25
(e)
(f) Mean, because the data are
nearly symmetric.
Or, median, because there is an
outlier.
Comment: The stemplot made the data look skewed, but
that was just an artifact of the choice of classes. The boxplot shows
that the data are nearly symmetric, except for that outlier. This is
why the mean and median are close together.
This is a good illustration that sometimes there
is no uniquely correct answer. It’s why your justification or
explanation is an important part of your answer.
(g) The five-number summary, from MATH200A part 2 [TRACE], is 1, 12, 17, 22.5, 45. There is one outlier, 45. (The five-number summary includes the actual min and max, whether they are outliers or not.)
Draw an auxiliary line at z = −2. You know that the area between z = −2 and z = +2 is 95%, so the area between z = 0 and z = 2 is half that, 47.5% or 0.475.
z_{J} = (2070−1500)/300 = 570/300 = 1.90
z_{M} = (129−100)/15 = 29/15 = about 1.93
Because she has the higher z score, according to the tests Maria is more intelligent.
Remark: The difference is very slight. Quite possibly, on another day Jacinto might do slightly better and Maria slightly worse, reversing their ranking.
Test Scores  Frequencies, f (L2)  Class Midpoints, x (L1)

470.0–479.9  15  475.0 
480.0–489.9  22  485.0 
490.0–499.9  29  495.0 
500.0–509.9  50  505.0 
510.0–519.9  38  515.0 
Put class midpoints in a list, such as L1, and frequencies go in another list, such as L2. (Either label the columns with the lists you use, as I did here, or state them explicitly: “class marks in L1, frequencies in L2”.)
1VarStats L1,L2
(Always write down the command that you used.)
(a) n = 154
(b) x̅ = 499.81 (before rounding, 499.8051948)
(c) s = 12.74 (before rounding, 12.74284519)
Be careful with symbols. Use the correct one for sample or population, whichever you have.
Common mistake: The SD is 12.74 (S_{x}), not 12.70 (σ), because this is a sample and not the population.
Common mistake: Don’t use any form of the word “correlation” in your answer. Your friend wouldn’t understand it, but it’s wrong anyway. Correlation is the interpretation of r, not R². Yes, r is related to R², but R² as such is not about correlation.
Common mistake: R² tells you how much of the variation in y is associated with variation in x, not the other way around. It’s not accurate to say 64% of variation in age is associated with variation in salary.
Common mistake: Don’t say “explained by” to nontechnical people. The regression shows an association, but it does not show that growing older causes salary increases.
(b) Yes
(c)
The results of LinReg(ax+b) L1,L2,Y1
are shown at
right. The correlation coefficient is
r = 0.91
(d)
ŷ = 0.1127x − 35.1786
Note: ŷ, not y. Note: −35.1786, not
+−35.1786.
(e)
The slope is 0.1127. An increase of 1000 powerboat registrations is associated with an increase of about 0.11 manatee deaths, on average.
It’s every 1000 boats, not every boat, because the original
table is in thousands. Always be specific: “increase”, not
just “change”.
Remark: Although this is mathematically accurate, people may not respond well to 0.11 as a number of deaths, which obviously is a discrete variable. You might multiply by 100 and say that 100,000 extra registrations are associated with 11 more manatee deaths on average; or multiply by 10 and round a bit to say that 10,000 extra registrations are associated with about one more manatee death on average.
(f) The y intercept is −35.1786. Mathematically, if there were no power boats there would be about minus 35 manatees killed by power boats. But this is not applicable because x=0 (no boats) is far outside the range of x in the data set.
(g)
R² = 0.83. About 83% of variation in manatee deaths from power boats is associated with the variation in registrations of power boats.
It’s R², not r². And don’t use any form of
the word “correlate” in your answer.
100% of manatee powerboat deaths come from power boats, so why isn’t the association 100%? The other 17% is lurking variables plus natural variability. For instance, maybe the weather was different in some years, so owners were more or less likely to use their boats. Maybe a campaign of awareness in some years caused some owners to lower their speeds in known manatee areas.
(h) ŷ = 27.8
(i) y−ŷ = 34−27.8 = 6.2
(j) Remember that x is in thousands, so a million boats is x = 1000. But x=1000 is far outside the data range, so the regression can’t be used to make a prediction.
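Remark: If you’d rather check a regression off the calculator, scipy.stats.linregress reports the same slope, intercept, and r as LinReg(ax+b). A sketch, assuming SciPy; the xs and ys below are illustrative stand-ins, so substitute the registrations (in thousands) and deaths from the problem’s table:

    from scipy import stats

    xs = [447, 460, 481, 498, 513, 512, 526, 559, 585, 614]  # boats, thousands (stand-ins)
    ys = [13, 21, 24, 16, 24, 20, 15, 34, 33, 33]            # manatee deaths (stand-ins)

    fit = stats.linregress(xs, ys)
    print(fit.slope, fit.intercept)          # a and b in y-hat = ax + b
    print(fit.rvalue, fit.rvalue**2)         # r and R-squared
    print(fit.intercept + fit.slope * 559)   # prediction at any x inside the data range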
(b)
The results of LinReg(ax+b) L3,L4,Y2
are shown at right.
ŷ = −3.5175x+6.4561
(c)
The slope is −3.5175. Increasing the dial setting by one unit decreases temperature by about 3.5°.
Again, state whether y increases or
decreases with increasing x.
(d) The y intercept is 6.4561. A dial setting of 0 corresponds to about 6.5°.
(e) r = −0.99
(f) R² = 0.98. About 98% of variation in temperature is associated with variation in dial setting.
This seems almost too good to be true, as though the data were just made up. ☺ But it’s hard to think of many lurking variables. Maybe it happened that some measurements were taken just after the compressor shut off, and others were taken just before the compressor was ready to switch on again in response to a temperature rise.
(g) ŷ = 2.9°
r, the linear correlation coefficient, would be roughly zero. Taking the plot as a whole, as x increases, y is about equally likely to increase or decrease. A straight line would be a terrible model for the data.
Clearly there is a strong correlation, but it is not a linear correlation. Probably a good model for this data set would be a quadratic regression, ŷ = ax²+bx+c. Though we study only linear regressions, your calculator can perform quadratic and many other types.
Remark: Don’t say “caused by” variation in family income. Correlation is not causation. You can think of some reasons why it might be plausible that wealthier families are more likely to produce smarter children, or at least children who do better on standardized tests, but you can’t be sure without a controlled experiment.
Remark: Though it’s an interesting fact, the correlation in twins’ IQ scores is not needed for this problem. In real life, an important part of solving problems and making decisions is focusing on just the relevant information and not getting distracted.
(b)
S = {HHH, HTH, THH, TTH, HHT, HTT, THT, TTT}
(c) Three events out of eight equally likely events: P(2H) = 3/8
Common mistake: Sometimes students write the sample space correctly but miss one of the combinations of 2 heads. I wish I could offer some “magic bullet” for counting correctly, but the only advice I have is just to be really careful.
Service type  Prob. 

Landline and cell  58.2% 
Landline only  37.4% 
Cell only  2.8% 
No phone  1.6% 
Total  100.0% 
(a) In a probability model, the probabilities must add to 1 (= 100%). The given probabilities add to 62.6%. What is the missing 37.4%? They’ve accounted for cell and landline, cell only, and nothing; the remaining possibility is landline only. The model is shown at right.
(b) P(Landline) = P(Landline only) + P(Landline and cell)
P(Landline) = 37.4% + 58.2% = 95.6%
Remark: “Landline” and “cell” are not disjoint events, because a given household could have both. But “landline only” and “landline and cell” are disjoint, because a given household can’t both have a landline without a cell phone and have a landline with a cell phone.
(b) That A and B are complementary means that one or the other must happen, but not both. Therefore P(B) = P(not A) → P(B) = 0.3
(c) Since the events are complementary, they can’t both happen: P(A and B) = 0
Common mistake: Many students get (c) wrong, giving an answer of 1. If events are complementary, they can’t both happen at the same time. That means P(A and B) must be 0, the probability of something impossible.
Maybe those students were thinking of P(A or B). If A and B are complementary, then one or the other must happen, so P(A or B) = P(A) + P(B) = 1. But part (c) was about probability and, not probability or.
This is the difference between theoretical and empirical probability. A truly impossible event has a theoretical probability of zero. But the 0 out of 412 figure is an empirical probability (based on past experience). Empirical probabilities are just estimates of the “real” theoretical probability. From the empirical 0/412, you can tell that the theoretical probability is very low, but not necessarily zero. In plain language, an unresolved complaint is unlikely, but just because it hasn’t happened yet doesn’t mean it can’t happen.
Common mistake: Students often try some sort of complicated calculation here. You would have to do that if conditions were stated on all five of those cards, but they weren’t. Think about it: any card has a 1/4 chance of being a spade.
(a) 0.0171 × 0.0171 = 0.0003
(b) The events are not independent. When a married couple are at home together or out together, any attack that involves one of them will involve the other also.
(b) About 10.38% of American adults in 2006 were divorced. If you randomly selected an American adult in 2006, there was a 0.1038 probability that he or she was divorced.
(c) Empirical or experimental
(d)
P(divorced^{C}) = 1−P(divorced) =
1−22.8/219.7 ≈ 0.8962
About 89.62% of American adults in 2006 were not divorced
(or, had a marital status other than divorced).
(e) P(man and married) = 63.6/219.7 ≈ 0.2895 (You can’t use a formula on this one.)
(f) Add up P(man) and P(not man but married):
P(man or married) = 106.2/219.7 + 64.1/219.7 ≈ 0.7751
Alternative solution: By formula:
P(man or married) = P(man) + P(married) − P(man and married)
P(man or married) = 106.2/219.7 + 127.7/219.7 − 63.6/219.7 = 0.7751
Remember, math “or” means one or the other or both.
(g) What proportion of males were never married? 30.3/106.2 = 28.53%.
(h) P(man | married) uses the sub-subgroup of men within the subgroup of married persons.
P(man | married) = 63.6/127.7 = 0.4980
49.80% of married persons were men.
Remark: You might be surprised that it’s under 50%. Isn’t polygamy illegal in the US? Yes, it is. But the table considers only resident adults. Women tend to marry slightly earlier than men, so fewer grooms than brides are under 18. Also, soldiers deployed abroad are more likely to be male.
(i) P(married | man) uses the sub-subgroup of married persons within the subgroup of men.
P(married | man) = 63.6/106.2 = 0.5989
59.89% of men were married.
(a)
3 of 20 M&Ms are yellow, so 17 are not yellow. You want
the probability of three non-yellows in a row:
(17/20)×(16/19)×(15/18) ≈
0.5965
(b) The probability is zero, since there are only two reds to start with.
(a) Since the companies are independent, you can use the simple multiplication rule:
P(A bankrupt and W bankrupt) = P(A bankrupt) × P(W bankrupt)
P(A bankrupt and W bankrupt) = .9 × .8 = 0.72
At this point you could compute (b), but it’s a little messy because you need the probability that A fails and W is okay, plus the probability that A is okay and W fails. (c) looks easier, so do that first.
(c) “Neither bankrupt” means both are okay. Again, the events are independent so you can use the simple multiplication rule.
P(neither bankrupt) = P(A okay and W okay)
P(A okay) = 1−.9 = 0.1; P(W okay) = 1−.8 = 0.2
P(neither bankrupt) = .1 × .2 = 0.02
(b) is now a piece of cake.
P(only one bankrupt) = 1 − P(both bankrupt) − P(none bankrupt)
P(only one bankrupt) = 1 − .72 − .02 = 0.26
Remark: If you have time, it’s always good to check your work and work out (b) the long way. You have only independent events (whether A is okay or fails, whether W is okay or fails) and disjoint events (A fails and W okay, A okay and W fails). The “okay” probabilities were computed in part (c).
P(only one bankrupt) = (A bankrupt and W okay) or (A okay and W bankrupt)
P(only one bankrupt) = (.9 × .2) + (.1 × .8) = 0.26
Common mistake: When working this out the long way, students often solve only half the problem. But when you have probability of exactly one out of two, you have to consider both “A and not W” and “W and not A”.
You can’t use the “or” formula here, even if you studied it. That computes the probability of one or the other or both, but you need the probability of one or the other but not both.
Remark: If you computed all three probabilities the long way, pause a moment to check your work by adding them to make sure you get 1. Whenever possible, check your work with a second type of computation.
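Remark: Since the whole solution is a few multiplications, it’s easy to let Python audit it. A minimal sketch, using the failure probabilities given in the problem:

    pA, pW = 0.9, 0.8                        # P(A bankrupt), P(W bankrupt)
    both        = pA * pW                    # (a) 0.72
    neither     = (1 - pA) * (1 - pW)        # (c) 0.02
    exactly_one = pA*(1 - pW) + (1 - pA)*pW  # (b) 0.26
    print(both + neither + exactly_one)      # disjoint and exhaustive, so 1.0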
(a) (You can assume independence because it’s a small sample from a large population.) P(red_{1} and red_{2} and red_{3}) = 0.13×0.13×0.13 = 0.0022
(b)
P(red) = 0.13; P(red^{C}) = 1−0.13 = 0.87.
P(red_{1}^{C} and
red_{2}^{C} and
red_{3}^{C}) = 0.87×0.87×0.87 or 0.87³ =
0.6585
Common mistake: Students sometimes compute 1−.13³. But .13³ is the probability that all three are red, so 1−.13³ is the probability that fewer than three (0, 1, or 2) are red. You need the probability that zero are red, not the probability that 0, 1, or 2 are red. Think carefully about where your “not” condition must be applied!
(c)
The complement is your friend with
“at least” problems. The complement of “at least one is
green” is “none of them is green”, which is the same as
“every one is something other than green.”
P(green) = 0.16, P(nongreen) = 1−0.16 = 0.84.
P(≥1 green of 3) = 1 − P(0 green of 3) = 1 −
P(3 nongreen of 3) = 1−0.84³ ≈
0.4073
(d)
(Sequences are the most practical way to solve this
one.)
(A) G_{1} and G_{2}^{C} and G_{3}^{C};
(B) G_{1}^{C} and G_{2} and G_{3}^{C};
(C) G_{1}^{C} and G_{2}^{C} and G_{3}
.16×(1−.16)×(1−.16) +
(1−.16)×.16×(1−.16) +
(1−.16)×(1−.16)×.16 ≈
0.3387
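Remark: All four parts are one-liners in Python if you want a cross-check. Notice how part (d) collapses the three equally likely sequences into 3 times one of them:

    p_red, p_green = 0.13, 0.16
    print(p_red ** 3)                        # (a) all three red:      0.0022
    print((1 - p_red) ** 3)                  # (b) none red:           0.6585
    print(1 - (1 - p_green) ** 3)            # (c) at least one green: 0.4073
    print(3 * p_green * (1 - p_green) ** 2)  # (d) exactly one green:  0.3387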
P(all 5 attended) = 0.45^5 = 0.0185
P(at least 1 had not attended) = 1 − 0.0185 = 0.9815
(cherry_{1} and orange_{2}) or (orange_{1} and cherry_{2})
Common mistake: There are two ways to get one of each: cherry followed by orange and orange followed by cherry. You have to consider both probabilities.
There are 11+9 = 20 sourballs in all, and Grace is choosing the sourballs without replacement (one would hope!), so the probabilities are:
(11/20)×(9/19) + (9/20)×(11/19) = 99/190 or about 0.5211
P(win ≥1) = 1−P(win 0) = 1−P(lose 5).
P(lose) = 1−P(win) = 1−(1/500) = 499/500
P(lose 5) = [P(lose)]^{5} = (499/500)^5 = 0.9900
P(win ≥1) = 1−P(lose 5) = 1−0.9900 = 0.0100 or 1.00%
Common mistake: If you compute 1−(499/500)^{5} in one step and get 0.00996008, be careful with your rounding! 0.00996... rounds to 0.0100 or 1%, not 0.0010 or 0.1%.
Common mistake: 1/500 + 1/500 + ... is wrong. You can add probabilities only when events are disjoint, and wins in the various years are not disjoint events. It is possible (however unlikely) to win more than once; otherwise it would make no sense for the problem to talk about winning “at least once”.
Common mistake: You can’t multiply 5 by anything. Take an analogy: the probability of heads in one coin flip is 50%. Does that mean that the probability of heads in four flips is 4×50% = 200%? Obviously not! Any process that leads to a probability >1 must be incorrect.
Common mistake: 1−(1/500)^{5} is wrong. (1/500)^{5} is the probability of winning five years in a row, so 1−(1/500)^{5} is the probability of winning 0 to 4 times. What the problem asks is the probability of winning 1 to 5 times.
(a) P(not first and not second) = P(not first) × P(not second) = (1−.7)×(1−.6) = 0.12
(c) P(first and second) = P(first) × P(second) = .7×.6 = 0.42
(b) 1−.12−.42 = 0.46
Alternative: You could compute (b) directly too, using sequences:
P(exactly one copy recorded) =
P(first and not second) + P(second and not first) =
P(first)×(1−P(second)) + P(second)×(1−P(first)) =
.7×(1−.6) + .6×(1−.7) = 0.46
A very common mistake on problems like this is writing down only one of the sequences. When you have exactly one success (or exactly any definite number), almost always there are multiple ways to get to that outcome.
You can’t use the “or” formula here, even if you studied it. That computes the probability of one or the other or both, but you need the probability of one or the other but not both.
(b) The probability of not getting a ticket on a given morning is 1−0.27 = 0.73. The probability of getting no tickets on five mornings in a row is therefore 0.73^{5} ≈ 0.2073 or about 21%.
P(man) = 106.2/219.7 ≈ 0.4834
P(man | divorced) = 9.7/22.8 ≈ 0.4254
Since P(man | divorced) ≠ P(man), the events are not independent.
Alternative solution: You could equally well show that P(divorced | man) ≠ P(divorced):
P(divorced | man) = 9.7/106.2 ≈ 0.0913
P(divorced) = 22.8/219.7 ≈ 0.1038
x ($)  P(x) 

9,999,995  1/10,000,000 
95  1/125 
5  1/20 
−5  .9419999 
(b) $ in L1, probabilities in L2. 1VarStats L1,L2
yields μ = −2.70.
The expected value of a ticket is −$2.70. This is a
bad deal for you. (It’s a very good deal for the
lottery company. They’ll make $2.70 per ticket, on
average.)
Common mistakes: Students sometimes give handwaving arguments such as the top prize being very unlikely, or the lottery company always getting to keep the ticket price, but these are not relevant. The only thing that determines whether it’s a good or bad deal for the player is the expected value μ.
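Remark: The expected value is just Σ x·P(x), so you can audit the 1VarStats result in a few lines of Python:

    x = [9_999_995, 95, 5, -5]                  # net winnings per ticket, $
    p = [1/10_000_000, 1/125, 1/20, 0.9419999]  # matching probabilities
    print(sum(p))                               # a valid model must sum to 1
    mu = sum(xi * pi for xi, pi in zip(x, p))
    print(round(mu, 2))                         # -2.7: lose $2.70 per ticket on average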
μ = 1/p = 1/.066 ≈ 15.2
Over the course of her undead existence, taking each night’s hunt as a separate experience, the average of all nights has her first getting an O negative drink from her fifteenth victim.
(b) geometcdf(.066,10)
=
.4947936946 ≈ 0.4948. Velma has almost a
50% chance of getting O negative blood within her first ten
victims.
(You could also do this as a binomial, n = 10, p = 0.066, x = 1 to 10.)
(c) This is a binomial model with n = 10,
p = 0.066, and x = 2. Use MATH200A part 3 or
binompdf(10,.066,2)
= .1135207874 ≈
0.1135. Velma has just over an 11% chance of getting
exactly two O negative victims within her first ten.
(a) geometpdf(.08, 3) = .067712 → 0.0677
(b) geometcdf(.08, 3) = .221312 → 0.2213
(You could also do this as a binomial with n = 3, p = 0.08, x = 1 to 3.)
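Remark: If you have SciPy available, its geometric distribution counts trials the same way as geometpdf/geometcdf (first success on trial 1, 2, 3, …), so it makes a handy cross-check for this problem and the previous one:

    from scipy import stats

    print(stats.geom(0.066).cdf(10))  # ~0.4948, like geometcdf(.066,10)
    print(stats.geom(0.08).pmf(3))    # ~0.0677, like geometpdf(.08,3)
    print(stats.geom(0.08).cdf(3))    # ~0.2213, like geometcdf(.08,3)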
(a) This is a binomial distribution: each student passes or not, whether one student passes has nothing to do with whether anyone else passes, and there are a fixed seven trials.
μ = np = 7*0.8 ⇒ μ = 5.6 people
σ = √[npq] = √[7*0.8*(1−0.8)] = 1.058300524 ⇒ σ = 1.1 people
(b) Binomial again, n = 7, p = 0.8,
x = 4 to 6. Use
binompdf
sum
or MATH200A part 3 to find
P(4 ≤ x ≤ 6) =
0.7569.
(c) Geometric model: p = 0.8, x = 3.
geometpdf(.8,3)
= 0.0320
(d) geometcdf(.8,2)
= 0.9600
Alternative solution: Binomial probability with n = 2, p = 0.8, x = 1 to 2 gives the same answer.
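Remark: Parts (a) through (d) can all be cross-checked with SciPy’s binomial and geometric distributions (a sketch, assuming SciPy):

    from scipy import stats

    X = stats.binom(7, 0.8)       # seven students, each passing with p = 0.8
    print(X.mean(), X.std())      # (a) mu = 5.6, sigma ~ 1.06
    print(X.cdf(6) - X.cdf(3))    # (b) P(4 <= x <= 6) ~ 0.7569
    G = stats.geom(0.8)
    print(G.pmf(3))               # (c) first pass on try 3: 0.0320
    print(G.cdf(2))               # (d) pass within two tries: 0.9600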
binomcdf(40,.49,13)
or MATH200A Program part 3 with n=40,
p=.49, x=0 to 13 gives .0259693307 →
0.0260, less than 5%, so you would be surprised
though maybe not flabbergasted. ☺
(b) The company’s gross profit is $180.00−130.40 = $49.60, about 28%. But it could very well cost the company that much to sell the policy, pay the agent’s commission, and enter the policy in the computer. Also, all policies must bear part of the company’s general overhead costs. The price is not necessarily unfair in the plain English sense.
(a) x’s in L1, P’s in L2. 1VarStats L1,L2
yields μ = 2 (exactly) and
σ = 1.095353824 or
σ ≈ 1.1. Interpretation:
In the long run, on average you expect to get two heads per group of
five flips. You expect most groups of five flips will yield between
μ−σ = 1 head and μ+σ = 3
heads.
(b) (I wouldn’t use this part as a regular quiz question.) The long-term average is 2 heads out of 5 flips, which is p = 2/5 = 40%. Obviously coin flips are independent, so the probability of heads must be the same every time. Therefore you have a binomial model with n = 5 and p = 0.4.
(a) Binomial probability with n = 5, p = 0.7,
x = 3 to 5. MATH200A part 3 with 5, .7, 3, 5 yields .83692 or
P(x ≥ 3) = 0.8369. Or,
binompdf(5,.7)→L6
and then
sum(L6,4,6)
to get the same answer.
Or, use the complement:
1−binomcdf(5,.7,2)
.
(b) You need the mean of the binomial distribution:
μ = np = 10×0.7 = 7
(c) 5 is less than the expected number, so you compute P(x≤5):
MATH200A part 3 10, .7, 0, 5 yields 0.1503, or
binomcdf(10,.7,5)
= 0.1503,
not surprising
Common mistake: Don’t just compute P(x=5), which is 0.1029. When you want to know whether a result is unusual or surprising, you have to find the probability of that result or one even further from the expected value.
binompdf(5,.34,0)
= .1252332576,
about a 12.5% chance
P(64 ≤ x ≤ 67) = normalcdf(64, 67, 64.1, 2.75) = 0.3686871988 → 0.3687
36.87% of women are 64″ to 67″ tall.
x_{1} = invNorm(.025, 69.3, 2.92) = 63.57690516
x_{2} = invNorm(1−.025, 69.3, 2.92) = 75.02309484
Heights under 63.6″ or over 75.0″ would be considered unusual.
P15 = invNorm(.15, 69.3, 2.92) = 66.27361453
You must be at least 66″ or 5′6″ tall. Also acceptable: at least 66¼ inches, or at least 66.3 inches.
P25 = invNorm(.25, 64.1, 2.75) = 62.24515319 → P25 = 62.2″
P75 = invNorm(.75, 64.1, 2.75) = 65.95484681 → P75 = 66.0″
(b) Q3 is P75 and Q1 is P25, so the IQR is P75−P25 = 65.95484681−62.24515319 = 3.70969362 → IQR = 3.7″.
(c) 1.35σ = 1.35×2.75 = 3.7125 → 3.7″, matching the IQR as expected. (The match isn’t perfect, because 1.35 is a rounded number.)
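Remark: SciPy’s norm.cdf and norm.ppf play the roles of normalcdf and invNorm, so all of these normal-model answers are easy to verify off the calculator (a sketch, assuming SciPy):

    from scipy import stats

    women = stats.norm(64.1, 2.75)            # women's heights, inches
    print(women.cdf(67) - women.cdf(64))      # ~0.3687, like normalcdf(64,67,...)
    men = stats.norm(69.3, 2.92)
    print(men.ppf(0.025), men.ppf(0.975))     # like invNorm: ~63.58 and ~75.02
    print(women.ppf(0.75) - women.ppf(0.25))  # IQR ~ 3.71, close to 1.35*sigma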
P(x ≤ 735) = normalcdf(−10^99, 735, 500, 100) = 0.9906.
A score of 735 is at the 99th percentile.
x_{1} = invNorm(1−.02, 1500, 300) = 2116.124673
You must score at least 2117. (If you round to 2116, you get a number that is a bit less than the computed minimum. While rounding usually makes sense, there are situations where you have to round up, or round down, instead of following the usual rule.)
Common mistake: The probability is not 7.24! That’s not just wrong, it’s very wrong — probabilities are never greater than 1. “E−4” on your calculator comes at the end of the number, but it’s critical info. It means “times 10 to the minus 4th power”, so the probability is 7×10^{−4} or 0.0007.
x_{m1} = invNorm(.05, 69.3, 2.92) = 64.49702741
x_{m2} = invNorm(1−.05, 69.3, 2.92) = 74.10297259
x_{f1} = invNorm(.05, 64.1, 2.75) = 59.57665253
x_{f2} = invNorm(1−.05, 64.1, 2.75) = 68.62334747
Men must be 64.5 to 74.1 inches tall; women must be 59.6 to 68.6 inches tall.
Sample means are ND with mean 800 hours and SD 5 hours. The sketch is at right.
Common mistake: The correct standard deviation is 5 hours, not 50. You’re not sketching the population of light bulbs. Rather, you’re now interested in the distribution of average lifetimes in samples of 100 bulbs. (The axis is the x̅ axis, not the x axis.)
780 hours, the sample mean that the problem asks about, is 20 hours below the population mean of 800. 20/5 = 4 standard errors, so you should have marked 780 hours at four standard deviations below the mean.
A sample mean of 780 is less than the population mean of 800 hours. Therefore you compute the probability of a sample mean of 780 hours or less. It will be surprising (unusual, unexpected) if the probability is under 5%.
P(x̅ ≤ 780) = normalcdf(−10^99, 780, 800, 50/√(100)) = 3.1686E−5 → P(x̅ ≤ 780) = 0.00003
(You can also give the probability as <0.0001.) Yes, this is surprising.
Common mistake: Don’t give the probability as 3.1686. Probabilities are never greater than 1.
(b) If the manufacturer’s claim is true, there are only three chances in a hundred thousand of getting a sample mean this low. It’s very unlikely that the manufacturer’s claim is true.
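Remark: In Python the same computation is two lines; the key step is using the standard error σ/√n, not σ itself:

    from math import sqrt
    from scipy.stats import norm

    mu, sigma, n = 800, 50, 100
    se = sigma / sqrt(n)          # standard error of the mean: 5 hours
    print(norm.cdf(780, mu, se))  # ~3.2e-05: yes, surprising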
Answer: normally distributed with mean = 0.72, standard deviation (standard error) = 0.020
Common mistake: Don’t write n≥30 when testing the normal approximation. The n≥30 test applies to numeric data, but in this problem you have binomial data.
(b) 350/500 = 0.70 exactly, and 370/500 = 0.74 exactly. In a sample of 500, finding 350 to 374 successes is the same as finding 70% to 74% successes.
If you stored the computed SEP in part (a), then your screen will look like the one on the left. Otherwise, it will look like the one on the right:
Answer: P(70% ≤ p̂ ≤ 74%) = 0.6808.
Remark: Always check for reasonableness. 70% and 74% are one standard error below and above the mean, so you know from the Empirical Rule that about 68% of the data should be within that region.
Remark: The problem wanted you to use the normal approximation, but it’s always good to check answers by a different method if possible. 70%×500 = 350; 74%×500 = 370. MATH200A part 3 with n=500, p=.72, from 350 to 370, gives a probability of 0.7044, pretty good agreement.
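Remark: Both the normal approximation and the exact binomial check from the preceding Remark are short in Python (a sketch, assuming SciPy):

    from math import sqrt
    from scipy.stats import norm, binom

    p, n = 0.72, 500
    se = sqrt(p * (1 - p) / n)                            # SEP ~ 0.020
    print(norm.cdf(0.74, p, se) - norm.cdf(0.70, p, se))  # ~0.6808
    print(binom.cdf(370, n, p) - binom.cdf(349, n, p))    # exact: ~0.7044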
P(x̅ ≤ $31,000) = normalcdf(−10^99, 31000, 32400, 19000/√(1000)) = 0.0099, almost exactly 1%. That would be pretty unlikely if the population mean was still $32,400, so the city manager is most likely correct.
Remark: This problem was adapted from Freedman, Pisani, Purves (2007, 415) [see “Sources Used” at end of book].
Outcome  x ($)  P(x)

Red  +10  18/38
Black or Green  −10  20/38
Total  n/a  38/38 = 1
(a) The model is at right. You could list green and black separately, but since they have the same outcome there’s no need to do that. It’s important to have the probabilities as exact fractions, not approximate decimals.
(b) x’s in L1, P’s in L2. 1VarStats L1,L2
gives μ = −$0.53, σ = $9.99.
Interpretation: In the long run, a player who bets $10 on red will
lose an average of 53¢ per bet.
Remark: Notice that the SD is about 20 times the mean. This is why gambling is so exciting for the player: there’s a lot of variability from one bet to the next.
(c) With n = 10,000, the sampling distribution of x̅ is normally distributed. (10n = 10×10,000 = 100,000, less than the total number of bets while the casino is in business. The bets placed in a given day are not random, but they are representative of all possible bets and therefore effectively random.) The mean of the sampling distribution is the mean of the population: μ_{x̅} = +$0.53. (Whatever players lose, the casino wins, so the mean is the opposite of a player’s mean.) The standard error of the mean is σ/√n = 9.986139979/√10000; σ_{x̅} ≈ $0.10.
Remark: This is why gambling is predictable for the operators: the SD is small compared to the mean.
(d) 10,000×$.5263157895 = $5,263.16
(e)
To lose money, the casino has to make less than $0.00. Zero is more
than five standard errors below the mean (has a zscore below
−5), so you know right off that it would be unusual for the
casino to lose money. normalcdf
confirms that:
P(lose on 10,000 bets) = 6.8×10^{−8}.
The casino has essentially no risk (7 chances in 100 million) of
losing money on 10,000 bets.
(f)
Remember the elevator example. A total of $2000 on 10,000 bets is an
average of 2000/10,000 = $0.20 per bet. Use
normalcdf
to compute the probability of doing that well
or better:
P(make ≥$2000) = 0.9995. Not only is
the casino virtually certain not to lose money, it’s almost
certain to make a handsome profit, as long as people come in to place
bets.
P(∑x > 75.6) = P(x̅ > 5.04) = normalcdf(75.6/15, 10^99, 5.00, 0.05/√(15)) = 9.7295E−4 ≈ 0.0010, about one chance in a thousand.
Answer: P(x > 43.0) = 0.1634
(b) The sampling distribution of x̅ is ND, even for this small sample, because the population is ND. The standard error is σ_{x̅} = 5.1/√14 ≈ 1.4.
P(x̅ > 43.0) = normalcdf(43, 10^99, 38, 5.1/√(14)) = 1.2212E−4 → P(x̅>43.0) = 0.0001 or 0.01%
Remark: This sketch is not very well proportioned, because it makes the probability look much larger than it actually is.
The standard error of the mean is σ_{x̅} = 3.5/√1000, about 0.11. The sampling distribution of the mean is normal because data are numeric and n = 1000, greater than 30. (Treat the sample as random because it’s a “typical neighborhood”. And a thousand households is less than 10% of all the households that there are.)
P(x̅ > 12.778) = normalcdf(12.778, 10^99, 12.5, 3.5/√(1000)) = 0.0060
Therefore the sampling distribution can be approximated by a normal distribution.
The standard error of the proportion or SEP is σ_{p̂} = √[pq/n] = √[.0171(1−.0171)/11037] ≈ 0.0012
If you use my shortcut, your screen will look like the one at the left; if not, it will look like the one at the right.
Either way, the probability is 2.2013×10^{−10}, or 0.000 000 000 2. There are only two chances in ten billion of getting a sample proportion of 0.94% or less with sample size 11,037, if the true population proportion is 1.71%. That’s pretty darn unlikely, so based on this experiment you can rule out coincidence and decide that aspirin does reduce the chance of a heart attack among adult males.
The standard error of the mean or SEM is σ_{x̅} = σ/√n = 2.92/√16 = 0.73″.
μ_{x̅} ± 2σ_{x̅} = 69.3 ± 2×.73 = 67.84 to 70.76.
Sample means between those values would not be surprising, and therefore a sample mean would be surprising if it is under 67.84″ or over 70.76″.
Alternative solution: That back-of-the-envelope calculation is good enough, but you could also get a more precise answer:
L = invNorm(0.025, 69.3, 2.92/√(16)) = 67.87
H = invNorm(1−0.025, 69.3, 2.92/√(16)) = 70.73
What does the sampling distribution look like? The center is μ_{p̂} = p = 0.45. The standard error is σ_{p̂} = √[.45×(1−.45)/1504] ≈ 0.013. Check requirements to make sure that a normal model can be used for the sampling distribution:
P(x ≥ 737) = P(p̂ ≥ 49%) = normalcdf(737/1504, 10^99, .45, √(.45*(1−.45)/1504)) ≈ 9E−4 or 0.0009.
Can you draw a conclusion? Yes, you can. In a population with 45% unfavorable rating of the Tea Party, there are only 9 chances in 10,000 of getting a sample as unfavorable as this one (or more unfavorable). That’s pretty unlikely, so you conclude that the true unfavorable rating in October was most likely more than 45% of all Americans. (In Chapter 9, you’ll learn how to estimate that proportion from a sample.)
(Statisticians would say, “the population mean or proportion is not a random variable.” By that, they mean just what I said in less technical language.)
Answer: A confidence interval for numeric data is an estimate of the average, and tells you nothing about individuals. Correct his conclusion to I’m 90% confident that the average food expense for all TC3 students is between $45.20 and $60.14 per week.
Remark: Use all or a similar word to show that you’re estimating the mean for the population, not just the sample of 40 students. There’s no need to estimate the mean of the sample, because you know the exact sample mean x̅ for your sample.
Remark: Be clear in your mind that you’re estimating the average spending per student at $45–60 a week. Some individual students will quite likely spend outside that range, so your interpretation shouldn’t say anything about individual student spending.
Answer: It’s the use of the word average. When you collect data points that are all yes/no or success/failure, you have a sample proportion p̂, equal to the number of successes divided by sample size, and you can estimate a population proportion. There is no “average” with nonnumeric data.
Your 90% confidence estimate is simply that 27% to 40% usually or always prepare their own food.
This is a confidence interval about a mean, Case 1 in Inferential Statistics: Basic Cases.
Requirements: random sample, OK. 10n = 10×40 = 400 is less than total number of batteries made; OK. n = 40 >30, OK.
TInterval 1756, 142, 40, .95
(1710.6, 1801.4)
Neveready is 95% confident that the average Neveready A cell, operating a wireless mouse, lasts 1711 to 1801 minutes (28½ to 30 hours).
Common mistake: Don’t make any statement about 95% of the batteries! Your CI is about your estimate of one number, the average life of all batteries. Your CI has a margin of error of ±15 minutes; the 95% range for all batteries would be about 4 to 5 hours.
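Remark: scipy.stats.t.interval reproduces TInterval from the same summary statistics, if you want an off-calculator check (a sketch, assuming SciPy):

    from math import sqrt
    from scipy import stats

    xbar, s, n = 1756, 142, 40
    lo, hi = stats.t.interval(0.95, df=n-1, loc=xbar, scale=s/sqrt(n))
    print(lo, hi)    # ~(1710.6, 1801.4), matching TInterval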
Don’t make the term “point estimate” harder than it is! The point estimate for the population mean (or proportion, standard deviation, etc.) is just the sample mean (or proportion, standard deviation, etc.).
(b) The sample is his actual data, the 10,000 flips. Therefore the sample size is n = 10,000. The population is what he wants to know about, all possible flips. The population size is infinite or “indefinitely large”.
This is sample size for a confidence interval about a proportion, Case 2 in Inferential Statistics: Basic Cases. Since you have no prior estimate, use 0.5 for p̂.
With the MATH200A program (recommended):  If you’re not using the program: 

MATH200a/sample size/binomial, p̂ = .5, E = .035, CLevel = .95, sample size is at least 784 
The formula is n = p̂(1−p̂)·(z_{α/2}/E)².
1−α = .95 ⇒ α/2 = 0.025. z_{0.025} = invNorm(1−.025, 0, 1) ≈ 1.96. Divide by .035, square the result, and multiply by .5×(1−.5). Answer: at least 784. Remember — you’re not rounding, you’re going up to a whole number. 
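Remark: The same computation in Python, with the crucial round-up step made explicit (a sketch, assuming SciPy for invNorm’s equivalent, norm.ppf):

    from math import ceil
    from scipy.stats import norm

    phat, E, conf = 0.5, 0.035, 0.95
    z = norm.ppf(1 - (1 - conf) / 2)             # ~1.96
    print(ceil((z / E)**2 * phat * (1 - phat)))  # 784: always go up, never round down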
This is a confidence interval about a proportion, Case 2 in Inferential Statistics: Basic Cases.
Requirements:
Common mistake: Don’t say “n > 30” or “n ≥ 30”. That’s true, but it doesn’t help you with binomial data. For computing a confidence interval about a proportion from binomial data, the “sample size large enough” condition is at least 10 successes and at least 10 failures, not sample size at least 30.
1PropZInt 40, 100, .9 → (.31942, .48058), p̂ = .4
31.9% to 48.1% of all claims at that office have been open for more than a year (90% confidence).
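Remark: 1-PropZInt is just p̂ ± z·√(p̂(1−p̂)/n), so it’s a quick check in Python (a sketch, assuming SciPy):

    from math import sqrt
    from scipy.stats import norm

    x, n, conf = 40, 100, 0.90
    phat = x / n                          # 0.4
    z = norm.ppf(1 - (1 - conf) / 2)      # ~1.645
    me = z * sqrt(phat * (1 - phat) / n)  # margin of error ~ 0.081
    print(phat - me, phat + me)           # ~(0.3194, 0.4806)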
This is a confidence interval about a mean, Case 1 in Inferential Statistics: Basic Cases.
Requirements check:
TInterval 17.7, 1.8, 40, .95 → (17.124, 18.276)
She’s 95% confident that the average of all her commutes is 17.1 to 18.3 minutes.
This is a confidence interval about a mean, Case 1 in Inferential Statistics: Basic Cases.
Requirements check:
TInterval L6, 1, .95 → (62.918, 65.016), x̅=63.96666667, s=1.894226818, n=15
The average height of women aged 20–29 is 62.9 to 65.0 inches (95% confidence).
Remark: Since adult women’s heights are known to be normally distributed, you could get away without checking for normality and outliers in this sample. But it does no harm to check every time.
This is a confidence interval about a mean, Case 1 in Inferential Statistics: Basic Cases.
Requirements check:
TInterval L5, 1, .9 → (97.757, 98.343),
x̅ = 98.05,
s =.7155828558 → 0.72,
n = 18.
(a)
Fred is 90% confident that the average body temperature of healthy male students is 97.8 to 98.3 °F.
(b) He’s 90% confident that the average body temperature is not more than 98.3°, so 98.6° as normal (average) temperature is inconsistent with his data.
(c) E = 98.343−98.05 = 0.3°, or E = 98.05−97.757 = 0.3°, or (98.343−97.757)/2 = 0.3°.
With the MATH200A program (recommended):  If you’re not using the program: 

(d) MATH200A/Sample size/Num unknown σ: s=.7155828558, E=.1, CLevel=.95, n≥202. He will need at least 202 in his sample. 
(d) Confidence level = 1−α = 0.95 ⇒
α = 0.05 ⇒ α/2 = 0.025. The formula is n = (z_{α/2}·s/E)².
z_{0.025} = invNorm(1−.025) ≈ 1.96. Multiply by s, divide by E, and square the result. This gives 197. But the t distribution is more spread out than the normal (z) distribution, so you probably want to bump that number up a bit, say to 200 or so. 
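Remark: Here is the right-hand column’s z-based estimate in Python (a sketch, assuming SciPy). It gives 197; MATH200A’s answer of 202 allows for the wider t distribution:

    from math import ceil
    from scipy.stats import norm

    s, E, conf = 0.7155828558, 0.1, 0.95
    z = norm.ppf(1 - (1 - conf) / 2)  # ~1.96
    print(ceil((z * s / E)**2))       # 197; bump up because t is wider than z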
This problem is about a confidence interval about a proportion, Case 2 in Inferential Statistics: Basic Cases.
(a) Requirements check:
1PropZInt, 219, 500, .9 → (.4015, .4745), p̂ = .438
You’re 90% confident that 40.2% to 47.5% of Metropolis adults aged 50–75 have had a colonoscopy in the past ten years.
(b) MATH200A/sample size/binomial, p̂ = .438, E = .02, CLevel = .9 → at least 1665
This is a confidence interval about a mean, Case 1 in Inferential Statistics: Basic Cases.
Requirements check:
TInterval L4, 1, .95 → (179.86, 198.93), x̅ = 189.40, s = 20.37, n = 20
You’re 95% confident that the average of all cash deposits is between $179.86 and $198.93.
Common mistake: Don’t say that 95% of deposits are between those values — if you look at the sample you’ll see that’s pretty unlikely. You’re estimating the average, not the individual deposits in the population.
This is a confidence interval about a proportion, Case 2 in Inferential Statistics: Basic Cases.
Requirements check:
1PropZInt 520, 1000, .95 → (.48904, .55096), p̂ = .52
With 95% confidence, 48.9% to 55.1% of voters voted Snake. At the 95% confidence level, we can’t tell whether more or less than 50% of voters voted for Abe Snake.
1. Hypotheses.
2. Significance level.
RC. Requirements check.
3–4. Test statistic and p-value.
5. Decision rule (or, conclusion in statistics language).
6. Conclusion (in English).
It keeps you honest. If you could select a significance level after computing the p-value, you could always get the result you want, regardless of evidence.
Answers will vary here. But you should get in the key idea that If H_{0} is true, the p-value is the chance of getting the sample you got, or a sample even further from H_{0}, purely by random chance. For more correct statements, and common incorrect statements, see What Does the p-Value Mean?
(a) It’s too wishy-washy. When p<α, you can reach a conclusion. Correction: The accelerant makes a difference, at the 0.05 significance level.
(b) You can never prove the null hypothesis of “no difference”. You can’t even say “The accelerant may make no difference,” because that’s only part of the truth: it equally well may make a difference. You must say something like, “At the 0.05 significance level it’s impossible to say whether the accelerant makes a difference or not.”
(a) A Type I error is rejecting the null hypothesis when it’s actually true. In this case, a Type I error would be concluding “the accelerant makes paint dry faster” when actually it makes no difference. This would lead you to launch the product and expose yourself to a lot of warranty claims.
(b) A Type II error is failing to reject the null hypothesis when it’s actually false. In this case, a Type II error would be concluding “the accelerant doesn’t make paint dry faster” when actually it does. This would lead you to keep the product off the market even though it could add to your sales and would perform as promised.
They are not necessarily mistakes. Type I and II errors are an unavoidable part of sample variability. Nothing can prevent them entirely. The only way to make them both less likely at the same time is to use a larger sample size.
That said, if you make mistakes in data collection or analysis you definitely make Type I or Type II errors (or both of them) more likely.
Make your significance level α smaller. The side effect is making a Type II error more likely.
Your own words will vary from mine, but the main difference is that when p > α you can’t reach a conclusion. Accepting H_{0} is wrong because it reaches the conclusion that H_{0} is true. Failing to reject H_{0} is correct because it leaves both possibilities open.
It’s like a jury verdict of “not guilty beyond a reasonable doubt.” The jury is not saying the defendant didn’t do it. They are saying that either he didn’t do it or he did it but the prosecution didn’t present enough evidence to convince them.
A hypothesis test can end up rejecting H_{0} or failing to reject it, but the result can never be to accept H_{0}.
H_{0}: μ = 500
H_{1}: μ ≠ 500
Remark: It must be ≠, not > or <, because the claim is that the mean is 500 minutes, and a difference in either direction would destroy the claim.
(a) p > α; fail to reject H_{0}. At the 0.01 significance level, we can’t determine whether the directors are stealing from the company or not.
(b) p < α; reject H_{0} and accept H_{1}. At the 0.01 level of significance, we find that the directors are stealing from the company.
α is the probability of a Type I error that you can tolerate. A Type I error in this case is determining that the defendant is guilty (calling H_{0} false) when actually he’s innocent (H_{0} is really true), and the consequence would be putting an innocent man to death. You specify a low α to make it less likely this will happen. Of the given choices, 0.001 is best.
This is binomial data, a Case 2 test of proportion in Inferential Statistics: Basic Cases.
(1) 
H_{0}: p = .1, 10% of TC3 students driving alcohol impaired
H_{1}: p > .1, more than 10% of TC3 students driving alcohol impaired 

(2)  α = 0.05 
(RC) 

(3/4) 
1PropZTest: .1, 18, 120, >p_{o} results: z=1.825741858 → z = 1.83, p=.0339445194 → p = 0.0339, p̂ = .15 
(5)  p < α. Reject H_{0} and accept H_{1}. 
(6) 
At the 0.05 significance level, more than 10% of TC3 students were alcohol impaired on the most recent Friday or Saturday night when they drove.
Or, More than 10% of TC3 students were alcohol impaired on the most recent Friday or Saturday night when they drove (p = 0.0339). 
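Remark: A 1-PropZTest is a one-line formula, so it’s easy to reproduce steps 3/4 in Python if you want a check (a sketch, assuming SciPy):

    from math import sqrt
    from scipy.stats import norm

    p0, x, n = 0.1, 18, 120
    phat = x / n                               # 0.15
    z = (phat - p0) / sqrt(p0 * (1 - p0) / n)  # ~1.83
    print(1 - norm.cdf(z))                     # right-tailed p-value ~ 0.0339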
This is binomial data (against or not against): a Case 2 test of population proportion in Inferential Statistics: Basic Cases.
Requirements check: Random sample? NO, this is a self-selected sample, consisting only of those who returned the poll. (That could be overcome by following up on those who did not return the poll, but nobody did that.)
The 10n≤N requirement also fails. 10n = 10×380 = 3800, much larger than the 1366 population size.
Answer: No, you cannot do any inferential procedure because the requirements are not met.
(b) The population is all persons who do the primary grocery shopping in their households. We don’t know the precise number, but it is surely in the millions since there are millions of households. We can say that it is indefinitely large.
(c) The number 182 is x, the number of successes in the sample.
(d) She wanted to know whether the true proportion is greater than 40%, so her alternative hypothesis is H_{1}: p > 0.4 and p_{o} is 0.4.
(e) No. The researcher is interested in the habits of the primary grocery shoppers in households; therefore she must sample only people who are primary grocery shoppers in their households. If you even thought about saying Yes, please go back to Chapter 1 and review what bias actually means.
(a) This is inference about the proportion in one population, Case 2 in Inferential Statistics: Basic Cases.
(1) 
H_{0}: p = 2/3, the chance of winning is 2/3 if you switch doors.
H_{1}: p ≠ 2/3, the chance of winning is different from 2/3 if you switch doors. Remark: You need to test for ≠, not <. You’re asked whether the claim of 2/3 is correct, and if it’s wrong it could be wrong in either direction. It doesn’t matter that the sample data happen to show a smaller proportion than 2/3. 

(2)  α = 0.05 
(RC) 

(3/4)  1PropZTest, 2/3, 18, 30, ≠
results: z = −.77, p-value = 0.4386, p̂ = 0.6 
(5)  p > α. Fail to reject H_{0}. 
(6)  We can’t determine whether the claim “switching
doors gives a 2/3 chance of winning” is true or false
(p = 0.4386).
Or, At the 0.05 significance level, we can’t determine whether the probability of winning after switching doors is equal to 2/3 or different from 2/3. Remark: It’s true that you can’t disprove the claim, but it’s also true that you can’t prove it. This is where a confidence interval gives useful information. 
(b) Requirements have already been checked.
1PropZInt 18, 30, .95. Results: (.4247, .7753), p̂ =
.6.
We’re 95% confident that the true probability of winning if
you switch doors is between 42.5% and 77.5%.
(c) It’s possible that the true probability of winning if you switch doors is 1/3 (33.3%) or even worse, but it’s very unlikely. Why? You’re 95% confident that it’s at least 42.5%. Therefore you’re better than 95% confident that the true probability if you switch is better than the 1/3 probability if you don’t switch doors. Switching is extremely likely to be the good strategy.
(a) A Type I error is rejecting the null hypothesis when it’s actually true. Here, a Type I error means deciding a piece of mail is spam when it’s actually not, so if Heather’s spam filter makes a Type I error then it will delete a piece of real mail. A Type II error is failing to reject H_{0} when it’s actually false, treating a piece of spam as real mail, so a Type II error would let a piece of spam mail into Heather’s inbox.
(b) Most people would rather see a piece of spam (Type II) than miss a piece of real mail (Type I), so a Type I error is more serious in this situation. Lower significance levels make Type I errors less likely (and Type II errors more likely), so a lower α is appropriate here.
(1) 
H_{0}: p = .304
H_{1}: p < .304, less than 30.4% of Ithaca households own cats. 

(2)  α = 0.05 
(RC) 

(3/4)  1PropZTest .304, 54, 215, <
results: z = −1.68, p-value = 0.0461, p̂ = 0.2512 
(5)  p < α. Reject H_{0} and accept H_{1}. 
(6)  At the 0.05 significance level, fewer than 30.4% of Ithaca
households own cats.
Or, Fewer than 30.4% of Ithaca households own cats (p = 0.0461). 
(a) The population parameter is missing.
It should be either μ or
p, but since a proportion can’t be greater than 1 it must
be μ.
Correction:
H_{0}: μ = 14.2; H_{1}: μ > 14.2
(b) H_{0} must have an = sign. Correction: H_{0}: μ = 25; H_{1}: μ > 25
(c) You used sample data in your hypotheses. Correction: H_{0}:μ=750; H_{1}:μ>750
(d) You were supposed to test “makes a difference”, not “is faster than”.
Never do a one-tailed test (> or <) unless the other direction is impossible or of no interest at all.
It’s possible that your “accelerant” could actually
increase drying time, and if it does you’d definitely want to
know.
Correction:
H_{0}: μ = 4.3 hr; H_{1}: μ ≠ 4.3 hr
This is numeric data, and you don’t know the standard deviation (SD) of the population. In Inferential Statistics: Basic Cases this is Case 1, a test of population mean.
(1) 
H_{0}: μ = 3.8, the mean pollution this year is no
different from last year
H_{1}: μ < 3.8, the mean pollution this year is lower than last year 

(2)  α = 0.01 
(RC) 

(3/4)  TTest: 3.8, L1, 1, <μ_{o}
results: t = −4.749218419 → t = −4.75, p = 5.2266779E−4 → p = 0.0005, x̅ = 3.21, s = .3928528138 → s = 0.39, n = 10 Common mistake: Don’t write “p = 5.2267” or anything equally silly. A p-value is a probability, and probabilities are never greater than 1. 
(5)  p < α. Reject H_{0} and accept H_{1}. 
(6)  At the 0.01 level of significance, the mean pollution is
lower this year than last year.
Or, The mean pollution this year is lower than last year (p = 0.0005). 
This is numeric data with unknown SD of the population, Case 1 (test of population mean) in Inferential Statistics: Basic Cases.
(1) 
H_{0}: μ = 32.0, quarts are being properly filled
H_{1}: μ < 32.0, Dairylea is shorting the public Remark: Your H_{1} uses <, not ≠, because the problem asks if Dairylea has a legal problem. Yes, they might be overfilling, but that would not be a legal problem. 

(2)  α = 0.05. This is just a business situation, not a matter of life and death. (You could justify a lower α if you can show serious consequences from making a mistake, such as a multimillion libel suit brought by the company against the investigator.) 
(RC) 

(3/4) 
TTest: 32, 31.8, .6, 10, <μ_{o}
results: t=−1.054092553 → t = −1.05, p=.159657788 → p = 0.1597 
(5)  p > α. Fail to reject H_{0}. 
(6) 
At the 0.05 level of significance, we can’t determine whether
Dairylea is giving short volume or not.
Or, We can’t determine from this sample whether Dairylea is giving short volume or not (p = 0.1597). Remark: You never accept the null hypothesis. But in many cases you may proceed as though it’s true. Here, since you can’t prove a case against the dairy, you don’t file charges, make a press release, organize a boycott, etc. You behave exactly as you would behave if you had proof the dairy was honest. But you don’t conclude that Dairylea is giving full measure, either. All your hypothesis test tells you is that it could go either way. 
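Remark: With only summary statistics, the T-Test is equally quick to verify in Python (a sketch, assuming SciPy):

    from math import sqrt
    from scipy.stats import t

    mu0, xbar, s, n = 32.0, 31.8, 0.6, 10
    tstat = (xbar - mu0) / (s / sqrt(n))  # ~ -1.05
    print(t.cdf(tstat, df=n-1))           # left-tailed p-value ~ 0.1597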
This is numeric data with unknown SD of population. You’re testing a population mean, Case 1 in Inferential Statistics: Basic Cases.
(1) 
H_{0}: μ = 870, no difference in strength
H_{1}: μ ≠ 870, new glue’s average strength is different Remark: You’re testing different here, not better. It’s possible that the new glue bonds more poorly, and that would be interesting information, either guiding further research or perhaps leading to a new product (think Post-it Notes). 

(2)  α = 0.05 
(RC) 

(3/4) 
TTest: 870, 892.2, 56.0, 30, μ≠μ_{o}
results: t=2.17132871 → t = 2.17, p=.038229895 → p = 0.0382 
(5)  p < α. Reject H_{0} and accept H_{1}. 
(6) 
At the 0.05 level of significance, new glue has a different mean
strength from the company’s best seller. In fact, it is
stronger.
Or, New glue has a different mean strength from the company’s best seller (p = 0.0382). In fact, it is stronger. Remark: When you are testing ≠, and p<α, you give the two-tailed interpretation “different from”, and then continue with a one-tailed interpretation. See p < α in Two-Tailed Test: What Does It Tell You? 
This is binomial data (each person either has a bachelor’s or doesn’t) for a onepopulation test of proportion: Case 2 in Inferential Statistics: Basic Cases
(a) Requirements:
1PropZInt: x=52, n=120, CLevel=.95
Results: (.34467, .52199); p̂=.4333333333 →
p̂ = .4333
We’re 95% confident that 34.5 to 52.2% of Tompkins County residents aged 25+ have at least a bachelor’s degree.
(b) Requirements have already been checked. A two-tailed test at the 0.05 level is equivalent to a confidence interval at the 95% level. The statewide proportion of 32.8% is outside the 95% CI for Tompkins County, and therefore at the 0.05 significance level, the proportion of bachelor’s degrees among Tompkins County residents aged 25+ is different from the statewide proportion of 32.8%. In fact, Tompkins County’s proportion is higher.
This is numeric data, with population SD unknown: test of population mean, Case 1 in Inferential Statistics: Basic Cases.
(1) 
H_{0}: μ = 625, no difference in strength
H_{1}: μ > 625, Whizzo stronger than Stretchie Remark: Here you test for >, not ≠. Even though Whizzo might be less strong, you don’t care unless it’s stronger. 

(2)  α = 0.01 
(RC) 

(3/4)  TTest: 625, L1, 1, >μ_{o}
results: t=3.232782217 → t = 3.23, p=.0071980854 → p = 0.0072, x̅ = 675, s=43.74602023 → s = 43.7, n = 8 
(5)  p < α. Reject H_{0} and accept H_{1}. 
(6) 
At the 0.01 level of significance, Whizzo is stronger on
average than Stretchie.
Or, Whizzo is stronger on average than Stretchie (p = 0.0072). 
This is numeric data, with σ unknown: test of population mean, Case 1 in Inferential Statistics: Basic Cases.
(1) 
H_{0}: μ = 6
H_{1}: μ > 6 

(2)  α = 0.05 
(RC) 

(3/4) 
TTest: 6, 6.75, 3.3, 100, >μ_{o}
results: t=2.272727273 → t = 2.27, p=.0126021499 → p = 0.0126 
(5)  p < α. Reject H_{0} and accept H_{1}. 
(6) 
TC3 students do average more than six hours a week in
volunteer work, at the 0.05 level of significance.
Or, TC3 students do average more than six hours a week in volunteer work (p = 0.0126). 
Binomial data (head or tail) implies Case 2, test of population proportion on Inferential Statistics: Basic Cases. A fair coin comes up heads 50% of the time, so p = 0.5.
(1) 
H_{0}: p = 0.5, the coin is fair
H_{1}: p ≠ 0.5, the coin is biased Common mistake: You must test ≠, not >. An unfair coin would produce more or less than 50% heads, not necessarily more than 50%. Yes, this time he got more than 50% heads, but your hypotheses are never based on your sample data. 

(2)  α = 0.05 
(RC) 

(3/4)  1PropZTest, .5, 5067, 10000, prop≠p_{o}
results: z = 1.34, p = .1802454677 → p-value = 0.1802, p̂ = .5067 
(5)  p > α. Fail to reject H_{0}. 
(6)  At the 0.05 level of significance, we can’t tell
whether the coin is fair or biased.
Or, We can’t determine from this experiment whether the coin is fair or biased (p = 0.1802). Common mistake: You can’t say that the coin is fair, because that would be accepting H_{0}. You can’t say “there is insufficient evidence to show that the coin is biased”, because there is also insufficient evidence to show that it’s fair. Remark: “Fail to reject H_{0}” situations are often emotionally unsatisfying. You want to reach some sort of conclusion, but when p > α you can’t. What you can do is compute a confidence interval. 1PropZInt: 5067, 10000, .95; results: (.4969, .5165). You’re 95% confident that the true proportion of heads for this coin (in the infinity of all possible flips) is 49.69% to 51.65%. So if the coin is biased at all, it’s not biased by much. 
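Off the calculator, a minimal Python sketch of the same one-proportion z-test looks like this (assuming SciPy):

    from math import sqrt
    from scipy.stats import norm

    p0, x, n = 0.5, 5067, 10000
    phat = x / n
    z = (phat - p0) / sqrt(p0 * (1 - p0) / n)   # about 1.34
    p_value = 2 * norm.sf(abs(z))               # two-tailed, about 0.1802
    print(z, p_value)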
You have numeric data, and you don’t know the SD of the population, so this is a Case 1 test of population mean in Inferential Statistics: Basic Cases.
(a) Check requirements: random sample, n = 45 > 30, and there are more than 10×45 = 450 people with headaches.
TInterval: x̅=18, s=8, n=45, CLevel=.95
Results: (15.597, 20.403)
We’re 95% confident that the average time to relief for all headache sufferers using PainX is 15.6 to 20.4 minutes.
(b) Requirements have already been checked. A two-tailed test (a test for “different”) at the 0.05 level is equivalent to a confidence interval at the 1−0.05 = .95 = 95% confidence level. Since the 95% CI includes 20, the mean time for aspirin, we cannot determine, at the 0.05 significance level, whether PainX offers headache relief to the average person in a different time than aspirin or not.
(a) Use MATH200A part 5 and select 2pop binomial. You have no prior estimates, so enter 0.5 for p̂_{1} and p̂_{2}. E is 0.03, and CLevel is 0.95. Answer: you need at least 2135 per sample, 2135 people under 30 and 2135 people aged 30 and older.
Caution! Even if you don’t identify the groups, you must at least say “per sample”. Plain “2135” makes it look like you need only that many people in the two groups combined, or around 1068 per group, and that is very wrong.
Caution! You must compute this as a two-population case. If you compute a sample size for just one group or the other, you get 1068, which is just about half of the correct value.
If you don’t have the program, you have to use the formula: n = [p̂_{1}(1−p̂_{1})+p̂_{2}(1−p̂_{2})]·(z_{α/2}/E)². You don’t have any prior estimates, so p̂_{1} and p̂_{2} are both equal to 0.5, and p̂_{1}(1−p̂_{1}) + p̂_{2}(1−p̂_{2}) = 0.25+0.25 = 0.5.
Next, 1−α = 0.95, so α = 0.05 and α/2 = 0.025. z_{α/2} = z_{0.025} = invNorm(1−0.025). Divide that by E (.03), square, and multiply by the result of the computation with the p̂’s.
(b) Using MATH200A Program part 5 with .3, .45, .03, .95 gives 1953 per sample.
Alternative solution: Using the formula, .3(1−.3)+.45(1−.45) = .4575. Multiply by (invNorm(1−.05/2)/.03)² as before to get 1952.74157 → 1953 per sample.
Again, you must do this as two-population binomial. If you do the under-30 group and the 30+ group separately, you get sample sizes of 897 and 1057, which are way too small. If your samples are that size, the margins of error for under-30 and 30+ will each be 3%, but the margin of error for the difference, which is what you care about, will be around 4.2%, and that’s greater than the desired 3%.
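The formula is easy to script. Here’s a minimal Python sketch (assuming SciPy; n_per_sample is just an illustrative name):

    from math import ceil
    from scipy.stats import norm

    def n_per_sample(p1, p2, E, clevel):
        z = norm.ppf(1 - (1 - clevel) / 2)               # z_alpha/2
        return ceil((p1*(1 - p1) + p2*(1 - p2)) * (z / E)**2)

    print(n_per_sample(0.5, 0.5, 0.03, 0.95))    # 2135, no prior estimates
    print(n_per_sample(0.3, 0.45, 0.03, 0.95))   # 1953, with prior estimates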
(a) You have numeric data in two independent samples. You’re testing the difference between the means of two populations, Case 4 in Inferential Statistics: Basic Cases. (The data aren’t paired because you have no reason to associate any particular Englishman with any particular Scot.)
(1)  Population 1 = English; population 2 = Scots.
H_{0}: μ_{1} = μ_{2} (or μ_{1}−μ_{2} = 0) H_{1}: μ_{1} > μ_{2} (or μ_{1}−μ_{2} > 0) 

(2)  α = 0.05 
(RC)  The problem states that samples were random.
For English, r=.9734 and crit=.9054; for Scots, r=.9772 and
crit=.9054. Both r’s are greater than crit, so both are
nearly normally distributed. The stacked boxplot shows no outliers.
And obviously the samples of 8 are far less than 10% of the
populations of England and Scotland.

(3/4)  English numbers in L1, Scottish numbers in L2.
2SampTTest with Data; L1, L2, 1, 1, μ_{1}>μ_{2}, Pooled:No Outputs: t=1.57049305 → t = 1.57, p=.0689957991 → p = 0.0690, df=13.4634, x̅1=6.54, x̅2=4.85, s1=1.91, s2=2.34, n1=8, n2=8 
(5)  p > α. Fail to reject H_{0}. 
(6) 
At the 0.05 level of significance,
we can’t say whether English or Scots have a stronger liking for soccer.
Or, We can’t say whether English or Scots have a stronger liking for soccer (p = 0.0690). 
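If you’d like to reproduce the unpooled (Welch) test on a computer, here’s a minimal sketch using SciPy 1.6 or later. Because it starts from the rounded summary statistics rather than the raw lists, its t and p differ slightly from the calculator’s t = 1.57, p = 0.0690:

    from scipy import stats

    res = stats.ttest_ind_from_stats(
        mean1=6.54, std1=1.91, nobs1=8,   # English
        mean2=4.85, std2=2.34, nobs2=8,   # Scots
        equal_var=False,                  # Pooled: No (Welch test)
        alternative='greater')
    print(res.statistic, res.pvalue)      # about t = 1.58, p = 0.068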
(b) Requirements are already covered.
2SampTInt, CLevel=.90
Results: (−.2025, 3.5775)
We’re 90% confident that, on a scale from 1=hate to 10=love, the average Englishman likes soccer between 0.2 points less and 3.6 points more than the average Scot.
(a) This is the difference of proportions in two populations, Case 5 in Inferential Statistics: Basic Cases.
(1)  Population 1 = English, population 2 = Scots.
H_{0}: p_{1} = p_{2} (or p_{1}−p_{2} = 0) H_{1}: p_{1} ≠ p_{2} (or p_{1}−p_{2} ≠ 0) 

(2)  α = 0.05 
(RC) 

(3/4)  2PropZTest x1=105, n1=150, x2=160, n2=200, p1≠p2
results: z=−2.159047761 → z = −2.16, p=.030846351 → p = 0.0308, p̂_{1} = 0.70, p̂_{2} = 0.80, p̂ = .7571428571 
(5)  p < α. Reject H_{0} and accept H_{1}. 
(6) 
The English and Scots are not equally likely to be soccer fans, at the 0.05 level of significance;
in fact the English are less likely to be soccer fans.
Or, The English and Scots are not equally likely to be soccer fans (p = 0.0308); in fact the English are less likely to be soccer fans. 
(b) Requirements already checked.
2PropZInt with CLevel = .95 → (−.1919, −.0081)
That’s the estimate for p_{1}−p_{2}, English minus Scots. Since that’s negative, English like soccer less than Scots do. With 95% confidence, Scots are more likely than English to be soccer fans, by 0.8 to 19.2 percentage points.
(c) [(−.0081) − (−.1919)] / 2 = 0.0919, a little over 9 percentage points.
(d) MATH200A part 5, 2pop binomial, p̂_{1}=.7, p̂_{2}=.8, E=.04, CLevel .95 gives 889 per sample
By formula, z_{α/2} = z_{0.025} = invNorm(1−0.025) = 1.96.
n_{1} = n_{2} = [.7(1−.7)+.8(1−.8)]×(1.96/.04)² = 888.37 → 889 per sample
(a) This is beforeandafter paired data, Case 3 in Inferential Statistics: Basic Cases. You’re testing the mean difference.
(1)  d = After−Before
H_{0}: μ_{d} = 0, running makes no difference in HDL H_{1}: μ_{d} > 0, running increases HDL Remark: If this were a research study, they would probably test for a difference in HDL, not just an increase. Maybe this study was done by a fitness center or a running-shoe company. They would want to find an increase, and HDL decreasing or staying the same would be equally uninteresting to them. 

(2)  α = 0.05 
(RC) 
Before in L1, After in L2, L3=L2−L1

(3/4) 
TTest 0, L3, 1, μ>0
results: t=3.059874484 → t = 3.06, p=.0188315555 → p = 0.0188, d̅=4.6, s=3.36, n=5 
(5)  p < α. Reject H_{0} and accept H_{1}. 
(6) 
At the 0.05 level of significance, running 4 miles daily for six months raises HDL level.
Or, Running 4 miles daily for six months raises HDL level (p = 0.0188). 
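A minimal Python check of the paired test, starting from the calculator outputs shown above (assuming SciPy):

    from math import sqrt
    from scipy import stats

    dbar, s, n = 4.6, 3.36, 5     # mean and SD of the differences After-Before
    t = dbar / (s / sqrt(n))      # about 3.06
    p = stats.t.sf(t, df=n - 1)   # right-tailed p-value, about 0.0188
    print(t, p)

With the raw Before and After lists, scipy.stats.ttest_rel(after, before, alternative='greater') does the same job in one call (the alternative argument needs a recent SciPy version).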
(b) TInterval with CLevel .9 gives (1.3951, 7.8049).
Interpretation: You are 90% confident that running an average of four miles a day for six months will raise HDL by 1.4 to 7.8 points for the average woman.
Caution! Don’t write something like “I’m 90% confident that HDL will be 1.4 to 7.8”. The confidence interval is not about the HDL level, it’s about the change in HDL level.
Remark: Notice the correspondence between hypothesis test and confidence interval. The onetailed HT at α = 0.05 is equivalent to a twotailed HT at α = 0.10, and the complement of that is a CI at 1−α = 0.90 or a 90% confidence level. Since the HT did find a statistically significant effect, you know that the CI will not include 0. If the HT had failed to find a significant effect, then the CI would have included 0. See Confidence Interval and Hypothesis Test.
(a) Each participant either had a heart attack or didn’t, and the doctors were all independent in that respect. This is binomial data. You’re testing the difference in proportions between two populations, Case 5 in Inferential Statistics: Basic Cases.
(1) 
Population 1: Aspirin takers; population 2: nonaspirin takers.
H_{0}: p_{1} = p_{2}, taking aspirin makes no difference H_{1}: p_{1} ≠ p_{2}, taking aspirin makes a difference 

(2)  α = 0.001 
(RC) 

(3/4) 
2PropZTest: x1=139, n1=11037, x2=239, n2=11034, p1≠p2
results: z=−5.19, p-value = 2×10^{−7}, p̂_{1} = .0126, p̂_{2} = .0217, p̂ = .0171

(5)  p < α. Reject H_{0} and accept H_{1}. 
(6) 
At the 0.001 level of significance, aspirin does make a difference to the likelihood of heart attack.
In fact it reduces it.
Or, Aspirin makes a difference to the likelihood of heart attack (p < 0.0001). In fact, aspirin reduces the risk. 
Remark: The study was conducted from 1982 to 1988 and was stopped early because the results were so dramatic. For a non-technical summary, see Physicians’ Health Study (2009) [see “Sources Used” at end of book]. More details are in the original article from the New England Journal of Medicine (Steering Committee 1989 [see “Sources Used” at end of book]).
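Here’s a minimal Python sketch of the 2-PropZTest computation with the pooled (blended) proportion, for anyone who wants to verify it off the calculator (assuming SciPy):

    from math import sqrt
    from scipy.stats import norm

    x1, n1, x2, n2 = 139, 11037, 239, 11034
    p1, p2 = x1 / n1, x2 / n2
    pbar = (x1 + x2) / (n1 + n2)                         # blended proportion
    z = (p1 - p2) / sqrt(pbar * (1 - pbar) * (1/n1 + 1/n2))
    p_value = 2 * norm.sf(abs(z))                        # two-tailed
    print(z, p_value)                                    # about z = -5.19, p = 2E-7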
(b) 2PropZInt with CLevel .95 gives (−.0125, −.0056).
We’re 95% confident that 325 mg of aspirin every other day reduces the chance of heart attack by 0.56 to 1.25 percentage points.
Caution! You’re estimating the change in heart-attack risk, not the risk of heart attack. Saying something like “with aspirin, the risk of heart attack is 0.56 to 1.25%” would be very wrong.
(a) You’re estimating the difference in means between two populations. This is Case 4 in Inferential Statistics: Basic Cases. Requirements:
Population 1 = Cortland County houses, population 2 = Broome
County houses.
2SampTInt, 134296, 44800, 30, 127139, 61200, 32, .95, No
results: (−20004, 34318)
June is 95% confident that the average house in Cortland County costs $20,004 less to $34,318 more than the average house in Broome County.
(b) A 95% confidence interval is the complement of a significance test for ≠ at α = 0.05. Since 0 is in the interval, you know the pvalue would be >0.05 and therefore June can’t tell, at the 0.05 significance level, whether there is any difference in average house price in the two counties or not.
If both ends of the interval were positive, that would indicate a difference in averages at the 0.05 level, and you could say Cortland’s average is higher than Broome’s. Similarly, if both ends were negative you could say Cortland’s average is lower than Broome’s. But as it is, nada.
Remark: Obviously Broome County is cheaper in the sample. But the difference is not great enough to be statistically significant. Maybe the true mean in Broome really is less than in Cortland; maybe they’re equal; maybe Broome is more expensive. You simply can’t tell from these samples.
The immediate answer is that those are proportions in the sample, not the proportions among all voters.
This is two-population binomial data, Case 5 in Inferential Statistics: Basic Cases.
Requirements check:
Population 1 = Red voters, population 2 = Blue voters.
2PropZInt 520, 1000, 480, 1000, .95
Results: (−.0038, .08379), p̂_{1}=.52, p̂_{2}=.48
With 95% confidence, the Red candidate is somewhere between 0.4 percentage points behind Blue and 8.4 ahead of Blue. The confidence interval contains 0, and so it’s impossible to say whether either one is leading.
Remark: Newspapers often report the sample proportions p̂_{1} and p̂_{2} as though they were population proportions, but now you know that they aren’t. A different poll might have similar results, or it might have samples going the other way and showing Blue ahead of Red.
(b) For a hypothesis test, we often use “at least 10 successes and 10 failures in each sample” as a shortcut requirements test, but the real requirement is at least 10 successes and 10 failures expected in each sample, using the blended proportion p̂. If the shortcut procedure fails, you must check the real requirement. In this problem, the blended proportion is
p̂ = (x_{1}+x_{2})/(n_{1}+n_{2}) = (7+18)/(28+32) = 25/60, about 42%.
For sample 1, with n_{1} = 28, you would expect 28×25/60 ≈ 11.7 successes and 28−11.7 = 16.3 failures. For sample 2, with n_{2} = 32, you would expect 32×25/60 ≈ 13.3 successes and 32−13.3 = 18.7 failures. Because all four of these expected numbers are at least 10, it’s valid to compute a pvalue using 2PropZTest.
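The arithmetic is easy to script if you’d rather not do it by hand; a plain-Python sketch of the check:

    x1, n1, x2, n2 = 7, 28, 18, 32
    pbar = (x1 + x2) / (n1 + n2)           # blended proportion, 25/60
    for n in (n1, n2):
        print(n * pbar, n * (1 - pbar))    # expected successes and failures
    # about 11.7/16.3 and 13.3/18.7 -- all at least 10, so 2PropZTest is valid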
(1) 
H_{0}: The 25:25:20:15:8:7 model for ice cream preference is good.
H_{1}: The 25:25:20:15:8:7 model for ice cream preference is bad. 

(2)  α = 0.05 
(3–4)  Use MATH200A part 6.
df=5, χ²=9.68, p-value = 0.0849
(If you have MATH200A V6, you’ll see the p-value, degrees of freedom, and χ² test statistic on the same screen as the graph.) Common mistake: When a model is given in percentages, some students like to convert the observed numbers to percentages. Never do this! The observed numbers are always actual counts and their total is always the actual sample size. Remark: You could give the model as decimals, .25, .20, .15 and so on. But for the model, all that matters is the relative size of each category to the others, so it’s simpler to use whole-number ratios. Common mistake: If you do convert the percentages to decimals, remember that 8% and 7% are 0.08 and 0.07, not 0.8 and 0.7. 
(RC) 

(5)  p > α. Fail to reject H_{0}. 
(6)  At the 0.05 level of significance,
you can’t say whether the model is good or bad.
Or, It’s impossible to determine from this sample whether the model is good or bad (p = 0.0849). Remark: For Case 6 only, you could write your nonconclusion as something like “the model is not inconsistent with the data” or “the data don’t disprove the model.” Remark: The χ² test keeps you from jumping to false conclusions. Eyeballing the observed and expected numbers (L2 and L3), you might think they’re fairly far off and the model must be wrong. Yet the test gives a largish p-value. Remark: If it had gone the other way — if p were less than α — you would say something like “At the .05 level of significance, the model is inconsistent with the data” or “the data disprove the model” or simply “the model is wrong”. 
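If you don’t have a TI-83/84 handy, here’s a minimal Python sketch of the same goodness-of-fit test (assuming SciPy; the observed counts below are hypothetical placeholders, not the exercise’s data):

    from scipy.stats import chisquare

    model    = [25, 25, 20, 15, 8, 7]    # ratios; they need not sum to 100
    observed = [30, 22, 25, 10, 8, 5]    # hypothetical counts -- use the real ones
    total = sum(observed)
    expected = [m / sum(model) * total for m in model]   # scale model to sample size
    chi2, p = chisquare(observed, expected)              # df = 6 - 1 = 5
    print(chi2, p)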
Solution: Use Case 7, 2way table, in Inferential Statistics: Basic Cases.
(1)  H_{0}: Gun opinion is independent of party
H_{1}: Gun opinion depends on party 

(2)  α = .05 
(3–4)  Put the two rows and three columns in matrix A. (Don’t enter the totals.) Select χ²Test from the menu. Outputs are χ² = 26.13, df = 2, p=2.118098E−6 → p = 0.000 002 or <.0001. 
(RC) 
Alternative: use MATH200A part 7 for steps 3–4 and RC. 
(5)  p < α; reject H_{0} and accept H_{1}. 
(6)  At the .05 level of significance,
gun opinion depends on party.
Or, Gun opinion depends on party (p<0.0001). Remark: “Depends on” does not mean that’s the only factor. But if you don’t like “depends on”, you could say “is not independent of”. Or you could say, “party affiliation is a factor in a person’s opinion on gun control.” 
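For reference, a minimal Python sketch of a two-way-table test (assuming SciPy; the 2×3 matrix below is hypothetical, so substitute the actual counts, without totals):

    from scipy.stats import chi2_contingency

    table = [[200, 150, 50],    # hypothetical row 1: favor / oppose / no opinion
             [120, 180, 60]]    # hypothetical row 2
    chi2, p, df, expected = chi2_contingency(table)
    print(chi2, df, p)          # the matrix of expected counts is also returned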
(1) 
H_{0}: Preferences among all first graders are equal.
H_{1}: First graders prefer the five occupations unequally. 

(2)  α = 0.05 
(3/4)  MATH200A part 6 with {1,1,1,1,1} or similar in L1 and the observed data in L2. χ²=12.9412 → χ² = 12.94, df = 4, p=.011567 → p = 0.0116. 
(RC) 

(5)  p < α. Reject H_{0} and accept H_{1}. 
(6)  At the 0.05 significance level, first graders in general have
unequal preferences among the five occupations.
Or, First graders in general have unequal preferences among the five occupations (p = 0.0116). 
(1) 
H_{0}: Egg consumption and age at menarche are independent.
H_{1}: Egg consumption and age at menarche are not independent. 

(2)  α = 0.01 
(3/4)  3×3 in A. Use MATH200A part 7 or
χ²Test
results: χ² = 3.13, df = 4, p=.535967 → p = 0.5360 
(RC) 

(5)  p > α. Fail to reject H_{0}. 
(6) 
At the 0.01 level of significance, we can’t determine whether
egg consumption and age at menarche are independent or not.
Or, We can’t determine whether egg consumption and age at menarche are independent or not (p = 0.5360). Remark: The large p-value makes it really tempting to declare that the two variables are independent. But that would be accepting H_{0}, which we must never do. It’s always possible that there is a connection and we were just unlucky enough that this particular sample didn’t show it. Some researchers would say “There is insufficient evidence to reject the hypothesis of independence.” Strictly speaking, that’s the same error. However, when the audience is researchers, rather than the nontechnical public, it may be understood that they’re not really accepting H_{0}, only failing to reject it pending the outcome of a further study. 
(1) 
H_{0}: Age distribution of grand jurors matches age distribution of county.
H_{1}: Age distribution of grand jurors does not match age distribution of county. 

(2)  α = 0.05 
(3/4)  The county percentages are the model and go in L1. The
numbers of jurors (not percentages) go in L2. Reminder: don’t
include the total row.
results: χ²=61.2656 → χ² = 61.27, df = 3, p-value = 3.2×10^{−13} or p < 0.0001 
(RC)  Because you’re not generalizing, the random-sample rule and the under-10% rule don’t matter. You need only check that all expected counts are ≥ 5, and since the lowest is 10.56, the requirements are met. 
(5)  p < α. Reject H_{0} and accept H_{1}. 
(6) 
At the 0.05 significance level, the age distribution of grand jurors is
different from the age distribution in the county.
Or, The age distribution of grand jurors is different from the age distribution in the county (p < 0.0001). Remark: There are a lot of reasons for this. Judges tend to be older and tend to prefer jurors closer to their own age. Also, older candidates are more likely to be retired, which means they are less likely to be exempt by reason of their occupation. 
(1) 
H_{0}: Population size of chosen residence town is independent of
population size of town raised in.
H_{1}: Population size of chosen residence town depends on population size of town raised in. 

(2)  α = 0.05 
(3/4)  Enter the 3×3 array in Matrix A. (Never enter the totals in a two-way table hypothesis test.) Use MATH200A part 7 or the calculator’s χ²Test menu selection.
results: df = 4, χ² = 35.74, p-value=3.271956E−7 → p-value = 0.000 000 3 or p-value < 0.0001 
(RC) 

(5)  p < α. Reject H_{0} and accept H_{1}. 
(6) 
At the 0.05 significance level, there is an association between the
size of town men choose to live in and the size of town they
grew up in.
Or, There is an association between the size of town men choose to live in and the size of town they grew up in (p < 0.0001). 
(1)  H_{0}: The tested treatments with Echinacea make no
difference to the proportion who catch cold.
H_{1}: The treatments do make a difference. … 

(2)  α = 0.01 
(3/4) 
There were seven treatments and two outcomes, so enter your 7×2 matrix and run a χ²Test or MATH200A part 7.
Results: χ² = 4.74, df = 6, p-value = 0.5769 Common mistake: Never enter the totals in a two-way test. 
(RC) 

(5)  p > α. Fail to reject H_{0}. 
(6)  At the 0.01 significance level, we can’t determine
whether Echinacea is effective against the common cold or not.
Or, We can’t determine whether Echinacea is effective against the common cold or not (p = 0.5769). Remark: Researchers might write something like “Echinacea made no significant difference to infection rates in our study” with the p-value or significance level. It’s understood that this does not prove Echinacea ineffective — this particular study fails to reach a conclusion. But as additional studies continue to find p > α, our confidence in the null hypothesis increases. 
Remark: If you used MATH200A part 7, there’s some interesting information in matrix C. The top left 7 rows and 2 columns are the χ² contributions for each of the seven treatments and two outcomes. All are quite low, in light of the rule of thumb that only numbers above 4 or so are significant, even at the less stringent 0.05 level.
The last two rows are the total numbers and percentages of people who did and didn’t catch cold: 349 (87.5%) and 50 (12.5%). If Echinacea is ineffective, you’d expect to see about that same infection rate for each of the seven treatments. Sure enough, compute the rates from the rows of the data table, and you’ll find that they vary between 81% and 92%.
The third column is the total subjects in each of the seven treatments, and the overall total. Of course you were given those in the data table, but it’s always a good idea to use this information to check your data entry.
The fourth column is the percentage of subjects who were assigned to each of the seven treatments, totaling 100% of course.
Write your answer to each question. There’s no work to be shown. Don’t bother with a complete sentence if you can answer with a word, number, or phrase.
Common mistake: Binomial is a subtype of qualitative data, so it’s not really a synonym. Discrete and continuous are subtypes of numeric data.
The Gambler’s Fallacy is believing that the die is somehow “due for a 6”. The Law of Large Numbers says that in the long run the proportion of 6’s will tend toward 1/6, but it doesn’t tell us anything at all about any particular roll.
Common mistake: You must specify which is population 1 and which is population 2.
Common mistake: The data type is binomial: a student is in trouble, or not. There are no means, so μ is incorrect in the hypotheses.
This is a binomial PD.
Remark: The significance level α is the level of risk of a Type I error that you can live with. If you can live with more risk, you can reach more conclusions.
Complementary events can’t happen at the same time and one or the other must happen. Example: rolling a die and getting an odd or an even. Complementary events are a subtype of disjoint events.
For a small to moderate-sized set of numeric data, you might prefer a stemplot.
Remark: C is wrong because “model good” is H_{0}. D is also wrong: every hypothesis test, without exception, compares a pvalue to α. For E, df is number of cells minus 1. F is backward: in every hypothesis test you reject H_{0} when your sample is very unlikely to have occurred by random chance.
Remark: As stated, what you can prove depends partly on your H_{1}. There are three things it could be:
Regardless of H_{1}, if p-value > α your conclusion will be D or similar to it.
Common mistake: Conclusion A is impossible because it’s the null hypothesis and you never accept the null hypothesis.
Conclusion B is also impossible. Why? because “no more than” translates to ≤. But you can’t have ≤ in H_{1}, and H_{1} is the only hypothesis that can be accepted (“proved”) in a hypothesis test.
Remark: A Type I error is a wrong result, but it is not necessarily the result of a mistake by the experimenter or statistician.
(b) The size is unknown, but certainly in the millions. You also could call it infinite, or uncountable. Common mistake: Don’t confuse size of population with size of sample. The population size is not the 487 from whom you got surveys, and it’s not the 321 churchgoers in your sample.
(c) The sample size n is the 321 churchgoers from whom you collected surveys. Yes, you collected 487 surveys in all, but you have to disregard the 166 that didn’t come from churchgoers, because they are not your target group. Common mistake: 227 isn’t the sample size either. It’s x, the number of successes within the sample.
(d) No. You want to know the attitudes of churchgoers, so it is correct sampling technique to include only churchgoers in your sample.
If you wanted to know about Americans in general, then it would be selection bias to include only churchgoers, since they are more likely than non-churchgoers to oppose teaching evolution in public schools.
Common mistake: Your answer will probably be worded differently from that, but be careful that it is a conditional probability: If H_{0} is true, then there’s a p-value chance of getting a sample this extreme or more so. The p-value isn’t the chance that H_{0} is true.
Remark: If you are at all shaky about this, review What Does the p-Value Mean?
binomcdf(100,.08,5)
(b) This is a binomial distribution, for exactly the same reasons.
MATH200A part 3, or binompdf(100,.08,5)
(c) The probability of success is p = 0.08 on every trial, but you don’t have a fixed number of trials. This is a geometric distribution.
geometpdf(.08,5)
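For reference, the three calculator functions have direct SciPy equivalents; here’s a minimal sketch:

    from scipy.stats import binom, geom

    print(binom.cdf(5, 100, 0.08))   # (a) at most 5:  binomcdf(100,.08,5)
    print(binom.pmf(5, 100, 0.08))   # (b) exactly 5:  binompdf(100,.08,5)
    print(geom.pmf(5, 0.08))         # (c) first success on trial 5:  geometpdf(.08,5)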
Remark: There is no specific claim, so this is not a hypothesis test.
Caution: The percentages must add to 100%. Therefore you must have complete data on all categories to display a pie chart. Also, if multiple responses from one subject are allowed, then a pie chart isn’t suitable, and you should use some other presentation, such as a bar graph.
Remark: This problem tests for several very common mistakes by students. Always make sure that
This leaves you with G and K as possibilities. Either can be correct, depending on your textbook. For example, the textbooks used at TC3 always put a plain = sign in H_{0} regardless of H_{1}, so for TC3 students the correct answer is G. Students at some other institutions might have K as the correct answer.
Remark: The ZTest is wrong because you don’t know the SD of the selling price of all 2006 Honda Civics in the US. The 1PropZTest and χ²-test are for non-numeric data. There is no such thing as a 1PropTTest.
Example: “812 of 1000 Americans surveyed said they believe in ghosts” is an example of descriptive statistics: the numbers of yeses and noes in the sample were counted. “78.8% to 83.6% of Americans believe in ghosts (95% confidence)” is an example of inferential statistics: sample data were used to make an estimate about the population. “More than 60% of Americans believe in ghosts” is another example of inferential statistics: sample data were used to test a claim and make a statement about a population.
Remark: Remember that the confidence interval derives from the central 95% or 90% of the normal distribution. The central 90% is obviously less wide than the central 95%, so the interval will be less wide.
Example: You want to know the average amount of money a fulltime TC3 student spends on books in a semester. The population is all fulltime TC3 students. You randomly select a group of students and ask each one how much s/he spent on books this semester. That group is your sample.
Remark: This is unpaired numeric data, Case 4.
(b) For binomial data, requirements are slightly different between CI and HT. Here you are doing a hypothesis test.
Common mistake: For hypothesis test, you need expected successes and failures. It’s incorrect to use actual successes (150) and failures (350).
Common mistake: Some students answer this question with “n > 30”. That’s true, but not relevant here. Sample size 30 is important for numeric data, not binomial data.
Common mistake: You cannot do a 2SampZTest because you do not know the standard deviations of the two populations.
(1) 
Population 1 = Judge Judy’s decisions; Population 2 = Judge
Wapner’s decisions
H_{0}: μ_{1} = μ_{2}, no difference in awards H_{1}: μ_{1} > μ_{2}, Judge Judy gives higher awards 

(2)  α = 0.05 
(RC) 

(3–4)  2SampTTest: x̅1=650, s1=250, n1=32,
x̅2=580, s2=260, n2=32, μ_{1}>μ_{2}, Pooled: No
Results: t=1.10, p-value = .1383 
(5)  p > α. Fail to reject H_{0}. 
(6)  At the 0.05 level of significance, we can’t tell whether Judge Judy was more friendly to plaintiffs (average award higher than Judge Wapner’s) or not. 
Some instructors do a preliminary F-test. It gives p = 0.9089 > 0.05, so after that test you would use Pooled:Yes in the 2SampTTest and get p = 0.1553.
Solution: This is one-population numeric data, and you don’t know the standard deviation of the population: Case 1. Put the data in L1, and 1VarStats L1 shows that x̅ = 4.56, s = 1.34, n = 8.
(1) 
H_{0}: μ = 4, 4% or less improvement in drying time
H_{1}: μ > 4, better than 4% decrease in drying time Remark: Why is a decrease in drying time tested with > and not <? Because the data show the amount of decrease. If there is a decrease, the amount of decrease will be positive, and you are interested in whether the average decrease is greater than 4 (4%). 

(2)  α = 0.05 
(RC) 
(You don’t have to show the graphs on your exam paper; just show the numeric test for normality and mention that the modified boxplot shows no outliers.)
(3–4) 
TTest: μ_{o}=4, x̅=4.5625, s=1.34..., n=8, μ>μ_{o}
Results: t = 1.19, p = 0.1370 
(5)  p > α. Fail to reject H_{0}. 
(6)  At the 0.05 significance level, we can’t tell whether the average drying time improved by more than 4% or not. 
(b) TInterval: CLevel=.95
Results: (3.4418, 5.6832)
(There’s no need to repeat the requirements check or to write down all the sample statistics again.)
With 95% confidence, the true mean decrease in drying time is between 3.4% and 5.7%.
(a) Use MATH200A part 3 with n = 5, p = 0.28, from = 0, to = 0. Answer: 0.1935
Alternative solution: If you don’t have the program, you can compute the probability that one rabbit has short hair (1−.28 = 0.72), then that all the rabbits have short hair (0.72^5 = 0.1935), which is the same as the probability that none of the rabbits have long hair.
(b) The complement of “one or more” is none, so you can use the previous answer.
P(one or more) = 1−P(none) = 1−0.1935 = 0.8065
Alternative solution: MATH200A part 3 with n=5, p=.28, from=1, to=5; probability = 0.8065
(c) Again, use MATH200A part 3 to compute binomial probability: n = 5, p = 0.28, from = 4, to = 5. Answer: 0.0238
Alternative solution: If you don’t have the program, do binompdf(5, .28) and store into L3, then sum(L3,5,6) or L3(5)+L3(6) = 0.0238. Avoid the dreaded off-by-one error! For x=4 and x=5 you want L3(5) and L3(6), not L3(4) and L3(5).
For n=5, P(x≥4) = 1−P(x≤3). So you can also compute the probability as 1−binomcdf(5, .28, 3) = 0.0238.
(d) For this problem you must know the formula:
μ = np = 5×0.28 = 1.4 per litter of 5, on average
Common mistake: It might be tempting to do this problem as a goodness-of-fit, Case 6, taking the Others row as the model and the doctors’ choices as the observed values. But that would be wrong. Both the Doctors row and the Others row are experimental data, and both have some sampling error around the true proportions. If you take the Others row as the model, you’re saying that the true proportions for all non-doctors are precisely the same as the proportions in this sample. That’s rather unlikely.
(1) 
H_{0}: Doctors eat different breakfasts in the same proportions as others.
H_{1}: Doctors eat different breakfasts in different proportions from others. 

(2)  α = 0.05 
(3–4)  χ²Test gives χ² = 9.71, df = 4, p=0.0455 
(RC) 

(5)  p < α. Reject H_{0} and accept H_{1}. 
(6)  Yes, doctors do choose breakfast differently from other self-employed professionals, at the 0.05 significance level. 
(b) 70−67.6 = 2.4″, and therefore z = −1. By the Empirical Rule, 68% of data lie between z = ±1. Therefore 100−68 = 32% lie outside z = ±1 and 32%/2 = 16% lie below z = −1. Therefore 67.6″ is the 16th percentile.
Alternative solution: Use the big chart to add up the proportion of men below 67.6″ or below z = −1. That is 0.15+2.35+13.5 = 16%.
(c) z = (74.8−70)/2.4 = +2. By the Empirical Rule, 95% of men fall between z = −2 and z = +2, so 5% fall below z = −2 or above z = +2. Half of those, 2.5%, fall above z = +2, so 100−2.5 = 97.5% fall below z = +2. 97.5% of men are shorter than 74.8″.
Alternative solution: You could also use the big chart to find that P(z > 2) = 2.35+0.15 = 2.5%, and then P(z < 2) = 100−2.5 = 97.5%.
(b) Compute the class marks or midpoints: 575, 725, and so on. Put them in L1 and the frequencies in L2. Use 1VarStats L1,L2 and get n = 219. See Summary Numbers on the TI-83.
(c) Further data from 1VarStats L1,L2: x̅ = 990.1 and s = 167.3.
Common mistake: If you answered x̅ = 950 you probably did 1VarStats L1 instead of 1VarStats L1,L2. Your calculator depends on you to supply one list when you have a simple list of numbers and two lists when you have a frequency distribution.
(d) f/n = 29/219 ≈ 0.13 or 13%
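The same computation in Python looks like this (a sketch assuming NumPy; the class marks past 725 and all the frequencies are made-up placeholders, chosen only so that they total n = 219):

    import numpy as np

    marks = np.array([575, 725, 875, 1025, 1175])   # class midpoints (partly assumed)
    freq  = np.array([10, 40, 80, 60, 29])          # hypothetical frequencies
    n = freq.sum()                                  # 219
    xbar = (marks * freq).sum() / n                 # weighted mean
    s = np.sqrt(((marks - xbar)**2 * freq).sum() / (n - 1))   # sample SD
    print(n, xbar, s)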
invNorm(0.85, 57.6, 5.2) = 62.98945357 → 63.0 mph
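In Python, invNorm corresponds to SciPy’s norm.ppf; a one-line check:

    from scipy.stats import norm
    print(norm.ppf(0.85, loc=57.6, scale=5.2))   # about 62.99 -> 63.0 mph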
MATH200A/sample size/binomial: p̂ = .2, E = 0.04, CLevel = 0.90. Answer: 271.
Common mistake: The margin of error is E = 4% = 0.04, not 0.4.
Alternative solution: See Sample Size by Formula and use the formula at right. With the estimated population proportion p̂ = 0.2 in the formula, you get z_{α/2} = z_{0.05} = invNorm(1−0.05) = 1.6449, and n = 270.5543 → 271
(b) If you have no prior estimate, use p̂ = 0.5. The other inputs are the same, and the answer is 423.
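A minimal Python sketch of the one-population sample-size formula (assuming SciPy; n_required is just an illustrative name):

    from math import ceil
    from scipy.stats import norm

    def n_required(phat, E, clevel):
        z = norm.ppf(1 - (1 - clevel) / 2)        # z_alpha/2
        return ceil(phat * (1 - phat) * (z / E)**2)

    print(n_required(0.2, 0.04, 0.90))   # 271, with the prior estimate
    print(n_required(0.5, 0.04, 0.90))   # 423, with no prior estimate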
You expect positive correlation because points trend upward to the right (or, because y tends to increase as x increases). Even before plotting, you could probably predict a positive correlation because you assume higher calories come from fat; but you can’t just assume that without running the numbers.
(b) See Step 2 of Scatterplot, Correlation, and Regression on TI-83/84.
r = .8863314629 → r = 0.8863
a = .0586751909 → a = 0.0587
b = −3.440073602 → b = −3.4401
ŷ = 0.0587x − 3.4401
Common mistake: The symbol is ŷ, not y.
(c) The y intercept is −3.4401. It is the number of grams of fat you expect in the average zero-calorie serving of fast food. Clearly this is not a meaningful concept.
Remark: Remember that you can’t trust the regression outside the neighborhood of the data points. Here x varies from 130 to 640. The y intercept occurs at x = 0. That is pretty far outside the neighborhood of the data points, so it’s not surprising that its value is absurd.
(d) See Finding ŷ from a Regression on TI-83/84. Trace at x = 310 and read off ŷ = 14.749... ≈ 14.7 grams fat. This is different from the actual data point (x=310, y=25) because ŷ is based on a trend reflecting all the data. It predicts the average fat content for all 310-calorie fast-food items.
Alternative solution: ŷ = .0586751909(310) − 3.440073602 = 14.749 ≈ 14.7.
(e) The residual at any (x,y) is y−ŷ. At x = 310, y = 25 and ŷ = 14.7 from the previous part. The residual is y−ŷ = 10.3
Remark: If there were multiple data points at x = 310, you would calculate one residual for each point.
(f) From the LinReg(ax+b) output, R² = 0.7855834621 → R² = 0.7856.
About 79% of the variation in fat content is associated with variation in calorie content. The other 21% comes from lurking variables such as protein and carbohydrate count and from sampling error.
(g) See Decision Points for Correlation Coefficient. Since 0.8863 is positive and 0.8863 > 0.602, you can say that there is some positive correlation in the population, and higher-calorie fast foods do tend to be higher in fat.
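To reproduce the whole analysis on a computer, scipy.stats.linregress covers parts (b), (d), (e), and (f) in a few lines. This sketch uses made-up calorie/fat pairs; only the point (310, 25) and the 130-to-640 calorie range come from the exercise:

    from scipy.stats import linregress

    calories = [130, 310, 420, 530, 640]   # hypothetical x data
    fat      = [4, 25, 20, 29, 38]         # hypothetical y data
    fit = linregress(calories, fat)
    print(fit.slope, fit.intercept, fit.rvalue, fit.rvalue**2)
    yhat = fit.slope * 310 + fit.intercept     # predicted fat at 310 calories
    print(yhat, 25 - yhat)                     # prediction and residual at (310, 25)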
(1) 
d = After − Before
H_{0}: μ_{d} = 0, no improvement H_{1}: μ_{d} > 0, improvement in number of situps Remark: Why After−Before instead of the other way round? Since we expect After to be greater than Before, doing it this way you can expect the d’s to be mostly positive (if H_{1} is true). Also, it feels more natural to set things up so that an improvement is a positive number. But if you do d=Before−After and H_{1}:μ_{d}<0, you get the same pvalue. 

(2)  α = 0.01 
(RC) 

(3–4) 
TTest: μ_{o}=0, List:L4, Freq:1, μ>μ_{o}
Results: t = 2.74, p = 0.0169, x̅ = 4.4, s = 4.3, n = 7 
(5)  p > α. Fail to reject H_{0}. 
(6)  At the 0.01 significance level, we can’t say whether the physical fitness course improves people’s ability to do situps or not. 
(b) normalcdf(−10^99, 24, 27, 4/√5) = .0467662315 → 0.0468 or about a 5% chance
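The same computation in Python (assuming SciPy), using the standard error 4/√5 as the scale:

    from math import sqrt
    from scipy.stats import norm
    print(norm.cdf(24, loc=27, scale=4 / sqrt(5)))   # about 0.0468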
(1) 
H_{0}: Nebraska preferences are the same as national proportions.
H_{1}: Nebraska preferences are different from national proportions. 

(2)  α = 0.05 
(3–4)  US percentages in L1, Nebraska observed counts in
L2. MATH200A part 6.
The result is χ² = 12.0093 → 12.01, df = 4, p-value = 0.0173 Common mistake: Some students convert the Nebraska numbers to percentages and perform a χ² test that way. The χ² test model can equally well be percentages or whole numbers, but the observed numbers must be actual counts. 
(RC) 

(5)  p < α. Reject H_{0} and accept H_{1}. 
(6)  Yes, at the 0.05 significance level Nebraska preferences in vacation homes are different from those for the US as a whole. 
(1) 
Population 1 = Course, Population 2 = No course
H_{0}: μ_{1} = μ_{2}, no benefit from diabetic course H_{1}: μ_{1} < μ_{2}, reduced blood sugar from diabetic course 

(2)  α = 0.01 
(RC)  Independent random samples, both n’s >30 
(3–4) 
2SampTTest: x̅1=6.5, s1=.7, n1=50, x̅2=7.1, s2=.9, n2=50,
μ_{1}<μ_{2}, Pooled:No
Results: t=−3.72, p=1.7E−4 or 0.0002 Though we don’t use it in this book, some classes do a preliminary 2SampFTest. That test gives p=0.0816>0.05. Those classes would use Pooled:Yes in 2SampTTest and get p=0.00016551 and the same conclusion. 
(5)  p < α. Reject H_{0} and accept H_{1}. 
(6)  At the 0.01 level of significance, the course in diabetic selfcare does lower patients’ blood sugar, on average. 
(b) For two-population numeric data, paired data do a good job of controlling for lurking variables. You would test each person’s blood sugar, then enroll all thirty patients in the course and test their blood sugar six months after the end of the course. Your variable d is blood sugar after the course minus blood sugar before, and your H_{1} is μ_{d} < 0.
One potential problem is that all 30 patients receive a heightened level of attention, so you have to worry about the placebo effect. (With the original experiment, the control group did not receive the extra attention of being in the course, so any difference from the attention is accounted for in the different results between control group and treatment group.)
It seems unlikely that the placebo effect would linger for six months after the end of a short course, but you can’t rule out the possibility. There are two answers to that. You could retest the patients after a year, or two years. Or, you could ask whether it really matters why patients do better. If they do better because of the course itself, or because of the attention, either way they’re doing better. A short course is relatively inexpensive. If it works, why look a gift horse in the mouth? In fact, medicine is beginning to take advantage of the placebo effect in some treatments.
(1) 
H_{0}: μ = 2.5 years
H_{1}: μ > 2.5 years 

(2)  α = 0.05 
(RC)  random sample, normal with no outliers (given) 
(3–4) 
TTest: μ_{o}=2.5, x̅=3, s=.5, n=6, μ>μ_{o}
Results: t = 2.45, p = 0.0290 
(5)  p < α. Reject H_{0} and accept H_{1}. 
(6)  Yes, at the 0.05 significance level, the mean duration of pain for all persons with the condition is greater than 2.5 years. 
(1) 
Population 1 = men, Population 2 = women
H_{0}: p_{1} = p_{2}, men and women equally likely to refuse promotions H_{1}: p_{1} > p_{2}, men more likely to refuse promotions 

(2)  α = 0.05 
(RC) 

(3–4) 
2PropZTest: x1=60, n1=200, x2=48, n2=200, p1>p2
Results: z=1.351474757 → z = 1.35, p=.0882717604 → p-value = .0883, p̂1=.3, p̂2=.24, p̂=.27 
(5)  p > α. Fail to reject H_{0}. 
(6)  At the 0.05 level of significance, we can’t determine whether the percentage of men who have refused promotions to spend time with their family is more than, the same as, or less than the percentage of women. 
(b) 2PropZInt with the above inputs and CLevel=.95 gives (−.0268, .14682). The English sentence needs to state both magnitude and direction, something like this: Regarding men and women who refused promotion for family reasons, we’re 95% confident that men were between 2.7 percentage points less likely than women, and 14.7 percentage points more likely.
Common mistake: With two-population confidence intervals, you must state the direction of the difference, not just the size of the difference.
If the middle 95% runs from 70 to 130, then the mean must be μ = (70+130)÷2 → μ = 100
In a normal distribution, 95% of the population lies within 2 standard deviations of the mean. The range 70 to 100 (or 100 to 130) is therefore two SD. 2σ = 100−70 = 30 → σ = 15
(1) 
H_{0}: p = .75
H_{1}: p < .75 

(2)  α = 0.05 
(RC) 

(3–4) 
1PropZTest: p_{o}=.75, x=40, n=65, prop<p_{o}
Results: z=−2.506402059 → z = −2.51, p=.006098358 → p-value = 0.0061, p̂=.6154 
(5)  p < α. Reject H_{0} and accept H_{1}. 
(6)  At the 0.05 level of significance, less than 75% of claims do settle within 2 months. 
P(Brand A and mislabeled) = P(Brand A) × P(mislabeled | Brand A)
and similarly for brand B.
P(mislabeled) = 0.40 × 0.025 + 0.60 × 0.015 = 0.019 or just under 2%
Alternative solution: The formulas can be confusing, and often there’s a way to do without them. You could also do this as a matter of proportions:
Out of 1000 shoes, 400 are Brand A and 600 are Brand B.
Out of 400 Brand A shoes, 2.5% are mislabeled. 0.025×400 = 10 brand A shoes mislabeled.
Out of 600 Brand B shoes, 1.5% are mislabeled. 0.015×600 = 9 brand B shoes mislabeled.
Out of 1000 shoes, 10 + 9 = 19 are mislabeled. 19/1000 is 1.9% or 0.019.
This is even easier to do if you set up a twoway table, as shown below. The values in bold face are given in the problem, and those in light face are derived from them.
                     Brand A            Brand B             Total
Mislabeled           40% × 2.5% = 1%    60% × 1.5% = 0.9%   1% + 0.9% = 1.9%
Correctly labeled    40% − 1% = 39%     60% − 0.9% = 59.1%  39% + 59.1% = 98.1%
Total                40%                60%                 100%
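The whole total-probability computation is a couple of lines of plain Python; a sketch:

    p_A, p_B = 0.40, 0.60              # brand shares
    mis_A, mis_B = 0.025, 0.015        # P(mislabeled | brand)
    print(p_A * mis_A + p_B * mis_B)   # 0.019, just under 2%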
Solution: This is paired numeric data, Case 3.
Common mistake: You must do this as paired data. Doing it as unpaired data will not give the correct p-value.
(1) 
d = A−B
H_{0}: μ_{d} = 0, no difference in smoothness H_{1}: μ_{d} ≠ 0, a difference in smoothness Remark: You must define d as part of your hypotheses. 

(2)  α = 0.10 
(RC) 

(3–4) 
TTest: μ_{o}=0, List:L3, Freq: 1, μ≠μ_{o}
Results: t = 1.73, p = 0.1173, x̅ = 1, s = 1.83, n = 10 
(5)  p > α. Fail to reject H_{0}. 
(6)  At the 0.10 level of significance, it’s impossible to say whether the two brands of razors give equally smooth shaves or not. 
Solution: (a) Use MATH200A part 3 with n=2, p=.9, from=1, to=1. Answer: 0.18
You could also use binompdf(2, .9, 1) = 0.18.
Alternative solution: The probability that exactly one is tainted is sum of two probabilities: (i) that the first is tainted and the second is not, and (ii) that the first is not tainted and the second is. Symbolically,
P(exactly one) = P(first and second^{C}) + P(first^{C} and second)
P(exactly one) = 0.9×0.1 + 0.1×0.9
P(exactly one) = 0.09 + 0.09 = 0.18
Solution: (b) When sampling without replacement, the probabilities change. You have the same two scenarios — first but not second, and not first but second — but the numbers are different.
P(exactly one) = P(first and second^{C}) + P(first^{C} and second)
P(exactly one) = (9/10)×(1/9) + (1/10)×(9/9)
P(exactly one) = 1/10 + 1/10 = 2/10 = 0.2
Common mistake: Many, many students forget that both possible orders have to be considered: first but not second, and second but not first.
Common mistake: You can’t use binomial distribution in part (b), because when sampling without replacement the probability changes from one trial to the next.
For example, if the first card is an ace then the probability the second card is also an ace is 3/51, but if the first card is not an ace then the probability that the second card is an ace is 4/51. Symbolically, P(A_{2} | A_{1}) = 3/51 but P(A_{2} | not A_{1}) = 4/51.
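Exact arithmetic like part (b)’s is easy to check with Python’s fractions module; a sketch:

    from fractions import Fraction as F

    # 9 tainted and 1 untainted in the batch of 10; draw two without replacement
    p = F(9, 10) * F(1, 9) + F(1, 10) * F(9, 9)   # tainted-then-not, or not-then-tainted
    print(p, float(p))                            # 1/5 = 0.2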
(a) p̂_{T} = 128/300 = 0.4267. p̂_{C} = 135/400 = 0.3375. p̂_{T}−p̂_{C} = 0.0892 or about 8.9%
Remark: The point estimate is descriptive statistics, and requirements don’t enter into it. But the confidence interval is inferential statistics, so you must verify that each sample is random, each sample has at least 10 successes and 10 failures, and each sample is less than 10% of the population it came from.
(b) 2PropZInt: The 98% confidence interval is 0.0029 to 0.1754 (about 0.3% to 17.5%), meaning that with 98% confidence Tompkins viewers are more likely than Cortland viewers, by 0.3 to 17.5 percentage points, to prefer a movie over TV.
(c) E = 0.1754−0.0892 = 0.0862 or about 8.6%
You could also compute it as 0.0892−0.0029 = 0.0863 or (0.1754−0.0029)/2 = 0.0863. All three methods get the same answer except for a rounding difference.
(1) 
Population 1 = no treatment, Population 2 = special treatment
H_{0} p_{1} = p_{2}, no difference in germination rates H_{1} p_{1} ≠ p_{2}, there’s a difference in germination rates 

(2)  α = 0.05 
(RC) 

(3–4) 
2PropZTest: x1=80, n1=80+20, x2=135, n2=135+15,
p_{1}≠p_{2}
Results: z = −2.23, p-value = 0.0256, p̂1 = .8, p̂2 = .9, p̂ = .86 
(5)  p < α. Reject H_{0} and accept H_{1}. 
(6) 
Yes, at the 0.05 significance level, the special treatment made a
difference in germination rate.
Specifically, seeds with the special treatment were more likely to
germinate than seeds that were not treated.
Remark: p < α in Two-Tailed Test: What Does It Tell You? explains how you can reach a one-tailed result from a two-tailed test. 
Alternative solution: You could also do this as a test of homogeneity, Case 7. The χ²Test gives χ² = 4.98, df = 1, p=0.0256
Updates and new info: https://BrownMath.com/swt/