
# Stats without Tears: Solutions for Chapter 8

Updated 29 Oct 2020

1 This is numeric data. You have a random sample, and it’s less than 10% of the households in a country. Despite the skew, with sample size so far above 30 you can be sure that the shape of the sampling distribution is approximately normal. The mean of the sampling distribution is μ_x̄ = μ = \$48,000. The SD of the sampling distribution of the mean, a/k/a the standard error of the mean, is σ_x̄ = σ/√n = \$2000/√64 → σ_x̄ = \$250.
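The two formulas in problem 1 can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative and not from the text, and the values are the ones quoted in the solution.

```python
from math import sqrt

mu = 48_000.0    # population mean in dollars, from the solution
sigma = 2000.0   # population SD in dollars, from the solution
n = 64           # sample size, from the solution

mean_of_sample_means = mu   # the sampling distribution is centered on mu
sem = sigma / sqrt(n)       # standard error of the mean

print(f"mean = ${mean_of_sample_means:,.0f}, SEM = ${sem:,.0f}")
```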
2 (a) First, describe the distribution and sketch the situation. For the population, you’re given μ = 800, σ = 50, n = 100.
• Center: The mean of the sampling distribution is the same as the mean of the population, 800 hours.
• Spread: The standard error of the mean is σ_x̄ = σ/√n = 50/√100 = 5 hours.
• Shape: You have a random sample, 10n = 10×100 = 1000 is certainly less than the total number of light bulbs, and your sample size is comfortably larger than 30. Therefore you can use the normal model for the sampling distribution.

Sample means are ND with mean 800 hours and SD 5 hours. The sketch is at right.

Common mistake: The correct standard deviation is 5 hours, not 50. You’re not sketching the population of light bulbs. Rather, you’re now interested in the distribution of average lifetimes in samples of 100 bulbs. (The axis is the x̄ axis, not the x axis.)

780 hours, the sample mean that the problem asks about, is 20 hours below the population mean of 800. 20/5 = 4 standard errors, so you should have marked 780 hours at four standard deviations below the mean.

A sample mean of 780 is less than the population mean of 800 hours. Therefore you compute the probability of a sample mean of 780 hours or less. It will be surprising (unusual, unexpected) if the probability is under 5%.

P(x̄ ≤ 780) = normalcdf(−10^99, 780, 800, 50/√(100)) = 3.1686E-5 → P(x̄ ≤ 780) = 0.00003

(You can also give the probability as <0.0001.) Yes, this is surprising.

Common mistake: Don’t give the probability as 3.1686. Probabilities are never greater than 1.

(b) If the manufacturer’s claim is true, there are only three chances in a hundred thousand of getting a sample mean this low. It’s very unlikely that the manufacturer’s claim is true.
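For readers without a TI calculator, Python's standard library offers a rough stand-in for `normalcdf`. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` is available, and reworks problem 2 with it. The variable names are illustrative, and the last digits may differ slightly from the calculator's display.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 50, 100          # values given in problem 2
sem = sigma / sqrt(n)                # standard error: 5 hours, not 50

# Sampling distribution of the sample mean: normal, centered on mu, SD = SEM
sampling_dist = NormalDist(mu=mu, sigma=sem)

z = (780 - mu) / sem                 # how many standard errors below the mean
p_low = sampling_dist.cdf(780)       # P(sample mean <= 780)

print(f"SEM = {sem}, z = {z}, P(x̄ ≤ 780) = {p_low:.2e}")
```

A probability printed as `3.17e-05` means 0.0000317, which is the same trap as reading 3.1686E-5 as 3.1686 on the calculator.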

3 (a) “Describe the distribution” means shape, center and spread. You can always get center and spread, but if the test for normal approximation fails then you can’t say anything about the shape.
• μ_p̂ = p = 0.72
• σ_p̂ or SEP = √(pq/n) = √(0.72×(1−0.72)/500) = 0.0200798406
• Expected “yes” per sample: np = 500×.72 = 360; expected “no” = 500−360 = 140; both are well above 10. You have a random sample, and 10×500 = 5000 is far less than the American population. Therefore the normal approximation is valid.

Answer: normally distributed with mean = 0.72, standard deviation (standard error) = 0.020

Common mistake: Don’t write n≥30 when testing the normal approximation. The n≥30 test applies to numeric data, but in this problem you have binomial data.

(b) 350/500 = 0.70 exactly, and 370/500 = 0.74 exactly. In a sample of 500, finding 350 to 370 successes is the same as finding 70% to 74% successes.

If you stored the computed SEP in part (a), then your screen will look like the one on the left. Otherwise, it will look like the one on the right. Answer: P(70% ≤ p̂ ≤ 74%) = 0.6808.

Remark: Always check for reasonableness. 70% and 74% are one standard error below and above the mean, so you know from the Empirical Rule that about 68% of the data should be within that region.

Remark: The problem wanted you to use the normal approximation, but it’s always good to check answers by a different method if possible. 70%×500 = 350; 74%×500 = 370. MATH200A part 3 with n=500, p=.72, from 350 to 370, gives a probability of 0.7044, pretty good agreement.
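Both remarks can be tried out in Python. The sketch below is not the MATH200A program. It uses `statistics.NormalDist` for the normal approximation and a direct sum of binomial terms as the independent check, with illustrative variable names.

```python
from math import comb, sqrt
from statistics import NormalDist

p, n = 0.72, 500                      # values given in problem 3
sep = sqrt(p * (1 - p) / n)           # standard error of the proportion

# Normal approximation: P(0.70 <= p-hat <= 0.74)
approx = NormalDist(mu=p, sigma=sep)
p_normal = approx.cdf(0.74) - approx.cdf(0.70)

# Exact binomial check: P(350 <= x <= 370), summing individual terms
p_exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(350, 371))

print(f"SEP = {sep:.10f}")
print(f"normal approximation: {p_normal:.4f}")
print(f"exact binomial:       {p_exact:.4f}")
```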

4 The sampling distribution of x̄ is ND because the sample size of 1000 is greater than 30 and the random sample is smaller than 10% of the population (10% of 100,000 households is 10,000 households). The SEM is σ_x̄ = 19000/√1000 ≈ \$601.

P(x̄ ≤ \$31,000) = normalcdf(−10^99, 31000, 32400, 19000/√(1000)) = 0.0099, almost exactly 1%. That would be pretty unlikely if the population mean was still \$32,400, so the city manager is most likely correct.

Remark: This problem was adapted from Freedman, Pisani, Purves (2007, 415) [see “Sources Used” at end of book].

5

| x | P(x) |
|---|---|
| +10 | 18/38 |
| −10 | 20/38 |
| n/a | 38/38 = 1 |

(a) The model is the probability table for x shown with this problem. You could list green and black separately, but since they have the same outcome there’s no need to do that. It’s important to have the probabilities as exact fractions, not approximate decimals.

(b) x’s in L1, P’s in L2. `1-VarStats L1,L2` gives μ = −\$0.53, σ = \$9.99. Interpretation: In the long run, a player who bets \$10 on red will lose an average of 53¢ per bet.
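What `1-VarStats L1,L2` computes here is the mean and SD of a discrete probability model. A minimal Python sketch of the same arithmetic follows, keeping the probabilities as exact fractions as the solution recommends. The dictionary and variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Player's outcome in dollars -> probability, as exact fractions
model = {10: Fraction(18, 38), -10: Fraction(20, 38)}
assert sum(model.values()) == 1       # probabilities must total exactly 1

# Mean of a discrete distribution: sum of x * P(x)
mu = sum(x * p for x, p in model.items())

# Variance: sum of (x - mu)^2 * P(x); the SD is its square root
variance = sum((x - mu) ** 2 * p for x, p in model.items())
sigma = sqrt(variance)

print(f"mu = {mu} = {float(mu):.2f}, sigma = {sigma:.2f}")
```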

Remark: Notice that the SD is about 20 times the mean. This is why gambling is so exciting for the player: there’s a lot of variability from one bet to the next.

(c) With n = 10,000, the sampling distribution of x̄ is normally distributed. (10n = 10×10,000 = 100,000, less than the total number of bets while the casino is in business. The bets placed in a given day are not random, but they are representative of all possible bets and therefore effectively random.) The mean of the sampling distribution is the mean of the population: μ_x̄ = +\$0.53. (Whatever players lose, the casino wins, so the mean is the opposite of a player’s mean.) The standard error of the mean is σ/√n = 9.986139979/√10000; σ_x̄ ≈ \$0.10.

Remark: This is why gambling is predictable for the operators: the SD is small compared to the mean.

(d) 10,000×\$0.5263157895 = \$5,263.16

(e) To lose money, the casino has to make less than \$0.00. Zero is more than five standard errors below the mean (has a z-score below −5), so you know right off that it would be unusual for the casino to lose money. `normalcdf` confirms that: P(lose on 10,000 bets) = 6.8×10⁻⁸. The casino has essentially no risk (7 chances in 100 million) of losing money on 10,000 bets.

(f) Remember the elevator example. A total of \$2000 on 10,000 bets is an average of 2000/10,000 = \$0.20 per bet. Use `normalcdf` to compute the probability of doing that well or better: P(make ≥\$2000) = 0.9995. Not only is the casino virtually certain not to lose money, it’s almost certain to make a handsome profit, as long as people come in to place bets.
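Parts (c) through (f) can be traced in Python as a check. This sketch takes the casino's point of view (mean +10/19 dollars per bet), uses `statistics.NormalDist` in place of `normalcdf`, and converts the \$2000 total to a per-bet average before looking up the probability. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 10_000                                # number of bets
mu_casino = 10 / 19                       # casino's mean win per bet, about $0.53
sigma = sqrt(100 - (10 / 19) ** 2)        # SD per bet, about $9.99

sem = sigma / sqrt(n)                     # (c) standard error, about $0.10
dist = NormalDist(mu=mu_casino, sigma=sem)

expected_total = n * mu_casino            # (d) expected winnings on n bets
p_lose = dist.cdf(0)                      # (e) P(average win per bet < 0)
p_make_2000 = 1 - dist.cdf(2000 / n)      # (f) P(average win per bet >= 0.20)

print(f"SEM = {sem:.4f}")
print(f"expected total = {expected_total:.2f}")
print(f"P(lose) = {p_lose:.1e}, P(make >= $2000) = {p_make_2000:.4f}")
```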

6 Given: μ = 5.00, σ = 0.05, n = 15. Needed: P(∑x>75.6). A sample weighing 75.6 lb total will have a sample mean of 75.6/15 = 5.04 lb, so this is really just another problem in finding the probability of turning up a sample mean in a given range.
• μ_x̄ = μ = 5.00 lb
• The SEM is σ_x̄ = 0.05/√15 ≈ 0.013 lb.
• The sample means are normally distributed, even for this small sample, because the original population is normally distributed.

P(∑x > 75.6) = P(x̄ > 5.04) = normalcdf(75.6/15, 10^99, 5.00, 0.05/√(15)) = 9.7295E-4 ≈ 0.0010, about one chance in a thousand.
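The key move in problem 6 is turning a question about a total into a question about a mean. A short Python sketch of that conversion, with illustrative names and `statistics.NormalDist` standing in for `normalcdf`:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 5.00, 0.05, 15        # values given in problem 6
total_limit = 75.6                   # total weight in pounds

mean_limit = total_limit / n         # a total of 75.6 lb is a mean of 5.04 lb
sem = sigma / sqrt(n)                # standard error of the mean

# The population is normal, so sample means are normal even with n = 15
p_over = 1 - NormalDist(mu=mu, sigma=sem).cdf(mean_limit)

print(f"mean limit = {mean_limit:.2f} lb, SEM = {sem:.4f} lb, P = {p_over:.4f}")
```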

7 (a) This part is a standard Chapter 7 problem about individuals, not samples, so the axis is x rather than x̄. Answer: P(x > 43.0) = 0.1634

(b) The sampling distribution of x̄ is ND, even for this small sample, because the population is ND. The standard error is σ_x̄ = 5.1/√14 ≈ 1.4.

P(x̄ > 43.0) = normalcdf(43, 10^99, 38, 5.1/√(14)) = 1.2212E-4 → P(x̄ > 43.0) = 0.0001 or 0.01%

Remark: This sketch is not very well proportioned, because it makes the probability look much larger than it actually is.
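The contrast between parts (a) and (b) is easy to see side by side. In this sketch the only difference between the two calculations is the SD passed to the normal model: σ for one individual, σ/√n for a sample mean. The values μ = 38, σ = 5.1 and n = 14 are those used in the `normalcdf` call; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 38, 5.1, 14

# (a) one individual: use the population SD
p_individual = 1 - NormalDist(mu, sigma).cdf(43.0)

# (b) mean of a sample of 14: use the standard error of the mean
sem = sigma / sqrt(n)
p_sample_mean = 1 - NormalDist(mu, sem).cdf(43.0)

print(f"P(x > 43.0)  = {p_individual:.4f}")
print(f"P(x̄ > 43.0) = {p_sample_mean:.2e}  (SEM = {sem:.2f})")
```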

8 12,778 KW shared among 1000 households is 12778/1000 = 12.778 KW per household on average. “Fail to supply enough power” means that the households are using more power than that. You need P(x̄ > 12.778) for n = 1000.

The standard error of the mean is σ_x̄ = 3.5/√1000, about 0.11. The sampling distribution of the mean is normal because data are numeric and n = 1000, greater than 30. (Treat the sample as random because it’s a “typical neighborhood”. And a thousand households is less than 10% of all the households that there are.)

P(x̄ > 12.778) = normalcdf(12.778, 10^99, 12.5, 3.5/√(1000)) = 0.0060

9 p = 0.0171, n = 11,037, and you want to find P(p̂ ≤ 0.0094). First check that the sampling distribution of p̂ is ND:
• The doctors were randomized between treatment and placebo groups.
• 10×11,037 = 110,370. There are more adult males than that.
• np = 11037×.0171 = about 189; nq = 11037−189 = 10848. Both are well above 10.

Therefore the sampling distribution can be approximated by a normal distribution.

The standard error of the proportion or SEP is σ_p̂ = √(pq/n) = √(0.0171×(1−0.0171)/11037) ≈ 0.0012

If you use my shortcut, your screen will look like the one at the left; if not, it will look like the one at the right. Either way, the probability is 2.2013×10⁻¹⁰, or 0.000 000 000 2. There are only two chances in ten billion of getting a sample proportion of 0.94% or less with sample size 11,037, if the true population proportion is 1.71%. That’s pretty darn unlikely, so based on this experiment you can rule out coincidence and decide that aspirin does reduce the chance of a heart attack among adult males.
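The requirement check and the probability for problem 9 can be scripted together. This is a sketch with illustrative names, not the calculator shortcut mentioned in the solution, and it covers only the np ≥ 10 and nq ≥ 10 part of the requirements. Randomization and the 10% condition still have to be judged from the study design.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.0171, 11_037                # population proportion and sample size
q = 1 - p

# Expected successes and failures must both be at least 10
expected_yes = n * p
expected_no = n * q
normal_ok = expected_yes >= 10 and expected_no >= 10

sep = sqrt(p * q / n)                # standard error of the proportion
p_low = NormalDist(mu=p, sigma=sep).cdf(0.0094)   # P(p-hat <= 0.0094)

print(f"np = {expected_yes:.0f}, nq = {expected_no:.0f}, normal OK: {normal_ok}")
print(f"SEP = {sep:.4f}, P(p̂ ≤ 0.0094) = {p_low:.1e}")
```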

10 Heights are ND, so the sampling distribution is also. By the Empirical Rule or 68–95–99.7 Rule, 95% of a ND falls within 2 SD of the mean. The distribution that concerns you in this problem is the sampling distribution of x̄, not the original distribution of individual men’s heights. Therefore, the SD that concerns you is the standard error of the mean, not the SD of men’s heights.

The standard error of the mean or SEM is σ_x̄ = σ/√n = 2.92/√16 = 0.73″.

μ_x̄ ± 2σ_x̄ = 69.3 ± 2×0.73 = 67.84 to 70.76.

Sample means between those values would not be surprising, and therefore a sample mean would be surprising if it is under 67.84″ or over 70.76″.

Alternative solution: That back-of-the-envelope calculation is good enough, but you could also get a more precise answer:

L = invNorm(0.025, 69.3, 2.92/√(16)) = 67.87

H = invNorm(1−0.025, 69.3, 2.92/√(16)) = 70.73
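Both solutions to problem 10 can be compared in Python, where `NormalDist.inv_cdf` plays the role of `invNorm` (assuming Python 3.8 or later). The names below are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 69.3, 2.92, 16        # values given in problem 10
sem = sigma / sqrt(n)                # 0.73 inches

# Back-of-the-envelope bounds: mean plus or minus 2 standard errors
rough_low, rough_high = mu - 2 * sem, mu + 2 * sem

# More precise bounds: cut off 2.5% in each tail of the sampling distribution
dist = NormalDist(mu=mu, sigma=sem)
low = dist.inv_cdf(0.025)
high = dist.inv_cdf(1 - 0.025)

print(f"rough:   {rough_low:.2f} to {rough_high:.2f}")
print(f"precise: {low:.2f} to {high:.2f}")
```

The two intervals differ only in the second decimal place because the exact multiplier for the middle 95% is about 1.96 rather than 2.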

11 This is like the Swain v. Alabama example. You have to convert the sample counts into a proportion: p̂ = 737/1504 ≈ 49%. The problem is really asking you for P(p̂ ≥ 49%) in a sample of 1504 with population proportion of 45%.

What does the sampling distribution look like? The center is μ_p̂ = p = 0.45. The standard error is σ_p̂ = √(0.45×(1−0.45)/1504) ≈ 0.013. Check requirements to make sure that a normal model can be used for the sampling distribution:

• Random sample? Yes, given.
• Sample less than 10% of population? 10×1504 = 15,040, compared to millions of American adults, OK.
• Sample large enough? Yes, 0.45×1504 ≈ 677 successes and 1504−677 ≈ 827 failures expected, both above 10.

P(x ≥ 737) = P(p̂ ≥ 49%) = normalcdf(737/1504, 10^99, .45, √(.45*(1−.45)/1504)) ≈ 9E-4 or 0.0009.

Can you draw a conclusion? Yes, you can. In a population with 45% unfavorable rating of the Tea Party, there are only 9 chances in 10,000 of getting a sample as unfavorable as this one (or more unfavorable). That’s pretty unlikely, so you conclude that the true unfavorable rating in October was most likely more than 45% of all Americans. (In Chapter 9, you’ll learn how to estimate that proportion from a sample.)
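As a check on problem 11, the count-to-proportion conversion and the tail probability can be done in one short Python sketch. The names are illustrative, and the 5% cutoff for "surprising" is the one used earlier in these solutions.

```python
from math import sqrt
from statistics import NormalDist

p, n, x = 0.45, 1504, 737            # population proportion, sample size, count

p_hat = x / n                        # convert the count to a sample proportion
sep = sqrt(p * (1 - p) / n)          # standard error of the proportion

# P(p-hat >= 737/1504) under a normal model centered on p
p_tail = 1 - NormalDist(mu=p, sigma=sep).cdf(p_hat)
surprising = p_tail < 0.05

print(f"p̂ = {p_hat:.4f}, SEP = {sep:.4f}, P = {p_tail:.4f}, surprising: {surprising}")
```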

## What’s New

• 29 Oct 2020: Converted document from HTML 4.01 to HTML5, and improved the formatting of radicals.
• 1 Apr 2015: Modified the exercise on belief in angels to include converting counts to proportions. Added an alternative solution, suggested by Marianna Grigorov, to the exercise on surprising heights.
• (intervening changes suppressed)
• 24 Mar 2013: New document.