
# Stats without Tears: Review

Updated 9 Dec 2015


Summary: Even if you’ve been doing all the work and keeping up with the course, the mass of material you need to know for the exam can be overwhelming. This page helps you identify what’s most important in preparing for the exam. (If you’re an independent learner, it points you to the most important things you should have learned from your study.)

## What’s Important?

Here are your guidelines for reviewing the subject matter.

### Do This for Every Chapter

• Read the Summary at the beginning, when there is one.
• Notice when a section is marked optional. When you’re doing your final studying, spend your time on the core concepts, not the optional extras.
• Scroll through the chapter and look at the definitions. Do you understand the meaning of each term and how to use it?
• Scroll through the chapter again, and this time look at section heads and key concepts, which are marked in bold. Make notes for your cheat sheet of anything important that you think you might forget.

Pay attention also to calculator procedures and formulas. Know when and how to carry out each calculator procedure, and when and how to use the very few formulas that aren’t built into your calculator or the MATH200A program. There’s at least one example for each calculator procedure and each formula, so work through it if you need to refresh your memory. Again, make notes for your cheat sheet of anything you’re likely to forget.

• The “What Have You Learned?” section at the end of the chapter lists the most important concepts. (Links to “What Have You Learned?” are in the online version of this document.)

If you’ve actually learned everything listed there, you should be in good shape for the exam. If you haven’t, review that section of the text and work the examples.

• Glance over the chapter exercises. If you had trouble with any of them before, make sure you thoroughly understand it now.

### Finish with Overall Course Review

So much for the trees. Now it’s time to think “forest”.

• Go through the cheat sheets you just made, and boil them down to one sheet, front and back. Making a one-sheet cheat sheet is always useful, even if the exam will be open book or if the instructor doesn’t allow any notes at all. Writing your summary of the course helps you make sense of the course material, to see it as a whole instead of an unrelated jumble of facts to memorize.
• Practice with the review problems below. If you can’t work a problem, go back and learn what you’re missing. If something is missing from your cheat sheet, add it.
• Get a good night’s rest the night before the exam. Sleep deprivation makes people make stupid mistakes, so protect yourself from that.

## Review Problems

Here are practice problems to help you test your knowledge and prepare for the final exam. Solutions are provided, but make a genuine effort to work any given problem on your own before you turn to the solution.

How to use: Don’t necessarily make it your goal to work every problem. But do at least look at every one and make sure that you can set it up correctly. Your success on the final exam hinges on your ability to identify which type of problem you are facing.

Don’t panic! This problem set is much longer than the exam will be, and some problems are harder than the problems you will meet on the exam.

### Problem Set 1: Short Answers

Write your answer to each question. There’s no work to be shown. Don’t bother with a complete sentence if you can answer with a word, number, or phrase.

1 Two events A and B are disjoint. Is it possible for those same events to be independent as well? Give an example, or explain why it’s impossible.
2 Yummo candy bars are supposed to have an average weight of 87.5 grams (about three ounces). To test this, a team of students bought one Yummo bar from each of the six stores in the village of Carlyle and weighed it.

(a) The data would best be analyzed as an example of
A. one population proportion
B. two populations, difference in proportions
C. one population mean
D. two populations, difference in means, paired data
E. two populations, difference in means, unpaired data
F. goodness of fit
G. contingency table

(b) Which two tests must you perform on your sample data before doing the analysis mentioned above? (In other words, how would you make sure that the sample meets the requirements?)

3 The two main types of data are qualitative and quantitative. What other names can you give for each? Give an example of each.
4 The probability of rolling a 6 on an honest die is 1/6. If you roll an honest die ten times and none of the rolls comes up 6, is the probability of rolling a 6 on the next roll less than 1/6, equal to 1/6, or greater than 1/6? Explain why.
5 In a large elementary school, you select two age-matched groups of students. Group 1 follows the normal schedule. Group 2 (with parents’ permission) spends 30 minutes a day learning to play a musical instrument. You want to show that learning a musical instrument makes a student less likely to get into trouble. You consider a student in trouble if s/he was sent to the principal’s office at any time during the year.
(a) Write your hypotheses, in symbols.
(b) Identify either the case number or the specific TI-83 test you would use.
6 Imagine rolling five standard dice. You compute the probability of rolling no 3s, one 3, and so on up to five 3s. Is this a binomial probability distribution? With reference to the definition of a binomial PD, why or why not?
7 Over the course of many statistical experiments, which one of these values for the significance level would enable you to prove the most results?
A. 5%        B. 1%        C. 0.1%        D. Significance level has no effect on how likely you are to prove a hypothesis.
8 A key step in hypothesis testing is computing a p-value and comparing it to your preselected α. After you do that, which of the following conclusions would be possible, depending on the specific values of p and α? (Write the letter of each correct answer; there may be more than one.)
A. Accept H0, reject H1
B. Reject H0, accept H1
C. Fail to accept H0, no conclusion
D. Fail to reject H0, no conclusion
9 Distinguish disjoint events, mutually exclusive events, and complementary events. Give an example of each.
10 When is a histogram an appropriate graphical method of presentation?
11 For what type of events does P(A or B) = P(A) + P(B)? Give an example.
12 In a χ² goodness-of-fit test, which of the following is/are true?
(A question with this many technical alternatives will not be on the exam. Just use it to test your own understanding of χ².)
A. The hypotheses are stated in words rather than relating some population parameter to a number.
B. The null hypothesis is always some variation on “the observed sample matches the model.”
C. The alternative hypothesis is always some variation on “our model is good.”
D. Instead of a p-value, we compare the value of χ² to α to draw a conclusion.
E. Degrees of freedom equals the number of cells in our model.
F. If the difference between our observed results and our expected results could likely have occurred by random chance, we reject the null hypothesis.
13 What are the two types of numeric data called? Explain the difference, and give an example of each.
14 Suppose the null hypothesis is that a machine is producing the allowed 1% proportion of defectives (H0: p = 0.01). Your experiment could end in one of several conclusions, depending on your sample data. List the letters of all possible conclusions from those below. (The actual conclusion would depend on the choice of H1, the choice of α, and the calculated p-value. Not all possible conclusions are listed below.)
A. The machine is producing exactly the acceptable proportion of defectives.
B. The machine is producing no more defectives than acceptable.
C. The machine is producing too many defectives.
D. Unable to prove anything either way.
15 How can you avoid making a Type I error in a hypothesis test?
16 You want to find what proportion of churchgoers believe that evolution should be taught in public schools, so you take a systematic survey at a local mall. You collect 487 survey forms. Of those, 321 identify as churchgoers, and 227 of those 321 say that evolution should be taught in public schools.
(a) What is the population?
(b) What is the population size?
(c) What is the sample size?
(d) Is limiting the sample to churchgoers a bias source?
17 You’re doing a hypothesis test to try to show that Drug A is more effective than Drug B, and your p-value is 0.0678. Your roommate, who has not taken statistics, asks, “So there’s a 6.78% chance that the drugs are equally effective, right?” Explain what the p-value actually means.
18 Eight percent of the 2×4s from a lumber yard have cracks longer than an inch. Assume that the defectives are randomly distributed. Do you use a binomial or a geometric distribution to compute each of the following, and why? (You don’t actually need to compute the probabilities; just identify the distributions.)
(a) The probability that no more than five 2×4s in a random sample of 100 have cracks longer than an inch.
(b) The probability that exactly five 2×4s in a random sample of 100 have cracks longer than an inch.
(c) The probability that, pulling 2×4s at random, the first four don’t have cracks longer than an inch but the fifth one does.
19 Data are gathered and a computation is done to answer the question “As near as we can tell, how much does the average high-school student spend on lunch?” This computation would be part of
A. hypothesis test
B. sample size
C. confidence interval
D. none of the above
20 Linear correlation coefficients must lie between what two values? What value indicates “no linear correlation”? Does this mean no correlation at all?
21 “Four out of five dentists surveyed recommend Trident sugarless gum for their patients who chew gum.” Which of these is the correct symbol for “four out of five dentists surveyed”?
μ      π      σ      p      p̂      po      x      x̅      s
22 A poll concludes that 26.9% of TC3 students are satisfied with the food service. What is the type of the original data gathered?
23 For what sort of data might you use a pie chart? Why?
24 The mean is usually the best measure of center of numerical data. But under certain circumstances the mean is not representative and you prefer a different measure of center. Which circumstances, and which measure of center?
25 Usually you make what you want to prove the alternative hypothesis, not the null hypothesis. Why?
26 A company wishes to claim, “People who eat our shredded wheat for breakfast every day for a month lose more than ten points on their cholesterol.” One or more of the following state the null and alternative hypotheses correctly. Which one(s)?
A. H0 > 10       H1 ≤ 10
B. H0: x̅ > 10     H1: x̅ ≤ 10
C. H0: μ > 10     H1: μ ≤ 10
D. H0: x > 10     H1: x ≤ 10
E. H0 = 10       H1 > 10
F. H0: x̅ = 10     H1: x̅ > 10
G. H0: μ = 10     H1: μ > 10
H. H0: x = 10     H1: x > 10
I. H0 ≤ 10       H1 > 10
J. H0: x̅ ≤ 10     H1: x̅ > 10
K. H0: μ ≤ 10     H1: μ > 10
L. H0: x ≤ 10     H1: x > 10
27 Which of the following is a Type I error?
A. failing to reject the null hypothesis when it is true
B. failing to reject the null hypothesis when it is false
C. rejecting the null hypothesis when it is true
D. rejecting the null hypothesis when it is false
28 Compare an experiment and an observational study.
29 Our symbol for level of confidence in a confidence interval is
α        α/2        1–α        z(α/2)        E
(If none of these, supply the correct symbol.)
30 You gather a random sample of selling prices of 2006 Honda Civics. Which selection on your TI-83 would be used to test the claim “In the US, 2006 Honda Civics sell, on average, for more than \$2,000”?
A. Z-Test     B. T-Test     C. 1-PropZTest     D. 1-PropTTest     E. χ²-Test     F. none of these
31 Compare descriptive and inferential statistics, and give an example of each.
32 You find that your maximum error of estimate (margin of error) is ±3.3 at a confidence level of 95%. At 90% confidence, what would be the maximum error of estimate?
A. more than 3.3         B. 3.3         C. less than 3.3         D. can’t say without more information.
33 Compare “sample” and “population”; give an example.
34 You take a random sample of Lamborghini owners and a random sample of Subaru owners. Which selection on your TI-83 would be used to answer the question “How much more do Lamborghini owners spend per year on maintenance than Subaru owners?”
A. ZInterval     B. TInterval     C. 2-SampZInt     D. 2-SampTInt     E. 2-PropZInt     F. none of these
35 You believe that more than 25% of high-school students experienced strong peer pressure to have sex. To test this belief, you survey 500 randomly selected graduating seniors nationwide and find that 150 of them say that they did feel such pressure.

(a) The data would best be analyzed as an example of
A. one population proportion
B. two populations, difference in proportions
C. one population mean
D. two populations, difference in means, paired data
E. two populations, difference in means, unpaired data
F. goodness of fit
G. contingency table

(b) Which tests must you perform on your sample data before doing the analysis mentioned above? (In other words, how would you make sure that the sample meets the requirements?)

### Problem Set 2: Calculations

Show your work for all problems. Round probabilities to four decimal places and test statistics (t, z, χ²) to two. For hypothesis tests, check requirements and show all six numbered steps.

36 You are testing the assertion, “Judge Judy is more friendly to plaintiffs than Judge Wapner was.” Since it would be tedious to tabulate the hundreds or thousands of decisions each judge has handed down, you randomly select 32 of each judge’s decisions. Judge Judy’s average award to plaintiffs was \$650 (standard deviation = \$250) and Judge Wapner’s was \$580 (standard deviation = \$260). Assume that the amounts are normally distributed without outliers. Using a significance level of 0.05, can you conclude that Judge Judy does indeed give higher awards on average?
37 Weights of frozen turkeys at one large market were normally distributed with a mean of 14.8 pounds and a standard deviation of 2.1 pounds. If there were 10,000 turkeys in the market, how many choices would a shopper have who wanted a bird 20.5 pounds or larger? (Hint: begin by figuring the percentage or proportion of turkeys in that weight range.)
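If you don’t have your TI-83 handy, the hint above can be followed with a short Python sketch. It is not part of the original problem; the helper name `upper_tail` is purely illustrative, and the code assumes only that weights follow the normal model stated in the problem.

```python
from statistics import NormalDist

# Turkey weights from the problem statement: mean 14.8 lb, SD 2.1 lb
weights = NormalDist(mu=14.8, sigma=2.1)

def upper_tail(dist, cutoff):
    """P(X >= cutoff): the area under the curve to the right of the cutoff."""
    return 1 - dist.cdf(cutoff)

proportion = upper_tail(weights, 20.5)   # proportion of birds 20.5 lb or larger
expected_count = 10_000 * proportion     # scale up to the 10,000 turkeys

print(f"proportion = {proportion:.4f}, about {expected_count:.0f} turkeys")
```

This plays the same role as a right-tail normalcdf computation: find the proportion first, then multiply by the number of birds.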
38 (from Johnson and Kuby 2003 [see “Sources Used” at end of book], problem 9.26) “The addition of a new accelerator is claimed to decrease the drying time of latex paint by more than 4%. Several test samples were conducted with the following percentage decrease in drying time:
“5.2    6.4    3.8    6.3    4.1    2.8    3.2    4.7
“If we assume that the percentage decrease in drying time is normally distributed”
(a) Test the claim, at the .05 level.
(b) “Find the 95% confidence interval for the true mean decrease in the drying time based on this sample.”
39 28% of a certain breed of rabbits are born with long hair. Assume that the distribution is random, and consider a litter of five rabbits.

(a) What is the probability that none of the rabbits in the litter have long hair?

(b) What is the probability that one or more in a litter have long hair?

(c) What is the probability that four or five of them have long hair?

(d) What is the average number (mean) of long-haired rabbits you expect in a litter of five?
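To check your binomial work on the rabbit problem without a calculator, here is a minimal Python sketch. It assumes the binomial model applies (a fixed number of independent trials with the same probability each time); `binom_pmf` is an illustrative helper name, not something from the textbook.

```python
from math import comb

def binom_pmf(n, p, k):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 5, 0.28                      # litter size, P(long hair) from the problem
pmf = [binom_pmf(n, p, k) for k in range(n + 1)]

p_none = pmf[0]                     # (a) no long-haired rabbits
p_one_or_more = 1 - pmf[0]          # (b) complement of "none"
p_four_or_five = pmf[4] + pmf[5]    # (c) add the two qualifying outcomes
mean = n * p                        # (d) mean of a binomial distribution

print(round(p_none, 4), round(p_one_or_more, 4), round(p_four_or_five, 4), mean)
```

Part (b) uses the complement rather than adding five separate probabilities, which is usually the quicker route for “one or more”.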

40 A survey asked a number of professionals, “Which of the following is your most common choice for breakfast?” Using the following data from a random survey, determine whether doctors choose breakfasts in different proportions from other self-employed professionals, to a .05 significance level.
```
           Cereal  Pastry   Eggs   Other   No bfst  Total
Doctors        85      22     47      60        17    231
Others        185      90    160     135        35    605
Total         270     112    207     195        52    836
```
41 Suppose that the mean adult male height is 5′10″ (70″) and the standard deviation is 2.4″.
(a) If a particular man’s z-score is −1.2, what is his actual height to the nearest 0.1″?
(b) Using the Empirical Rule, what percentile is a height of 67.6″?
(c) By the Empirical Rule, what proportion of adult men are shorter than 74.8″?
42 The length of life of a random sample of incandescent light bulbs was obtained, and the results are in the table below.

| Life, hr | Count |
|---|---|
| 500–650 | 6 |
| 650–800 | 18 |
| 800–950 | 60 |
| 950–1100 | 89 |
| 1100–1250 | 29 |
| 1250–1400 | 17 |

(a) Plot a histogram of the data.
(b) What is the size of the sample, with its proper symbol?
(c) What are the mean and standard deviation? (Use the proper symbols and round to one decimal place.)
(d) What is the relative frequency of the 1100–1250 class?

43 One way to set speed limits is to observe a random sample of drivers and set the speed limit at the 85th percentile. What speed corresponds to that 85th percentile, assuming drivers’ speeds are normally distributed with μ = 57.6 and σ = 5.2 mph?
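Going from a percentile back to a value is the inverse of the usual normal-probability calculation. As a hedged sketch of that idea in Python (an alternative to the calculator’s inverse-normal function, not the textbook’s own procedure):

```python
from statistics import NormalDist

# Driver speeds from the problem statement: mean 57.6 mph, SD 5.2 mph
speeds = NormalDist(mu=57.6, sigma=5.2)

# The 85th percentile is the speed with 85% of the area to its left.
limit = speeds.inv_cdf(0.85)

print(f"85th percentile speed: {limit:.1f} mph")
```

The same call handles any “find the cutoff” problem, as long as you first convert the stated percentage into the area to the left of the cutoff.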
44 You’re planning a survey to see what fraction of people who live in Virgil would take the bus if the county added a route between Greek Peak and downtown Cortland via routes 392 and 215.
(a) You think the answer is only about 20% of them. If you need 90% confidence in an answer to within ±4%, how many people will you need to survey?
(b) What if you have no idea of the answer? How many would you need to survey then?
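The sketch below shows one common way to set up a sample-size calculation for a proportion, n = p(1−p)(z/E)², rounded up. It assumes the population is large compared with the sample and makes no small-population adjustment; the function name `sample_size` is illustrative, and your calculator program may handle the details differently.

```python
from math import ceil
from statistics import NormalDist

def sample_size(p_guess, margin, confidence):
    """Smallest n giving the requested margin of error for a proportion."""
    # Critical z leaves (1 - confidence)/2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Always round up: rounding down would give a margin slightly too wide.
    return ceil(p_guess * (1 - p_guess) * (z / margin) ** 2)

n_with_guess = sample_size(0.20, 0.04, 0.90)   # (a) prior estimate of 20%
n_no_guess = sample_size(0.50, 0.04, 0.90)     # (b) no estimate: use 0.5

print(n_with_guess, n_no_guess)
```

Using 0.5 when you have no prior estimate is the conservative choice, because p(1−p) is largest there.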
45 Some popular fast-food items were compared for calories and fat, and the results are shown below:
| Calories (x) | Fat (y) |
|---|---|
| 270 | 9 |
| 420 | 20 |
| 210 | 10 |
| 450 | 22 |
| 130 | 6 |
| 310 | 25 |
| 290 | 7 |
| 450 | 20 |
| 446 | 20 |
| 640 | 38 |
| 233 | 11 |

(a) Make a scatterplot on your TI-83. Do you expect a positive, negative, or zero correlation? Why?
(b) Find the correlation coefficient and the equation of the line of best fit and write them down. Round to four decimal places and use proper symbols.
(c) Give the value of the y intercept and interpret its meaning.
(d) Using the regression equation or your TI-83 graph, how many grams of fat would you predict for an item of 310 calories? Explain why this is different from the actual data point (310 calories, 25 grams).
(e) What is the value of the residual for the data point (310,25)?
(f) What is the value of the coefficient of determination in this regression? What does it mean?
(g) The decision point for n = 11 is 0.602. What if anything can you say about the correlation for all fast foods?
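If you want to confirm what your TI-83 reports for the regression, the following Python sketch computes the same quantities from the definitions. It is an illustration only, using the eleven data pairs from the table; the variable names are not from the textbook.

```python
from math import sqrt

calories = [270, 420, 210, 450, 130, 310, 290, 450, 446, 640, 233]  # x
fat = [9, 20, 10, 22, 6, 25, 7, 20, 20, 38, 11]                     # y

n = len(calories)
x_bar = sum(calories) / n
y_bar = sum(fat) / n

# Sums of squared deviations and cross-products
sxx = sum((x - x_bar) ** 2 for x in calories)
syy = sum((y - y_bar) ** 2 for y in fat)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(calories, fat))

r = sxy / sqrt(sxx * syy)           # linear correlation coefficient
slope = sxy / sxx                   # slope of the least-squares line
intercept = y_bar - slope * x_bar   # y intercept of the least-squares line

def predict(x):
    """Predicted grams of fat for an item with x calories."""
    return intercept + slope * x

residual = 25 - predict(310)        # actual y minus predicted y at (310, 25)
r_squared = r ** 2                  # coefficient of determination

print(f"r = {r:.4f}, y-hat = {slope:.4f}x + {intercept:.4f}")
print(f"residual at (310, 25) = {residual:.4f}, R^2 = {r_squared:.4f}")
```

Notice that the residual is defined as actual minus predicted, so a point above the line has a positive residual.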

46 Aluminum plates produced by a company are normally distributed with a mean thickness of 2.0 mm and a standard deviation of 0.1 mm. If 6% of the plates are too thick, what is the cutoff point between “too thick” and “acceptable?”
47 Many people took a physical fitness course. Seven of them were randomly selected and were tested for how many sit-ups they could do. The same seven were re-tested after the course. From the data below, can you conclude that improvement took place among the general run of people who took the course? Use α = 0.01.
```
         Anne    Bill   Chance   Deb      Ed     Frank   Grace
Before    29      22      25      29      26      24      31
After     30      26      25      35      33      36      32
```
48 Your average morning commute time is 27 minutes, with SD 4 minutes. Your morning commute times are ND.
(a) How likely is a morning commute under 24 minutes?
(b) You pick a week (five mornings) at random. How likely is an average commute time under 24 minutes?
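The difference between parts (a) and (b) is the difference between one observation and a sample mean. A rough Python sketch of that distinction, assuming the five mornings are independent (the variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 27, 4                   # commute mean and SD, in minutes

# (a) One commute: use the distribution of individual times.
single = NormalDist(mu, sigma)
p_single = single.cdf(24)

# (b) Mean of n = 5 commutes: same center, but the spread shrinks
# to the standard error sigma / sqrt(n).
n = 5
weekly_mean = NormalDist(mu, sigma / sqrt(n))
p_mean = weekly_mean.cdf(24)

print(f"one commute: {p_single:.4f}, mean of {n}: {p_mean:.4f}")
```

Because averages vary less than individual values, an average under 24 minutes is much less likely than a single commute under 24 minutes.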
49 (adapted from Johnson and Kuby 2003 [see “Sources Used” at end of book] problem 11.15)  A survey was taken nationally to see what size vacation home people preferred. A separate survey was taken in Nebraska. Both were random samples. Do the Nebraska results differ significantly (0.05 level) from the national results?

| Size | National | Nebraska |
|---|---|---|
| Studio/efficiency | 18.2% | 75 |
| 1 bedroom | 18.2% | 60 |
| 2 bedrooms | 40.4% | 105 |
| 3 bedrooms | 18.2% | 45 |
| Over 3 bedrooms | 5.0% | 15 |
| Total | 100.0% | 300 |

50 An experiment was designed to test the effectiveness of a short course that teaches diabetic self-care. Fifty diabetic patients were enrolled in the course, and fifty others served as a control group. (Patients were randomly assigned between the two groups.) Six months after the course, blood sugar levels were tested and results obtained as follows:
Diabetic course group: mean = 6.5, standard deviation = 0.7
Control group: mean = 7.1, standard deviation = 0.9
(a) At a significance level of 0.01, does the diabetic course succeed in lowering patients’ blood sugar?

(b) Obviously diabetic patients are not all the same. In this experiment, the largish sample sizes and randomization mean that confounding variables are probably balanced out in the two groups.

But suppose you had money only for a smaller study, with a total of 30 patients. Suggest an experimental design that would control for most lurking variables. What problem can you see with that design?

51 (adapted from Johnson and Kuby 2003 [see “Sources Used” at end of book] problem 9.36)  “A study in the journal PAIN, October 1994, reported on six patients with chronic myofascial pain syndrome. The mean duration of pain had been 3.0 years for the 6 patients and the standard deviation had been 0.5 year. Test the hypothesis that the mean pain duration of all patients who might have been selected for this study [meaning, of all persons who suffer from this condition] was greater than 2.5 years. Use α = 0.05. Assume that the sample is a random sample, normally distributed with no outliers.
52 In a survey of working parents, 200 men and 200 women were randomly selected and asked, “Have you refused a promotion because it would mean less time with your family?” Of the men, 60 said yes; 48 of the women said yes.
(a) Obviously more men in the sample refused promotions. But can you conclude at the 0.05 significance level that a higher percentage of all working men have refused promotions, versus the percentage of all working women?
(b) In an English sentence, state a 95% confidence interval for the difference in percentages of men and women who refuse promotions.
53 Ten thousand students take a test, and their scores are normally distributed. If the middle 95% of them score between 70 and 130, what are the mean and standard deviation?
54 An insurance company advertises that 75% of its claims are settled within two months of being filed. The state insurance commission thinks the percentage is less than 75, and sets out to prove it. First a small study is done. For this preliminary study, the commissioner can live with a 5% chance of making a Type I error. The commission staff randomly selects 65 claims, and finds out that 40 were settled within two months. Based on this study, can you say that less than 75% of claims are settled within two months?
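As a hedged illustration of how the test statistic and p-value in a one-proportion test are computed, here is a Python sketch. It is not the textbook’s six-step write-up. The requirement check uses a common rule of thumb (at least 10 expected successes and 10 expected failures), which is an assumption here; apply whatever rule your course states.

```python
from math import sqrt
from statistics import NormalDist

n, x, p0 = 65, 40, 0.75             # sample size, successes, claimed proportion
p_hat = x / n                       # sample proportion

# Assumed rule of thumb for the normal approximation (see the note above).
requirements_ok = n * p0 >= 10 and n * (1 - p0) >= 10

# The standard error uses p0, because the test assumes H0 is true.
std_err = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / std_err

# H1: p < 0.75 is a left-tailed test, so the p-value is the area left of z.
p_value = NormalDist().cdf(z)

print(f"requirements met: {requirements_ok}, z = {z:.2f}, p-value = {p_value:.4f}")
```

You still need to compare the p-value to α and state the conclusion in plain English, which the code does not do for you.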
55 Work this problem only if you studied the optional extras in the Probability chapter.
A shoe store gets its shoes from just two companies, 40% from A and 60% from B. 2.5% of pairs from Brand A are mislabeled, and 1.5% of pairs from Brand B are mislabeled. Find the probability that a randomly selected pair of shoes in the store is mislabeled.
56 Ten randomly selected men compared two brands of razors. Each man shaved one side of his face with brand A and the other side with brand B. (They flipped coins to decide which razor to use on which side.) Each tester assigned a “smoothness score” of 1 to 10 to each side after shaving. The scores are as shown below. Determine whether there is a difference in smoothness performance between the two razors, using α = 0.10.
```
Man       1   2   3   4   5   6   7   8   9  10
A score   7   8   3   5   4   4   9   8   7   4
B score   5   6   3   4   6   5   6   7   3   4
```
57 In August 2009, the National Geographic News Web site reported that 90% of US currency was tainted with cocaine.
(a) If you drew a random sample of two bills, what is the chance that exactly one of them is tainted with cocaine?
(b) You have ten bills, and you’ve been told that 90% of these ten bills are tainted with cocaine. If you draw two of the ten bills at random, what is the chance that exactly one of your two is tainted with cocaine?
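Parts (a) and (b) look alike but call for different models: sampling from an effectively unlimited population versus drawing without replacement from just ten bills. This Python sketch sets the two computations side by side, under the assumption that “90% of these ten bills” means exactly nine tainted bills (the variable names are illustrative):

```python
from math import comb

# (a) Huge population: the two draws are effectively independent,
# so a binomial model with n = 2 and p = 0.9 applies.
p_a = comb(2, 1) * 0.9 * 0.1

# (b) Only ten bills, nine tainted: draws are without replacement,
# so count the ways to pick 1 tainted and 1 clean out of all pairs.
tainted, clean, draw = 9, 1, 2
p_b = comb(tainted, 1) * comb(clean, 1) / comb(tainted + clean, draw)

print(f"(a) {p_a:.4f}   (b) {p_b:.4f}")
```

The two answers differ because removing one bill from a group of ten noticeably changes the makeup of what remains.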
58 Fifteen farms were randomly selected from a large agricultural region. Each farm’s yield of wheat per acre was measured. For the 15 farms, the mean yield per acre was 85.5 bushels and the standard deviation was 10.0 bushels. Find a 90% confidence interval for the mean yield per acre for all farms in this region, assuming yield per acre is normally distributed and there were no outliers in the sample.
59 You draw five cards from a deck, without replacement, and record the number of aces you drew. Then you replace the five cards and shuffle the deck thoroughly. If you repeat this experiment many times, is the number of aces in five cards drawn a binomial distribution? Why or why not?
60 In a survey of 300 people from Tompkins County, 128 of them preferred to rent or stream a movie on Saturday night rather than watch broadcast or cable TV. In Cortland County, 135 of 400 people surveyed preferred a movie. You’re interested in the difference of proportion in movie renters for Tompkins County over Cortland County. Both surveys were random samples.
(a) What is the point estimate for that difference?
(b) Find the 98% confidence interval for the difference in the two proportions for all residents of the counties.
(c) What is the maximum error of estimate, at the 98% confidence level?
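For a check on the two-proportion interval without a calculator, here is a minimal Python sketch of the usual unpooled formula. It assumes both samples are random (as stated) and large enough for the normal approximation; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 128, 300                   # Tompkins County: preferred a movie, surveyed
x2, n2 = 135, 400                   # Cortland County: preferred a movie, surveyed

p1, p2 = x1 / n1, x2 / n2
point_estimate = p1 - p2            # (a) Tompkins minus Cortland

# Critical z for 98% confidence leaves 1% in each tail.
z_crit = NormalDist().inv_cdf(1 - (1 - 0.98) / 2)

# A confidence interval uses the unpooled standard error.
std_err = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
margin = z_crit * std_err           # (c) maximum error of estimate

interval = (point_estimate - margin, point_estimate + margin)   # (b)

print(f"estimate = {point_estimate:.4f}, margin = {margin:.4f}")
print(f"98% CI: {interval[0]:.4f} to {interval[1]:.4f}")
```

The margin of error is half the width of the interval, so parts (b) and (c) come from the same computation.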
61 Two batches of seeds were randomly drawn from the same lot, and one batch was given a special treatment. Consider the data for germination shown below. At significance level 0.05, does the treatment make any difference in how likely seeds are to germinate?

| Germinated | Didn’t |
|---|---|
| 80 | 20 |
| 135 | 15 |

Now check yourself on the solutions page.

## What’s New

• 9 Dec 2015: Make it explicit that these surveys were random.
• 6 May 2015: Change a problem from gasoline octane to weight of candy bars.
• 13 Jan 2015: Add several study aids to the list of review documents.
• (intervening changes suppressed)
• 11 Nov 2007: Amalgamate the old separate sets of review problems for descriptive and inferential statistics.
• 4 Nov 2011: Create the list of key concepts.