Stats without Tears
Review
Updated 9 Dec 2015
(What’s New?)
Copyright © 2007–2023 by Stan Brown, BrownMath.com
Summary:
Even if you’ve been doing all the work and keeping up
with the course, the mass of material you need to know for the exam
can be overwhelming. This page helps you identify
what’s most important in preparing for the exam.
(If you’re an
independent learner, it points you to the most important things you
should have learned from your study.)
What’s Important?
Here are your guidelines for reviewing the subject matter.
Do This for Every Chapter
- Read the Summary at the beginning, when there is one.
- Notice when a section is marked optional. When
you’re doing your final studying, spend your time on the core
concepts, not the optional extras.
- Scroll through the chapter and look at the definitions. Do you
understand the meaning of each term and how to use it?
- Scroll through the chapter again, and this time look at section
heads and key concepts, which are marked in bold.
Make notes for your cheat sheet of anything important that you think
you might forget.
Pay attention also to calculator procedures and
formulas. Know when and how to carry out each calculator
procedure, and when and how to use the very few formulas that
aren’t built into your calculator or the
MATH200A program.
There’s
at least one example for each calculator procedure and each formula, so work through it if you need to
refresh your memory. Again, make notes for your cheat sheet of
anything you’re likely to forget.
- The “What Have You Learned?” section
at the end of the chapter lists the most important concepts.
(Links to “What Have You Learned?” are
below in the online version of this document.)
If you’ve actually learned everything listed there, you should
be in good shape for the exam. If you haven’t, review that
section of the text and work the examples.
- Glance over the chapter exercises. If you had trouble
with any of them before, make sure you thoroughly understand
it now.
Finish with Overall Course Review
So much for the trees. Now it’s time to think
“forest”.
- Go through the cheat sheets you just made, and boil them
down to one sheet, front and back.
Making a one-sheet cheat sheet is always useful,
even if the exam will be open book or if
the instructor doesn’t allow any notes at all.
Writing your summary of the course helps you make sense of the course
material, to see it as a whole instead of an unrelated jumble of facts
to memorize.
- Practice with the review problems below. If you
can’t work a problem, go back and learn what you’re
missing. If something is missing from your cheat sheet, add it.
- Get a good night’s rest the night before the exam.
Sleep deprivation makes people make stupid mistakes, so protect
yourself from that.
Links to “What Have You Learned?”
Review Problems
Here are practice problems to help you test your knowledge and prepare for the final exam.
Solutions are provided,
but make a
genuine effort to work any given problem on your own before you turn
to the solution.
How to use:
Don’t necessarily make
it your goal to work every problem. But do at least look at every one
and make sure that you can set it up correctly. Your success on the
final exam hinges on your ability to identify which type of problem
you are facing.
Don’t panic!
This problem set is
much longer than the exam will be, and some problems are
harder than the problems you will meet on the exam.
Problem Set 1: Short Answers
Write your answer to each question. There’s no work to
be shown. Don’t bother with a complete sentence if you can
answer with a word, number, or phrase.
1
Two events A and B are disjoint. Is it
possible for those same events to be independent as well? Give an
example, or explain why it’s impossible.
2
Yummo candy bars are supposed to have an average weight of 87.5 grams
(about three ounces).
To test this, a team of students bought one Yummo bar from each of the
six stores in the village of Carlyle and weighed it.
(a) The data would best
be analyzed as an example of
A. one population proportion
B. two populations, difference in proportions
C. one population mean
D. two populations, difference in means, paired data
E. two populations, difference in means, unpaired data
F. goodness of fit
G. contingency table
(b) Which two tests must you perform on your sample data before
doing the analysis mentioned above? (In other words, how would you
make sure that the sample meets the requirements?)
3
The two main types of data are qualitative and quantitative.
What other names can you give for each? Give an
example of each.
4
The probability of rolling a 6 on an honest die is 1/6. If you
roll an honest die ten times and none of the rolls comes up 6, is the
probability of rolling a 6 on the next roll less than 1/6, equal to
1/6, or greater than 1/6? Explain why.
5
In a large elementary school, you select two age-matched groups
of students. Group 1 follows the normal schedule. Group 2 (with
parents’ permission) spends 30 minutes a day learning to play a
musical instrument. You want to show that learning a musical
instrument makes a student less likely to get into trouble. You
consider a student in trouble if s/he was sent to the principal’s
office at any time during the year.
(a) Write your hypotheses, in symbols.
(b) Identify either the case number or the specific
TI-83 test you would use.
6
Imagine rolling five standard dice. You compute the probability
of rolling no 3s, one 3, and so on up to five 3s. Is this a binomial
probability distribution? With reference to the definition of a
binomial PD, why or why not?
7
Over the course of many statistical experiments, which one of
these values for the significance level would enable you to prove the
most results?
A. 5% B. 1% C. 0.1%
D. Significance level has no effect
on how likely you are to prove a hypothesis.
8
A key step in hypothesis testing is computing a p-value and
comparing it to your preselected α. After you do that, which of the
following conclusions would be possible, depending on the specific
values of p and α?
(Write the letter of each correct answer; there may
be more than one.)
A. Accept H0, reject H1
B. Reject H0, accept H1
C. Fail to accept H0, no conclusion
D. Fail to reject H0, no conclusion
9
Distinguish disjoint events, mutually exclusive events,
and complementary events. Give an example of each.
10
When is a histogram an appropriate graphical method of
presentation?
11
For what type of events does P(A or B) =
P(A) + P(B)? Give an example.
12
In a χ² goodness-of-fit test, which of the
following is/are true?
(A question with this many technical alternatives
should not be on an
exam. Just use it to test your own understanding of χ².)
A. The hypotheses are stated in words rather than relating some
population parameter to a number.
B. The null hypothesis is always some variation on “the observed
sample matches the model.”
C. The alternative hypothesis is always some variation on “our model
is good.”
D. Instead of a p-value, we compare the value of χ²
to α to draw a conclusion.
E. Degrees of freedom equals the number of cells in our model.
F. If the difference between our observed results and our expected
results could likely have occurred by random chance, we reject the null
hypothesis.
13
What are the two types of numeric data called? Explain the
difference, and give an example of each.
14
Suppose the null hypothesis is that a machine is producing
the allowed 1% proportion of defectives
(H0: p = 0.01).
Your experiment could end
in one of several conclusions, depending on your sample data. List
the letters of all possible conclusions from those below.
(The actual conclusion would depend on the choice
of H1, the choice of α, and the calculated p-value. Not
all possible conclusions are listed below.)
A. The machine is producing exactly the acceptable proportion of defectives.
B. The machine is producing no more defectives than acceptable.
C. The machine is producing too many defectives.
D. Unable to prove anything either way.
15
How can you avoid making a Type I error in a hypothesis
test?
16
You want to find what proportion of churchgoers believe that
evolution should be taught in public schools, so you take a systematic
survey at a local mall. You collect 487
survey forms. Of those, 321 identify as churchgoers, and 227 of those 321
say that evolution should be taught in public schools.
(a) What is the population?
(b) What is the population size?
(c) What is the sample size?
(d) Is limiting the sample to churchgoers a bias source?
17
You’re doing a hypothesis test to try to
show that Drug A is more effective than Drug B, and your p-value is
0.0678. Your roommate, who has not taken statistics, asks, “So
there’s a 6.78% chance that the drugs are equally effective,
right?” Explain what the p-value actually means.
18
Eight percent of the 2×4s from a lumber yard have cracks longer
than an inch.
Assume that the defectives are randomly distributed. Do you use a
binomial or a geometric distribution to compute each of the
following, and why? (You don’t actually need to compute the
probabilities; just identify the distributions.)
(a) The probability that no more than five 2×4s in a random
sample of 100 have cracks longer than an inch.
(b) The probability that exactly five 2×4s in a random
sample of 100 have cracks longer than an inch.
(c) The probability that, pulling 2×4s at random, the first
four don’t have cracks longer than an inch but the fifth one
does.
19
Data are gathered and a computation is done to answer the
question “As near as we can tell, how much does the average high-school
student spend on lunch?” This computation would be part of
A. hypothesis test
B. sample size
C. confidence interval
D. none of the above
20
Linear correlation coefficients must lie between what two
values? What value indicates “no linear correlation”? Does this mean
no correlation at all?
21
“Four out of five dentists surveyed recommend Trident
sugarless gum for their patients who chew gum.” Which of these is the
correct symbol for “four out of five dentists surveyed”?
μ
π
σ
p
p̂
po
x
x̅
s
22
A poll concludes that 26.9% of TC3 students are satisfied with
the food service. What is the type of the original data gathered?
23
For what sort of data might you use a pie chart? Why?
24
The mean is usually the best measure of center of numerical
data. But under certain circumstances the mean is not representative
and you prefer a different measure of center. Which circumstances, and
which measure of center?
25
Usually you make what you want to prove the alternative
hypothesis, not the null hypothesis. Why?
26
A company wishes to claim,
“People who eat
our shredded wheat for breakfast every day for a month lose more than
ten points on their cholesterol.” One or more of the following
state the null and alternative hypotheses correctly. Which one(s)?
A. H0 > 10; H1 ≤ 10
B. H0: x̅ > 10; H1: x̅ ≤ 10
C. H0: μ > 10; H1: μ ≤ 10
D. H0: x > 10; H1: x ≤ 10
E. H0 = 10; H1 > 10
F. H0: x̅ = 10; H1: x̅ > 10
G. H0: μ = 10; H1: μ > 10
H. H0: x = 10; H1: x > 10
I. H0 ≤ 10; H1 > 10
J. H0: x̅ ≤ 10; H1: x̅ > 10
K. H0: μ ≤ 10; H1: μ > 10
L. H0: x ≤ 10; H1: x > 10
27
Which of the following is a Type I error?
A. failing to reject the null hypothesis when it is true
B. failing to reject the null hypothesis when it is false
C. rejecting the null hypothesis when it is true
D. rejecting the null hypothesis when it is false
28
Compare an experiment and an observational study.
29
Our symbol for level of confidence in a confidence interval is
α
α/2
1–α
z(α/2)
E
(If none of these, supply the correct symbol.)
30
You gather a random sample of selling prices of
2006 Honda Civics.
Which selection on your TI-83 would be used to test the claim “In
the US, 2006 Honda Civics sell, on average, for more than
$2,000”?
A. Z-Test
B. T-Test
C. 1-PropZTest
D. 1-PropTTest
E. χ²-Test
F. none of these
31
Compare descriptive and inferential statistics, and give an
example of each.
32
You find that your maximum error of estimate (margin of
error) is ±3.3 at a confidence level of 95%. At 90% confidence,
what would be the maximum error of estimate?
A. more than 3.3
B. 3.3
C. less than 3.3
D. can’t say without more information.
33
Compare “sample” and “population”; give an example.
34
You take a random sample of Lamborghini owners and a random
sample of Subaru owners. Which selection on your TI-83 would be used to
answer the question “How much more do Lamborghini owners spend per
year on maintenance than Subaru owners?”
A. ZInterval
B. TInterval
C. 2-SampZInt
D. 2-SampTInt
E. 2-PropZInt
F. none of these
35
You believe that more than 25%
of high-school students experienced strong peer pressure to have sex. To
test this belief, you survey 500 randomly selected graduating seniors
nationwide and find that 150 of them say that they did feel such
pressure.
(a) The data would best
be analyzed as an example of
A. one population proportion
B. two populations, difference in proportions
C. one population mean
D. two populations, difference in means, paired data
E. two populations, difference in means, unpaired data
F. goodness of fit
G. contingency table
(b) Which tests must you perform on your sample data before
doing the analysis mentioned above? (In other words, how would you
make sure that the sample meets the requirements?)
Problem Set 2: Calculations
Show your work for all problems. Round probabilities to four
decimal places and test statistics (t, z, χ²) to two.
For hypothesis tests, check requirements and show all six numbered
steps.
36
You are testing the assertion, “Judge Judy is more
friendly to plaintiffs than Judge Wapner was.” Since it would be
tedious to tabulate the hundreds or thousands of decisions each judge
has handed down, you randomly select 32 of each judge’s decisions.
Judge Judy’s average award to plaintiffs was $650 (standard
deviation = $250) and Judge Wapner’s was $580 (standard
deviation = $260).
Assume that the amounts are normally distributed without outliers.
Using a significance level of 0.05, can you
conclude that Judge Judy does indeed give higher awards on
average?
37
Weights of frozen turkeys at one large market were normally
distributed with a mean of 14.8 pounds and a standard deviation of 2.1
pounds. If there were 10,000 turkeys in the market, how many choices
would a shopper have who wanted a bird 20.5 pounds or larger? (Hint:
begin by figuring the percentage or proportion of turkeys in that
weight range.)
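
Once you've worked this on your calculator, you may want an independent check. Here is a minimal Python sketch of the same right-tail normal computation. It assumes Python 3.8 or later for the standard library's `statistics.NormalDist`; the variable names are illustrative and not part of the course materials.

```python
from statistics import NormalDist

# Turkey weights from problem 37: normal, mean 14.8 lb, SD 2.1 lb
weights = NormalDist(mu=14.8, sigma=2.1)

# P(X >= 20.5) is the right tail: 1 minus the area to the left of 20.5
p_heavy = 1 - weights.cdf(20.5)

# Expected number of such birds among 10,000 turkeys
expected_count = 10_000 * p_heavy

print(f"P(weight >= 20.5) = {p_heavy:.4f}")
print(f"Expected count    = {expected_count:.0f}")
```

The key step is the one the hint points to: find the proportion first, then scale it by the number of turkeys.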
38
(from
Johnson and Kuby 2003 [see “Sources Used” at end of book],
problem 9.26) “The addition of a new accelerator is claimed to
decrease the drying time of latex paint by more than 4%. Several test
samples were conducted with the following percentage decrease in
drying time:
“5.2 6.4 3.8 6.3 4.1 2.8 3.2 4.7
“If we assume that the percentage decrease in drying time is
normally distributed”
(a) Test the claim, at the .05 level.
(b) “Find the 95% confidence interval for the true mean decrease
in the drying time based on this sample.”
39
28% of a certain breed of rabbits are born with long hair.
Assume that the distribution is random, and consider a litter of five
rabbits.
(a) What is the probability that none of the rabbits in the
litter have long hair?
(b) What is the probability that one or more in a litter have
long hair?
(c) What is the probability that four or five of them have long
hair?
(d) What is the average number (mean) of long-haired rabbits
you expect in a litter of five?
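
To check your answers after you've worked the problem, here is a small Python sketch of the binomial probability formula. The helper name `binom_pmf` is my own invention for illustration (it plays the role of the calculator's binompdf), and the sketch assumes the five births are independent, as the problem states.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials, success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 5, 0.28  # litter of five, 28% long-haired (problem 39)

p_none = binom_pmf(0, n, p)                                # (a) no long-haired rabbits
p_one_or_more = 1 - p_none                                 # (b) complement of "none"
p_four_or_five = binom_pmf(4, n, p) + binom_pmf(5, n, p)   # (c) add the two cases
mean_count = n * p                                         # (d) mean of a binomial is np

for label, value in [("none", p_none), ("one or more", p_one_or_more),
                     ("four or five", p_four_or_five), ("mean", mean_count)]:
    print(f"{label}: {value:.4f}")
```

Notice that part (b) uses the complement rather than adding five separate probabilities.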
40
A survey asked a number of professionals, “Which of the
following is your most common choice for breakfast?” Using the
following data from a random survey, determine whether doctors choose breakfasts
in different proportions from other self-employed professionals, to
a .05 significance level.
Cereal Pastry Eggs Other No bfst Total
Doctors 85 22 47 60 17 231
Others 185 90 160 135 35 605
Total 270 112 207 195 52 836
41
Suppose that the mean adult male height is
5′10″ (70″) and the standard
deviation is 2.4″.
(a) If a particular man’s z-score is −1.2,
what is his actual height to the nearest 0.1″?
(b) Using the Empirical Rule, what percentile is a height of
67.6″?
(c) By the Empirical Rule, what proportion of adult men are
shorter than 74.8″?
42
The length of life of a random sample of incandescent
light bulbs was obtained, and the results are in the table
below.
life, hr | count |
500–650 | 6 |
650–800 | 18 |
800–950 | 60 |
950–1100 | 89 |
1100–1250 | 29 |
1250–1400 | 17 |
(a) Plot a histogram of the data.
(b) What is the size of the sample, with its proper symbol?
(c) What are the mean and standard deviation?
(Use the proper symbols and round to one decimal place.)
(d) What is the relative frequency of the 1100–1250
class?
43
One way to set speed limits is to observe a random sample of
drivers and set the speed limit at the
85th percentile. What speed corresponds to that 85th percentile, assuming
drivers’ speeds are normally distributed with
μ = 57.6 and σ = 5.2 mph?
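
This is an inverse-normal problem: you know the area and want the boundary. As a rough cross-check on your calculator work, here is a Python sketch assuming Python 3.8 or later for `statistics.NormalDist`. The variable names are illustrative only.

```python
from statistics import NormalDist

speeds = NormalDist(mu=57.6, sigma=5.2)  # mph, from problem 43

# inv_cdf takes the area to the LEFT of the desired value,
# which is exactly what "85th percentile" means
speed_85th = speeds.inv_cdf(0.85)

print(f"85th percentile speed: {speed_85th:.1f} mph")
```

If a problem gives you a right-tail area instead, subtract it from 1 before calling `inv_cdf`.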
44
You’re planning a survey to see what fraction of people who
live in Virgil would take the bus if the county added a route between
Greek Peak and downtown Cortland via routes 392 and 215.
(a) You think the
answer is only about 20% of them. If you need 90% confidence in an
answer to within ±4%, how many people will you need to
survey?
(b) What if you have no idea of the answer? How
many would you need to survey then?
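
Here is a hedged Python sketch of the standard sample-size formula for a proportion, n = p̂(1−p̂)·(z/E)², for checking your work afterward. The function name `sample_size_proportion` is hypothetical, and I'm assuming the usual convention of using 0.5 for the estimate when you have no prior idea of the proportion. Your calculator program may present the steps differently.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(margin, confidence, p_estimate=0.5):
    """Smallest n giving the requested margin of error for a proportion.

    p_estimate=0.5 is the conservative choice when no prior estimate exists.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z(alpha/2)
    n = p_estimate * (1 - p_estimate) * (z / margin) ** 2
    return ceil(n)                            # always round UP, never down

n_with_estimate = sample_size_proportion(0.04, 0.90, p_estimate=0.20)
n_no_estimate = sample_size_proportion(0.04, 0.90)
print(n_with_estimate, n_no_estimate)
```

Rounding up matters: rounding down would give a margin of error slightly larger than the one you asked for.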
45
Some popular fast-food items were compared for calories and fat, and
the results are shown below:
Calories (x) | 270 | 420 | 210 | 450 | 130 | 310 | 290 | 450 | 446 | 640 | 233 |
Fat (y) | 9 | 20 | 10 | 22 | 6 | 25 | 7 | 20 | 20 | 38 | 11 |
(a) Make a scatterplot on your
TI-83. Do you expect a positive, negative, or zero correlation?
Why?
(b) Find the correlation coefficient and the equation of the
line of best fit and write them down. Round to four decimal places and
use proper symbols.
(c) Give the value of the y intercept and interpret its
meaning.
(d) Using the regression equation or your TI-83 graph, how
many grams of fat would you predict for an item of 310 calories?
Explain why this is different from the actual data point (310
calories, 25 grams).
(e) What is the value of the residual for the data point
(310,25)?
(f) What is the value of the coefficient of determination in
this regression? What does it mean?
(g) The decision point for n = 11 is 0.602. What if
anything can you say about the correlation for all fast
foods?
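
If you'd like to confirm your calculator's regression output after working the problem, the following Python sketch computes the same quantities from the textbook sums of squares. The function name `linear_fit` is an assumption of mine, not something from the course, and the sketch does only the arithmetic; the interpretation questions are still yours to answer.

```python
from math import sqrt

def linear_fit(xs, ys):
    """Return (r, slope, intercept) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

# Data from problem 45
calories = [270, 420, 210, 450, 130, 310, 290, 450, 446, 640, 233]
fat = [9, 20, 10, 22, 6, 25, 7, 20, 20, 38, 11]

r, slope, intercept = linear_fit(calories, fat)
predicted = slope * 310 + intercept   # y-hat for an item of 310 calories
residual = 25 - predicted             # residual = actual y minus predicted y

print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
print(f"y-hat = {slope:.4f}x + {intercept:.4f}")
print(f"prediction at 310: {predicted:.1f}, residual: {residual:.1f}")
```

For part (g), compare |r| to the decision point yourself; the code deliberately leaves that judgment to you.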
46
Aluminum plates produced by a company are normally distributed
with a mean thickness of 2.0 mm and a standard deviation of
0.1 mm. If 6% of the plates are too thick, what is the cutoff
point between “too thick” and “acceptable”?
47
Many people took a physical fitness course.
Seven of them were
randomly selected and were tested for how many sit-ups they could do.
The same seven were re-tested after the course. From the data below,
can you conclude that improvement took place among the general run of
people who took the course? Use α = 0.01.
Anne Bill Chance Deb Ed Frank Grace
Before 29 22 25 29 26 24 31
After 30 26 25 35 33 36 32
48
Your average morning commute time is 27
minutes, with SD 4 minutes. Your morning commute times are ND.
(a) How likely is a morning commute under 24 minutes?
(b) You pick a week (five mornings) at random. How likely is an
average commute time under 24 minutes?
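
The difference between parts (a) and (b) is the difference between one value and a sample mean. Here is a short Python sketch for checking your work, assuming Python 3.8 or later for `statistics.NormalDist`; the variable names are mine.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 27, 4, 5   # minutes; n = five mornings (problem 48)

# (a) one commute: use the population SD directly
p_single = NormalDist(mu, sigma).cdf(24)

# (b) mean of n commutes: same mean, but the SD shrinks to sigma / sqrt(n)
standard_error = sigma / sqrt(n)
p_mean = NormalDist(mu, standard_error).cdf(24)

print(f"single commute under 24: {p_single:.4f}")
print(f"five-day mean under 24:  {p_mean:.4f}")
```

Because the commute times themselves are normally distributed, the sampling distribution of the mean is normal even with a sample as small as five.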
49
(adapted from
Johnson and Kuby 2003 [see “Sources Used” at end of book], problem 11.15)
A survey was taken nationally to see what
size vacation home people preferred. A separate survey was taken in
Nebraska. Both were random samples.
Unit size | Entire US | Nebraska |
Studio/efficiency | 18.2% | 75 |
1 bedroom | 18.2% | 60 |
2 bedrooms | 40.4% | 105 |
3 bedrooms | 18.2% | 45 |
Over 3 bedrooms | 5.0% | 15 |
Total | 100.0% | 300 |
Do the Nebraska results differ significantly (0.05 level)
from the national results?
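
For a goodness-of-fit problem like this one, the arithmetic behind the test statistic can be sketched in a few lines of Python. This is only a cross-check under my own assumptions: the variable names are illustrative, the requirement shown (every expected count at least 5) is the usual rule of thumb, and the sketch stops at the χ² statistic. The p-value still comes from your calculator.

```python
# National model (as proportions) and Nebraska observed counts from problem 49
model = [0.182, 0.182, 0.404, 0.182, 0.050]
observed = [75, 60, 105, 45, 15]

n = sum(observed)
expected = [n * share for share in model]   # expected count = n times model share

# Requirement check: every expected count should be at least 5
assert all(e >= 5 for e in expected)

# Chi-squared statistic: sum of (observed - expected)^2 / expected
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                      # number of categories minus 1

print("expected:", [round(e, 1) for e in expected])
print(f"chi-squared = {chi_sq:.2f}, df = {df}")
```

Note that the expected counts come from the national percentages applied to the Nebraska sample size, not from the Nebraska data.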
50
An experiment was designed to test the effectiveness of a short
course that teaches diabetic self-care. Fifty diabetic patients were
enrolled in the course, and fifty others served as a control group.
(Patients were randomly assigned between the two groups.)
Six months after the course, blood sugar levels were tested and
results obtained as follows:
Diabetic course group: mean = 6.5, standard deviation = 0.7
Control group: mean = 7.1, standard deviation = 0.9
(a) At a significance level of 0.01, does the diabetic course succeed
in lowering patients’ blood sugar?
(b) Obviously diabetic patients are not all the
same. In this experiment, the largish sample sizes and randomization
mean that confounding variables are probably balanced out in the two
groups.
But suppose you had money only for a smaller study, with a total
of 30 patients. Suggest an experimental design that would control for
most lurking variables. What problem can you see with that
design?
51
(adapted from
Johnson and Kuby 2003 [see “Sources Used” at end of book]
problem 9.36)
“A study in the journal
PAIN, October
1994, reported on six patients with chronic myofascial pain syndrome.
The mean duration of pain had been 3.0 years for the 6 patients and
the standard deviation had been 0.5 year. Test the hypothesis that the
mean pain duration of all patients who might have been selected for
this study [meaning, of all persons who suffer from this condition]
was greater than 2.5 years.” Use α = 0.05.
Assume that the sample is a random sample, normally distributed with
no outliers.
52
In a survey of working parents, 200 men and 200 women were
randomly selected and
asked, “Have you refused a promotion because it would mean less time
with your family?” Of the men, 60 said yes; 48 of the women said yes.
(a) Obviously more men in the sample refused promotions. But
can you conclude at the 0.05 significance level that a higher
percentage of all working men have refused promotions, versus
the percentage of all working women?
(b) In an English sentence, state
a 95% confidence interval for the difference in percentages of men and
women who refuse promotions.
53
Ten thousand students take a test, and their scores are
normally distributed. If the middle 95% of them score between 70 and 130, what
are the mean and standard deviation?
54
An insurance company advertises that 75% of its claims are settled
within two months of being filed. The state insurance commission
thinks the percentage is less than 75, and sets out to prove it. First a
small study is done. For this preliminary study, the commissioner can
live with a 5% chance of making a Type I error. The commission staff
randomly selects 65 claims, and finds out that 40 were settled within
two months. Based on this study, can you say that less than 75% of
claims are settled within two months?
55
Work this problem only if you studied the optional extras in
the Probability chapter.
A shoe store gets its shoes from just two companies,
40% from A and 60% from B. 2.5% of
pairs from Brand A are mislabeled, and 1.5% of pairs from Brand B are
mislabeled. Find
the probability that a randomly selected pair of shoes in the store is
mislabeled.
56
Ten randomly selected men compared two brands of razors.
Each man shaved one side of his face with brand A and the other side
with brand B. (They flipped coins to decide which razor to use on
which side.)
Each tester assigned a “smoothness score” of 1 to 10 to each side
after shaving. The scores are as shown below. Determine whether
there is a difference in smoothness performance between the two
razors, using α = 0.10.
Man 1 2 3 4 5 6 7 8 9 10
A score 7 8 3 5 4 4 9 8 7 4
B score 5 6 3 4 6 5 6 7 3 4
57
In August 2009, the
National Geographic News Web site reported that 90% of
US currency was tainted with cocaine.
(a) If you drew a random sample of two bills, what is the
chance that exactly one of them is tainted with cocaine?
(b) You have ten bills, and you’ve been told that 90% of
these ten bills are tainted with cocaine. If you draw two of the ten
bills at random, what is the chance that exactly one of your two is
tainted with cocaine?
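
The two parts look alike but call for different models. After you've decided why, you can check your numbers with this Python sketch. It assumes part (a) treats US currency as a population large enough that the two draws are effectively independent, and part (b) counts combinations because you draw without replacement from only ten bills. The variable names are illustrative.

```python
from math import comb

# (a) Very large population: draws are effectively independent (binomial)
p = 0.90
p_one_binomial = comb(2, 1) * p**1 * (1 - p)**1

# (b) Only ten bills, nine tainted: drawing without replacement changes
# the odds on the second draw, so count combinations instead
tainted, clean, drawn = 9, 1, 2
p_one_small = comb(tainted, 1) * comb(clean, 1) / comb(tainted + clean, drawn)

print(f"(a) {p_one_binomial:.4f}")
print(f"(b) {p_one_small:.4f}")
```

The two answers differ because the sample in part (b) is a large fraction of the population.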
58
Fifteen farms were randomly selected from a large agricultural
region. Each farm’s yield of wheat per acre was measured. For the
15 farms, the mean yield per acre was 85.5 bushels and the standard
deviation was 10.0 bushels. Find a 90% confidence interval for the
mean yield per acre for all farms in this region, assuming yield per
acre is normally distributed and there were no outliers in the sample.
59
You draw five cards from a
deck, without replacement, and record the number of aces you drew.
Then you replace the five cards and shuffle the deck thoroughly.
If you repeat this experiment many times, is the number of aces in
five cards drawn a binomial distribution? Why or why not?
60
In a survey of 300 people from Tompkins County, 128 of
them preferred to rent or stream a movie on Saturday night rather than
watch broadcast or cable TV.
In Cortland County, 135 of 400 people surveyed preferred a
movie. You’re interested in the difference of proportion in
movie renters for Tompkins County over Cortland County.
Both surveys were random samples.
(a) What is the point estimate for that difference?
(b) Find the 98% confidence interval for the difference in the two
proportions for all residents of the counties.
(c) What is the maximum error of estimate, at the 98% confidence
level?
61
Two batches of seeds were randomly drawn from the
same lot, and one batch was given a special treatment. Consider the
data for germination shown below. At significance level 0.05, does the
treatment make any difference in how likely seeds are to
germinate?
Batch | Germinated | Didn’t |
Untreated | 80 | 20 |
Treated | 135 | 15 |
Now check yourself on the
solutions page.
What’s New?
- 9 Dec 2015: Make it explicit that
these surveys were random.
- 6 May 2015: Change a
problem from gasoline octane to
weight of candy bars.
- 13 Jan 2015: Add several
study aids to the list of review
documents.
- (intervening changes suppressed)
- 11 Nov 2007: Amalgamate the old separate sets of review
problems for descriptive and inferential statistics.
- 4 Nov 2011: Create the list of key concepts.