Stats without Tears
Solutions for Chapter 1
Updated 1 June 2015
Copyright © 2013–2017 by Stan Brown
Sampling error is another name
for sample variability, the fact that each sample is different from
the next because no sample perfectly represents the population it was
drawn from. Nonsampling errors are
problems in setting up or carrying out the data collection,
such as poorly worded survey questions and failure to randomize.
Nothing can eliminate sampling error, but you can reduce
it by increasing your sample size.
(Most nonsampling errors can be avoided by proper experimental design.)
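To see why a bigger sample reduces sampling error, here is a minimal Python sketch (my illustration, not part of the original solution). It assumes a made-up population of 10,000 values and draws many random samples of each size. The spread (standard deviation) of the resulting sample means is one way to measure sampling error.

```python
import random
import statistics

def sd_of_sample_means(population, n, reps=2000, seed=1):
    """Draw `reps` random samples of size n and return the SD of their means."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

# Hypothetical population: the values 0 to 99, each appearing 100 times.
population = [i % 100 for i in range(10_000)]

for n in (10, 40, 160):
    # Larger samples give sample means that cluster more tightly.
    print(n, round(sd_of_sample_means(population, n), 2))
```

Each time n is multiplied by 4, the spread should drop to roughly half, in line with the usual σ/√n rule. The exact numbers depend on the random seed.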
(a) systematic sample.
(b) It is probably a good sample of that
gynecologist’s patients, since there’s no reason to think
that one month is different from another. But it’s a
bad sample of pregnant women in general, because it suffers
from selection bias. This gynecologist’s patients may use
prenatal vitamins differently from pregnant women who see other
gynecologists or who don’t have a regular gynecologist.
(c) observational study
(a) completely randomized
(b) the plant food administered
(c) no food, Gro-Mor, Magi-Grow
(d) heights at the end of the 13 weeks (You could
also make a case for growth rate.)
(e) the 150 bulbs
(f) selection of plant food
(g) the group that gets no plant food
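Here is a hedged sketch of how the random assignment in a completely randomized design could be done in Python. It assumes the 150 bulbs are numbered 1 to 150 and split equally among the three treatments; the helper name `completely_randomized` is hypothetical.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Randomly assign every unit to one treatment group (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # random order removes any pattern in how units were listed
    k = len(treatments)
    # Deal the shuffled units out round-robin, so group sizes differ by at most one.
    return {t: shuffled[i::k] for i, t in enumerate(treatments)}

bulbs = range(1, 151)  # assumed numbering of the 150 bulbs
groups = completely_randomized(bulbs, ["no food", "Gro-Mor", "Magi-Grow"], seed=2015)
for treatment, members in groups.items():
    print(treatment, len(members))  # 50 bulbs per group
```

Shuffling once and dealing out is equivalent to drawing each group at random without replacement, and the control group (“no food”) is filled by exactly the same random process as the treatment groups.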
Each family answered the question “How many children do you have?”
(a) The variable is number of children
(b) It is a discrete variable
(c) It summarizes population data, and therefore it is a parameter
Although “numeric” or “quantitative” is
correct, it’s not an adequate answer because it is not as
specific as possible. Discrete and continuous data are treated
differently in descriptive statistics, so it matters which type you have.
Students are sometimes fooled by the decimal. Always ask
yourself what original question was asked, or what original
measurement was taken, from each member of the sample.
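A tiny Python illustration of the decimal trap, using made-up answers from five families: every individual answer is a whole number, so the variable is discrete, even though the mean that summarizes those answers is not.

```python
import statistics

# Hypothetical answers to "How many children do you have?" from five families.
children = [0, 1, 2, 2, 3]

print(all(isinstance(c, int) for c in children))  # True: each answer is a whole number
print(statistics.mean(children))                  # 1.6: the summary can still be a decimal
```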
(a) The sample is the 80 people in your focus group. (It is not the
drinks. It’s also not the people’s preferences: Their
preferences are the data or sample data.)
(b) The sample size is 80, because that’s the number of
people you took data from. It’s not 55: That’s just the
number who gave one particular response.
(c) The population is not stated explicitly, but you can infer
that it’s cola drinkers in general, or Whoopsie Cola drinkers in particular.
(d) You don’t know how many cola drinkers (or Whoopsie Cola
drinkers) there are. You can’t
know, since people
change their soft-drink habits all the time. You can say that the
population is indefinitely large, or you can say that it’s
. (You can say that the
is uncountable, but don’t say that the
Students sometimes answer “80” for population size,
but this is not correct. You took data from 80 people, so those 80
people are your sample and 80 is your sample size.
(a) sampling error (or sample variability)
(b) increase sample size
You’re asking people to
admit to socially disapproved behavior. People tend to
shade their answers toward socially acceptable behavior.
What can be done to reduce response bias? Interviewers should
be trained to be absolutely neutral in voice and facial expression,
which is how the Kinsey team gathered data on sexual behavior. Or the
question can be asked on a written questionnaire, so that the subject
isn’t looking another person in the face when answering. The
question can also be made less threatening: “Have you ever left
an infant alone in the house, even for just a minute?”
- Random sample: get a list of the resident students. On your calculator,
do randInt(1,2000) 50 times, not counting duplicates, and interview the students who came up
in those positions.
- Systematic sample: You can’t station yourself in the cafeteria
because that would exclude all students who don’t use it. Instead,
station yourself at the main entrance to the dorm complex (or station
yourself and confederates at the main entrance to each dorm) and
interview every 20th person. Why k=20 and not
2000/50 = 40? Because whenever you’re there,
you’re bound to miss a sizable proportion of students.
To select the first person to survey, use
randInt(1,20). Remember that a systematic sample begins
with a randomly selected person from 1 to k, not 1 to 50
(sample size) or 1 to 2000 (population size).
Notice that I didn’t suggest a time frame. What do you
think would be a good time to do this?
An alternative procedure might be to walk through the dorms
(assuming you can get in) and interview the students in every 20th
room. You may get better coverage that way than if you wait for them
to come to you.
- Cluster sample: Randomly select 25 rooms, and interview both of
the students in those rooms. (This is a single-stage cluster.)
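The three sampling plans above can be sketched in Python. This is only an illustration: it assumes the 2000 resident students are numbered 1 to 2000 and that every room holds exactly two students (so 25 rooms give 50 students); those details are my assumptions. Python’s `randint(a, b)` includes both endpoints, like the calculator’s randInt.

```python
import random

rng = random.Random(42)

# Hypothetical sampling frame: resident students numbered 1..2000.
students = list(range(1, 2001))

# Simple random sample: 50 distinct students, like repeating
# randInt(1,2000) and skipping duplicates.
srs = rng.sample(students, 50)

# Systematic sample: random start from 1 to k, then every k-th person.
k = 20
start = rng.randint(1, k)  # like randInt(1,20)
systematic = students[start - 1::k]

# Single-stage cluster sample: assume 1000 rooms of exactly 2 students,
# pick 25 rooms at random, and take everyone in those rooms.
rooms = [students[i:i + 2] for i in range(0, len(students), 2)]
chosen_rooms = rng.sample(rooms, 25)
cluster = [s for room in chosen_rooms for s in room]

print(len(srs), len(systematic), len(cluster))  # 50 100 50
```

In this idealized list every student “passes by”, so k = 20 yields 100 names. The solution picks k = 20 rather than 40 because, in practice, many students won’t pass the entrance while you’re there.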
Best balance? Probably the cluster sample. The true random
sample is a lot of work for a sample of 50, because after selecting
the names you have to track the students down. The systematic sample,
no matter how you do it, is going to miss a lot of students, and you
have that time-period problem. With the cluster sample, you can time
it for when students are likely to be home, and you can go back to
follow up on those you missed.
But nothing is perfect, in this life where we are
born to trouble as the sparks fly upward. The cluster sample
works if the students were randomly assigned to rooms. When students
pick their own roommates, they tend to pick people with similar
attitudes, interests, and activities. That means those two are
more similar to each other than to other students, and there’s no
way you can treat that cluster sample as a random sample. The cluster sample
would probably be safe for freshmen, where the great majority would be
randomly assigned, but less so for students in later years.
No, you can’t reach that conclusion, because you can never conclude causation from an observational study. To reach it, you
would have to do an experiment, where people were randomly assigned to
watch Fox News or to watch no news at all, and then see if there was a
difference in how much they knew about the world.
Students often answer questions like this with
hand-waving arguments, either coming up with reasons why
it’s a plausible conclusion or coming up with reasons why it
isn’t. This is statistics, and we have to follow the facts.
Whatever you may think about Fox News, the fact is that observational
studies can’t prove causation.
(a) It excludes people who don’t use the bus.
This means that people who are dissatisfied with the bus are
systematically under-represented. Your survey will probably show that
willingness to pay is higher than it actually is.
(b) sampling bias
“Random” doesn’t mean unplanned; it takes planning.
This is a bogus sample. If you want a more formal statistical word,
call it a convenience sample, an opportunity sample or a
(a) This is attribute data
or qualitative data. Don’t be fooled by the number 42:
the original question asked was “Do you have at least one
streaming device?” and that’s a yes/no question.
Alternative: the more specific answer is
binomial data, which you may have heard in the lecture,
though it’s not in the book till
(b) This is descriptive statistics because it’s
reporting data actually measured: 42% of the sample. If it
said “42% of Americans”, then it would be inferential
because you know not every American was asked, so the investigators
must have extrapolated from a sample to the population.
(c) It is a statistic because it is a number
that summarizes data from a sample.
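As a sketch of why the 42% figure is attribute data summarized by a statistic, here is a short Python example. The sample size of 100 is invented for illustration; the exercise gives only the percentage.

```python
# Hypothetical yes/no answers to "Do you have at least one streaming device?"
# (42 "yes" out of an invented sample of 100, chosen to match the 42% figure).
responses = ["yes"] * 42 + ["no"] * 58

# Each response is a category, not a number. The numeric summary is a
# proportion, and because it describes the sample it is a statistic.
p_hat = responses.count("yes") / len(responses)
print(f"{p_hat:.0%}")  # 42%
```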
- The first people who present themselves are chosen.
Better would be to randomly select from among all volunteers. (Better still would be to
randomly select from among all patients, and ask the selected
individuals to volunteer.)
- Participants are not randomly assigned to control and experimental groups.
This is always bad, but it’s especially bad when you
accept a block of volunteers in order.
- The experiment is not double blind, only single blind.
When doctors know who is getting a placebo and who is getting
medicine, they may treat the two groups differently, consciously or unconsciously.
All of these are nonsampling errors.
2.145E-4 is 0.0002145, and 0.0004 is larger than that.
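If E notation is unfamiliar, Python reads it the same way a calculator does (2.145E-4 means 2.145 × 10⁻⁴), which gives a quick check of the comparison:

```python
x = 2.145e-4          # 2.145 times 10 to the power -4
print(f"{x:.7f}")     # 0.0002145
print(0.0004 > x)     # True
```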
It’s spurious precision.
(That much precision could be appropriate
if you had surveyed a few hundred thousand households.)
To fix it, round to one decimal place: 1.9.
(Don’t make the common mistake of
“rounding” to 1.8.)
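The difference between rounding and chopping off digits can be shown with Python’s `decimal` module. The value 1.8627 is a hypothetical stand-in for an over-precise figure, not the exercise’s actual number. `ROUND_HALF_UP` is the schoolbook rule (a 5 rounds up); it is spelled out explicitly because Python’s built-in `round()` uses round-half-to-even on binary floats, which can surprise you.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

value = Decimal("1.8627")  # hypothetical over-precise figure

rounded = value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
truncated = value.quantize(Decimal("0.1"), rounding=ROUND_DOWN)

print(rounded)    # 1.9 (correct rounding to one decimal place)
print(truncated)  # 1.8 (chopping off digits is not rounding)
```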
(a) Non-numeric. (It has the form of a number, but
think about the average area code in a group and you’ll realize
an area code is not a number.)
(f) Discrete (or
continuous, if you allow answers like 6.3).
(a) was done for
(b) Measurement: Amount of each dinner check.
(c) Question: “Did you experience
bloating and stomach pain?” Non-numeric.
(d) Measurement: Number of people in each party.
- 1 June 2015: For the last problem, the lettered parts of the
solution didn’t match the
lettered parts of the problem. Thanks to Thomas Keegan for spotting this.
- (intervening changes suppressed)
- 19 Jan 2013: New document.