Stats without Tears
Solutions for Chapter 1
Updated 1 June 2015
Copyright © 2013–2020 by Stan Brown
Sampling error is another name
for sample variability, the fact that each sample is different from
the next because no sample perfectly represents the population it was
drawn from. Nonsampling errors are
problems in setting up or carrying out the data collection,
such as poorly worded survey questions and failure to randomize.
Nothing can eliminate sampling error, but you can reduce
it by increasing your sample size.
(Most nonsampling errors can be avoided by proper experimental design.)
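If you'd like to see sampling error shrink as the sample grows, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the population is 10,000 made-up values centered near 100, and the names `spread_of_sample_means`, `spread_small`, and `spread_large` are hypothetical. It draws many samples at two sizes and measures how much the sample means bounce around.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, trials, rng):
    """Standard deviation of the sample means over many repeated samples."""
    means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(trials)
    ]
    return statistics.stdev(means)

rng = random.Random(1)  # arbitrary fixed seed, only so reruns match

# Hypothetical population: 10,000 made-up values, roughly bell-shaped.
population = [rng.gauss(100, 15) for _ in range(10_000)]

spread_small = spread_of_sample_means(population, 25, 500, rng)
spread_large = spread_of_sample_means(population, 400, 500, rng)

print(f"n = 25:  sample means vary with SD about {spread_small:.2f}")
print(f"n = 400: sample means vary with SD about {spread_large:.2f}")
```

The larger samples still disagree with each other, so the sampling error never reaches zero, but the disagreement is noticeably smaller.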
(a) systematic sample.
(b) It is probably a good sample of that
gynecologist’s patients, since there’s no reason to think
that one month is different from another. But it’s a
bad sample of pregnant women in general, because it suffers
from selection bias. This gynecologist’s patients may use
prenatal vitamins differently from pregnant women who see other
gynecologists or who don’t have a regular gynecologist.
(c) observational study
(a) completely randomized
(b) the plant food administered
(c) no food, Gro-Mor, Magi-Grow
(d) 13 heights at the end of the 13 weeks (You could
also make a case for growth rate.)
(e) the 150 bulbs
(f) selection of plant food
(g) the group that gets no plant food
Each family answered the question “How many children do you have?”
(a) The variable is number of children
(b) It is a discrete variable
(c) It summarizes population data, and therefore it is a parameter
Although “numeric” or “quantitative” is
correct, it’s not an adequate answer because it is not as
specific as possible. Discrete and continuous data are treated
differently in descriptive statistics, so it matters which type you
have.
Students are sometimes fooled by the decimal. Always ask
yourself what was the original question asked or the
original measurement taken from each member of the
sample.
(a) The sample is the 80 people in your focus group. (It is not the
drinks. It’s also not the people’s preferences: Their
preferences are the data or sample data.)
(b) The sample size is 80, because that’s the number of
people you took data from. It’s not 55: That’s just the
number who gave one particular response.
(c) The population is not stated explicitly, but you can infer
that it’s cola drinkers in general, or Whoopsie Cola drinkers in
particular.
(d) You don’t know how many cola drinkers (or Whoopsie Cola
drinkers) there are. You can’t know, since people
change their soft-drink habits all the time. You can say that the
population is indefinitely large, or you can say that it’s
. (You can say that the
is uncountable, but don’t say that the
Students sometimes answer “80” for population size,
but this is not correct. You took data from 80 people, so those 80
people are your sample and 80 is your sample size.
(a) sampling error (or sample variability)
(b) increase sample size
You’re asking people to
admit to socially disapproved behavior. People tend to
shade their answers toward socially acceptable behavior.
What can be done to reduce response bias? Interviewers should
be trained to be absolutely neutral in voice and facial expression,
which is how the Kinsey team gathered data on sexual behavior. Or the
question can be asked on a written questionnaire, so that the subject
isn’t looking another person in the face when answering. The
question can also be made less threatening: “Have you ever left
an infant alone in the house, even for just a minute?”
- Random sample: Get a list of the resident students. On your calculator,
run randInt(1,2000) 50 times, not counting duplicates, and interview
the students who came up in those positions.
- Systematic sample: You can’t station yourself in the cafeteria
because that would exclude all students who don’t use it. Instead,
station yourself at the main entrance to the dorm complex (or station
yourself and confederates at the main entrance to each dorm) and
interview every 20th person. Why k=20 and not
2000/50 = 40? Because whenever you’re there,
you’re bound to miss a sizable proportion of students.
To select the first person to survey, use
randInt(1,20). Remember that a systematic survey begins
with a randomly selected person from 1 to k, not 1 to 50
(sample size) or 1 to 2000 (population size).
Notice that I didn’t suggest a time frame. What do you
think would be a good time to do this?
An alternative procedure might be to walk through the dorms
(assuming you can get in) and interview the students in every 20th
room. You may get better coverage that way than if you wait for them
to come to you.
- Cluster sample: Randomly select 25 rooms, and interview both of
the students in those rooms. (This is a single-stage cluster.)
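If you have a computer handy instead of a calculator, the three selections above can be sketched in Python. This is only an illustration under stated assumptions: the seed, the count of 1000 passers-by, the room numbering, and all the variable and function names are made up, and the room count assumes exactly two students in every room.

```python
import random

POPULATION_SIZE = 2000  # resident students (from the problem)
SAMPLE_SIZE = 50        # students to interview (from the problem)
rng = random.Random(2015)  # arbitrary fixed seed, only so reruns match

# Random sample: 50 distinct positions on the list of residents.
# rng.sample never repeats a value, so duplicates take care of themselves.
random_positions = sorted(rng.sample(range(1, POPULATION_SIZE + 1), SAMPLE_SIZE))

# Systematic sample: random start from 1 to k, then every k-th person.
def systematic_positions(start, k, people_passing):
    """Positions in the stream of passers-by that get interviewed."""
    return list(range(start, people_passing + 1, k))

k = 20
start = rng.randint(1, k)  # plays the role of randInt(1,20)
interviewed = systematic_positions(start, k, 1000)  # 1000 passers-by is made up

# Cluster sample: pick 25 rooms at random, interview both students in each.
ROOM_COUNT = POPULATION_SIZE // 2  # assumes exactly two students per room
chosen_rooms = sorted(rng.sample(range(1, ROOM_COUNT + 1), 25))
cluster_sample = [(room, occupant) for room in chosen_rooms for occupant in (1, 2)]

print(random_positions[:5], start, len(interviewed), len(cluster_sample))
```

Notice that the random start for the systematic sample is drawn from 1 to k, and that 25 rooms of two students each give the same sample size of 50 as the other two methods.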
Best balance? Probably the cluster sample. The true random
sample is a lot of work for a sample of 50, because after selecting
the names you have to track the students down. The systematic sample,
no matter how you do it, is going to miss a lot of students, and you
have that time-period problem. With the cluster sample, you can time
it for when students are likely to be home, and you can go back to
follow up on those you missed.
But nothing is perfect, in this life where we are
born to trouble as the sparks fly upward. The cluster sample
works if the students were randomly assigned to rooms. When students
pick their own roommates, they tend to pick people with similar
attitudes, interests, and activities. That means those two are
more similar to each other than to other students, and there’s no
way you can treat that cluster sample as a random sample. The cluster
would probably be safe for freshmen, where the great majority would be
randomly assigned, but less so for students in later years.
No, you can’t reach that conclusion, because you can never conclude causation from an observational study.
You would have to do an experiment, where people were randomly assigned to
watch Fox News or to watch no news at all, and then see if there was a
difference in how much they knew about the world.
Students often answer questions like this with
hand-waving arguments, either coming up with reasons why
it’s a plausible conclusion or coming up with reasons why it
isn’t. This is statistics, and we have to follow the facts.
Whatever you may think about Fox News, the fact is that observational
studies can’t prove causation.
(a) It excludes people who don’t use the bus.
This means that people who are dissatisfied with the bus are
systematically under-represented. Your survey will probably show that
willingness to pay is higher than it actually is.
(b) sampling bias
“Random” doesn’t mean unplanned; it takes planning.
This is a bogus sample. If you want a more formal statistical word,
call it a convenience sample, an opportunity sample or a
(a) This is attribute data or qualitative data. Don’t be fooled by the number 42:
the original question asked was “Do you have at least one
streaming device?” and that’s a yes/no question.
Alternative: the more specific answer
binomial data, which you may have heard in the lecture
though it’s not in the book till
(b) This is descriptive statistics because it’s
reporting data actually measured: 42% of the sample. If it
said “42% of Americans”, then it would be inferential
because you know not every American was asked, so the investigators
must have extrapolated from a sample to the population.
(c) It is a statistic because it is a number
that summarizes data from a sample.
- The first people who present themselves are chosen.
Instead, the researchers should
randomly select from among all volunteers. (Better still would be to
randomly select from among all patients, and ask the selected
individuals to volunteer.)
- Participants are not randomly assigned to control and experimental groups.
This is always bad, but it’s especially bad when you
accept a block of volunteers in order.
- The experiment is not double blind, only single blind.
When doctors know who is getting a placebo and who is getting
medicine, they may treat the two groups differently, consciously or
unconsciously.
All of these are nonsampling errors.
2.145E-4 is 0.0002145, and 0.0004 is larger than that.
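If you want to check the calculator notation yourself, a short Python sketch does it. Python happens to read the same E notation; the variable names `a` and `b` are just labels chosen here.

```python
a = float("2.145E-4")  # the calculator's E notation means 2.145 times 10^-4
b = 0.0004

print(f"{a:.7f}")  # 0.0002145
print(b > a)       # True
```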
It’s spurious precision.
(That much precision could be appropriate
if you had surveyed a few hundred thousand households.)
To fix it, round to one decimal place: 1.9.
(Don’t make the common mistake of
“rounding” to 1.8.)
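The difference between rounding and simply chopping off digits can be sketched in Python. The value 1.8532 below is a hypothetical stand-in chosen for illustration, not the figure from the problem; any over-precise number between 1.85 and 1.95 behaves the same way.

```python
import math

value = 1.8532  # hypothetical over-precise figure, NOT taken from the problem

rounded = round(value, 1)                # nearest tenth
truncated = math.floor(value * 10) / 10  # just chops off the extra digits

print(rounded)    # 1.9
print(truncated)  # 1.8
```

Truncating always moves a positive number down, so it is not rounding at all.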
(a) Non-numeric. (It has the form of a number, but
think about the average area code in a group and you’ll realize
an area code is not a number.)
(f) Discrete. (or
continuous if you allow answers like 6.3)
(a) was done for
(b) Measurement: Amount of each dinner check.
(c) Question: “Did you experience
bloating and stomach pain?Ē Non-numeric.
(d) Measurement: Number of people in each party.
- 1 June 2015: For the last problem, the lettered parts of the
solution didn’t match the
lettered parts of the problem. Thanks to Thomas Keegan for spotting
- (intervening changes suppressed)
- 19 Jan 2013: New document.