
# Stats without Tears: Solutions for Chapter 1

Updated 1 June 2015

1 Sampling error is another name for sample variability, the fact that each sample is different from the next because no sample perfectly represents the population it was drawn from. Nonsampling errors are problems in setting up or carrying out the data collection, such as poorly worded survey questions and failure to randomize.

Nothing can eliminate sampling error, but you can reduce it by increasing your sample size. (Most nonsampling errors can be avoided by proper experimental design and technique.)
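If you'd like to see that effect for yourself, here is a minimal simulation sketch in Python. The population is made up purely for illustration (10,000 values with mean 100 and standard deviation 15), and `spread_of_sample_means` is a hypothetical helper name; none of this comes from the textbook. It draws many random samples at two sample sizes and measures how much the sample means vary from one sample to the next.

```python
import random
import statistics

def spread_of_sample_means(population, n, trials=1000, seed=1):
    """Standard deviation of the sample mean over many random samples of size n."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(trials)]
    return statistics.stdev(means)

# Hypothetical population, invented for this illustration only.
pop_rng = random.Random(0)
population = [pop_rng.gauss(100, 15) for _ in range(10_000)]

small = spread_of_sample_means(population, n=25)
large = spread_of_sample_means(population, n=400)
print(f"n = 25:  sample means vary by about {small:.2f}")
print(f"n = 400: sample means vary by about {large:.2f}")
```

The spread for the larger samples should come out noticeably smaller, but it never reaches zero: that leftover variation is the sampling error you can't eliminate.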

2 (a) systematic sample.
(b) It is probably a good sample of that gynecologist’s patients, since there’s no reason to think that one month is different from another. But it’s a bad sample of pregnant women in general, because it suffers from selection bias. This gynecologist’s patients may use prenatal vitamins differently from pregnant women who see other gynecologists or who don’t have a regular gynecologist.
(c) observational study
3 (a) completely randomized
(c) no food, Gro-Mor, Magi-Grow
(d) 13 heights at the end of the 13 weeks (You could also make a case for growth rate.)
(e) the 150 bulbs
(f) selection of plant food
(g) the group that gets no plant food
4 Each family answered the question “How many children do you have?”
(a) The variable is number of children.
(b) It is a discrete variable.
(c) It summarizes population data, and therefore it is a parameter.

Although “numeric” or “quantitative” is correct, it’s not an adequate answer because it is not as specific as possible. Discrete and continuous data are treated differently in descriptive statistics, so it matters which type you have.

Students are sometimes fooled by the decimal. Always ask yourself what was the original question asked or the original measurement taken from each member of the sample.

5 (a) The sample is the 80 people in your focus group. (It is not the drinks. It’s also not the people’s preferences: Their preferences are the data or sample data.)
(b) The sample size is 80, because that’s the number of people you took data from. It’s not 55: That’s just the number who gave one particular response.
(c) The population is not stated explicitly, but you can infer that it’s cola drinkers in general, or Whoopsie Cola drinkers in general.
(d) You don’t know how many cola drinkers (or Whoopsie Cola drinkers) there are. You can’t know, since people change their soft-drink habits all the time. You can say that the population is indefinitely large, or you can say that it’s infinite. (You can say that the population is uncountable, but don’t say that the population size is uncountable.)

Common mistake: Students sometimes answer “80” for population size, but this is not correct. You took data from 80 people, so those 80 people are your sample and 80 is your sample size.

6 (a) sampling error (or sample variability) (b) increase sample size

What can be done to reduce response bias? Interviewers should be trained to be absolutely neutral in voice and facial expression, which is how the Kinsey team gathered data on sexual behavior. Or the question can be asked on a written questionnaire, so that the subject isn’t looking another person in the face when answering. The question can also be made less threatening: “Have you ever left an infant alone in the house, even for just a minute?”

8
• Random sample: get a list of the resident students. On your calculator, do `randInt(1,2000)` 50 times, not counting duplicates, and interview the students who came up in those positions.
• Systematic sample: You can’t station yourself in the cafeteria because that would exclude all students who don’t use it. Instead, station yourself at the main entrance to the dorm complex (or station yourself and confederates at the main entrance to each dorm) and interview every 20th person. Why k=20 and not 2000/50 = 40? Because whenever you’re there, you’re bound to miss a sizable proportion of students.

To select the first person to survey, use `randInt(1,20)`. Remember that a systematic survey begins with a randomly selected person from 1 to k, not 1 to 50 (sample size) or 1 to 2000 (population size).

Notice that I didn’t suggest a time frame. What do you think would be a good time to do this?

An alternative procedure might be to walk through the dorms (assuming you can get in) and interview the students in every 20th room. You may get better coverage that way than if you wait for them to come to you.

• Cluster sample: Randomly select 25 rooms, and interview both of the students in those rooms. (This is a single-stage cluster.)
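If you have Python handy instead of a calculator, the three selection schemes above can be sketched roughly as follows. This is only an illustration under stated assumptions: students are numbered 1 to 2000 on the list, there are 1000 two-person rooms (the room count is my assumption, based on "both of the students" in each room), and the variable names are hypothetical.

```python
import random

rng = random.Random(2015)   # fixed seed, only so the sketch is repeatable

POPULATION = 2000           # resident students, numbered 1..2000 on the list
SAMPLE_SIZE = 50

# Simple random sample: the same idea as randInt(1,2000) repeated until
# you have 50 different positions (rng.sample never returns duplicates).
random_positions = sorted(rng.sample(range(1, POPULATION + 1), SAMPLE_SIZE))

# Systematic sample: random start between 1 and k, then every k-th person
# who walks past. These are positions in the stream of passers-by.
k = 20
start = rng.randint(1, k)   # same idea as randInt(1,20)
systematic_positions = [start + k * i for i in range(SAMPLE_SIZE)]

# Cluster sample: pick 25 rooms at random and interview everyone in them.
ROOMS = 1000                # assumption: two students per room
cluster_rooms = sorted(rng.sample(range(1, ROOMS + 1), 25))

print(random_positions[:5])
print(systematic_positions[:5])
print(cluster_rooms[:5])
```

Notice that only the starting point of the systematic sample is random; every later pick is fixed once `start` is chosen.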

Best balance? Probably the cluster sample. The true random sample is a lot of work for a sample of 50, because after selecting the names you have to track the students down. The systematic sample, no matter how you do it, is going to miss a lot of students, and you have that time-period problem. With the cluster sample, you can time it for when students are likely to be home, and you can go back to follow up on those you missed.

But nothing is perfect, in this life where we are born to trouble as the sparks fly upward. The cluster sample works if the students were randomly assigned to rooms. When students pick their own roommates, they tend to pick people with similar attitudes, interests, and activities. That means those two are more similar to each other than to other students, and there’s no way you can treat that cluster sample as a random sample. The cluster would probably be safe for freshmen, where the great majority would be randomly assigned, but less so for students in later years.

9 No, you can’t reach that conclusion, because you can never conclude causation from an observational study. You would have to do an experiment, where people were randomly assigned to watch Fox News or to watch no news at all, and then see if there was a difference in how much they knew about the world.

Students often answer questions like this with hand-waving arguments, either coming up with reasons why it’s a plausible conclusion or coming up with reasons why it isn’t. This is statistics, and we have to follow the facts. Whatever you may think about Fox News, the fact is that observational studies can’t prove causation.

10 (a) It excludes people who don’t use the bus. This means that people who are dissatisfied with the bus are systematically under-represented. Your survey will probably show that willingness to pay is higher than it actually is.
(b) sampling bias
11 “Random” doesn’t mean unplanned; a random sample takes planning. This is a bogus sample. If you want a more formal statistical word, call it a convenience sample, an opportunity sample, or a non-probability sample.
12 (a) This is attribute data or qualitative data or non-numeric data. Don’t be fooled by the number 42: the original question asked was “Do you have at least one streaming device?” and that’s a yes/no question.

Alternative: the more specific answer is binomial data, which you may have heard in the lecture though it’s not in the book till Chapter 6.

(b) This is descriptive statistics because it’s reporting data actually measured: 42% of the sample. If it said “42% of Americans”, then it would be inferential because you know not every American was asked, so the investigators must have extrapolated from a sample to the population.

(c) It is a statistic because it is a number that summarizes data from a sample.

13
• The first people who present themselves are chosen. You should randomly select from among all volunteers. (Better still would be to randomly select from among all patients, and ask the selected individuals to volunteer.)
• Participants are not randomly assigned to control and experimental groups. This is always bad, but it’s especially bad when you accept a block of volunteers in order.
• The experiment is not double blind, only single blind. When doctors know who is getting a placebo and who is getting medicine, they may treat the two groups differently, consciously or unconsciously.

All of these are nonsampling errors.

14 2.145E-4 is 0.0002145, and 0.0004 is larger than that.
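If you want to double-check a comparison like this, here is a quick sketch in Python, which reads E-notation directly. The variable names are just for illustration.

```python
a = float("2.145E-4")   # E-4 means "times 10 to the power -4"
b = 0.0004

print(f"{a:.7f}")        # 0.0002145
print(b > a)             # True
```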
15 It’s spurious precision. (That much precision could be appropriate if you had surveyed a few hundred thousand households.)

To fix it, round to one decimal place: 1.9. (Don’t make the common mistake of “rounding” to 1.8.)

16 (a) Non-numeric. (It has the form of a number, but think about the average area code in a group and you’ll realize an area code is not a number.)
(b) Continuous.
(c) Discrete.
(d) Non-numeric.
(e) Non-numeric.
(f) Discrete (or continuous, if you allow answers like 6.3).
17 (a) was done for you.
(b) Measurement: Amount of each dinner check. Continuous.
(c) Question: “Did you experience bloating and stomach pain?” Non-numeric.
(d) Measurement: Number of people in each party. Discrete.

## What’s New

• 1 June 2015: For the last problem, the lettered parts of the solution didn’t match the lettered parts of the problem. Thanks to Thomas Keegan for spotting this!
• (intervening changes suppressed)
• 19 Jan 2013: New document.