Stats without Tears
8. How Samples Vary
Updated 25 June 2015
(What’s New?)
Copyright © 2013–2019 by Stan Brown
Inferential statistics says, “I’ve got this sample. What does it tell me about the population it came from?” Eventually, you’ll estimate a population mean or proportion from a sample and use a sample to test a claim about a population. In essence, you’re reasoning backward from known sample to unknown population. But how? This chapter lays the groundwork.
First you have to reason forward. Way back in Chapter 1, you learned that samples vary because no one sample perfectly represents its population. In this chapter, you’ll put some numbers on that variation. You’ll learn about sampling distributions, and you’ll calculate the likelihood of getting a particular sample from a known population. That will be the basis for all your inferential statistics, starting in Chapter 9.
Acknowledgements: The approach I take to this material was suggested by What Is a p-Value Anyway? (Vickers 2010, ch 10 [see “Sources Used” at end of book]), though of course any problems with this chapter are my responsibility and not Vickers’.
The software used to prepare most of the graphs and all of the simulations for this chapter is @RISK from Palisade Corporation.
Lengths of 30 Tunes  

mm:ss  seconds 
2:00  120 
3:39  219 
4:02  242 
2:14  134 
2:09  129 
1:45  105 
4:35  275 
1:16  76 
6:52  412 
4:28  268 
8:06  486 
3:19  199 
10:51  651 
4:51  291 
2:06  126 
3:30  210 
2:31  151 
1:38  98 
1:40  100 
1:32  92 
5:05  305 
3:51  231 
12:14  734 
7:48  468 
6:50  410 
5:13  313 
10:44  644 
1:57  117 
7:31  451 
6:15  375 
Having time on my hands, I was curious about the lengths of tunes in the Apple Store. Being lazy, I decided to look instead at the lengths of tunes in my iTunes library. There are 10113 of them, and I’m going to assume that they are representative. (That’s my story, and I’m sticking to it.)
I set Shuffle to Songs and then took the first 30, which gave me the times you see at right for a random sample of size 30.
Here is a histogram of the data. The tune times are moderately skewed right. That makes sense: most tunes run around two to five minutes, but a few are longer.
The mean of this sample is 280.9 seconds, and the standard deviation is 181.7 seconds. But you know that there’s always sampling error. No sample can represent the population perfectly, so if you take another sample from the same population you’d expect to see a different mean, but not very different. This chapter is all about what differences you should expect.
First, ask yourself: Why should you expect the mean of a second sample to be “different, but not very different” from the mean of the first sample? The samples are independent, so why should they relate to each other at all?
Answer: because they come from the same population. In a given sample, you would naturally expect some data points below the population mean μ, and others above μ. You’d expect that the points below μ and the points above μ would more or less cancel each other out, so that the mean of a sample should be in the neighborhood of μ, the mean of the population.
And if you think a little further about it, you’ll probably imagine that this canceling effect works better for larger samples. If you have a sample of four data points, you wouldn’t be much surprised if they’re all above μ or all below μ. If you have a sample of 100 data points, having them all on one side of μ would surprise you as much as flipping a coin 100 times and getting 100 heads. So you expect that the means of large samples tend to stick closer to μ than the means of small samples do. That’s absolutely true, as you’ll find out in this chapter.
To get a handle on “different, but not very different”, take a look at a second sample of 30 from the same population. This one has x̅ = 349.1, s = 204.2 seconds. From its histogram, you can see it’s a bit more strongly skewed than the first sample.
The two sample means differ by 349.14−280.93 ≈ 68.2 seconds. That might seem like a lot, but it’s only about a quarter of the first sample mean and under a fifth of the second sample mean. Also, it’s a lot less than the standard deviations of the two samples, meaning that the difference between samples is much less than the variability within samples.
There’s an element of hand waving in that paragraph. Sure, it seems plausible that the two sample means are “different, but not very different”; but you could just as well construct an argument in words that the two means are very different. Without numbers to go on, how much of a difference is reasonable? In statistics, we like to use numbers to decide whether a thing is reasonable or not. How can we make a numerical argument about the difference between samples? Well, put on your thinking cap, because I’m about to blow your mind.
The key to sample variability is the sampling distribution.
Notice that n is the size of each sample, not the number of samples. There’s no symbol for the number of samples, because it’s indefinitely large.
The sampling distribution is a new level of abstraction. It exists only in our minds: nobody ever takes a whole lot of samples of the same size from a given population. You can think of the sampling distribution as a “what if?” — if you took a whole lot of samples of a given size from the same population, and computed the means of all those samples, and then took those means as a new set of data for a histogram, what would that distribution look like?
Why ask such an abstract question? Simply this: if you know how samples from a known population are distributed, you can work backward from a single sample to make some estimates about an unknown population. In this chapter, I work from a population of tunes with known mean and standard deviation, and I ask what distribution of sample means I can expect to get. In the following chapters, I’ll turn that around: looking at one sample, we’ll ask what that says about the mean and standard deviation of the population that the sample came from.
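The “what if?” described above can be acted out on a computer. Here is a minimal sketch in Python, not the @RISK simulation used for this chapter. The population is a made-up right-skewed list of 10,000 “tune lengths” (the gamma parameters 3.0 and 95.0 are assumptions chosen only to give a skewed shape), and 1000 samples stand in for the indefinitely large number of samples.

```python
import random
import statistics

random.seed(8)  # fixed seed so the run is repeatable

# Hypothetical right-skewed "population" of tune lengths in seconds.
# These are made-up values, not the author's iTunes library.
population = [random.gammavariate(3.0, 95.0) for _ in range(10_000)]

n = 30               # size of each sample
num_samples = 1000   # how many samples to draw (the ideal is indefinitely many)

# One mean per sample: this list is the simulated sampling distribution.
sample_means = [
    statistics.mean(random.sample(population, n))
    for _ in range(num_samples)
]

print("population mean:     ", round(statistics.mean(population), 1))
print("mean of sample means:", round(statistics.mean(sample_means), 1))
print("population SD:       ", round(statistics.pstdev(population), 1))
print("SD of sample means:  ", round(statistics.pstdev(sample_means), 1))
```

If you run it, the two means should land close together, while the sample means should be much less spread out than the individual values.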
What does a sampling distribution look like? Well, I used a computer simulation with @RISK from Palisade Corporation to take a thousand samples of 30 tunes each — the same n as before — and this is what I got:
“Big whoop!” I hear you say. I agree, it’s not too impressive at first glance. But let’s compare this distribution of sample means to the population those samples come from.
(In real life, you wouldn’t know what the population looks like. But in this chapter I work from a known population to explore what the distribution of its samples looks like. Starting in the next chapter, I’ll turn that around and use one sample to explore what the population probably looks like.)
Look at the two histograms below. The left-hand plot shows the individual lengths of all the tunes in the population: it’s a histogram of the original population. The right-hand plot shows the means of a whole lot of samples, 30 tunes per sample: it’s a histogram of the sampling distribution of the mean. That right-hand plot is the same as the plot I showed you a couple of paragraphs above, just rescaled to match the left-hand plot for easier comparison.
Now, what can you see?
Measure  Population (indiv. tunes)  Sampling Distribution (n = 30)
Values (seconds)  50 to 1000s(*)  200 to 400
Middle 95% of values (seconds)  98.0 to 696.3  244.6 to 359.1
Standard deviation (seconds)  158.6  29.0
(*) I cut off the right tail of the population graph to save space.
At this point, you’re probably wondering if similar things are true for other numeric populations. The answer is a resounding YES.
When you describe a distribution of continuous data, you give the center, spread, and shape. Let’s look at those in some detail, because this will be key to everything you do in inferential statistics.
Before I get into the properties of the sampling distribution, I’d like to tell you about two Web apps that let you play with sampling distributions in real time. (I’m grateful to Benjamin Kirk for suggesting these.)
If you possibly can, try out these apps, especially the second one. Sampling distributions are new and strange to you, and playing with them in real time will really help you to understand the text that follows.
The mean of the sampling distribution of x̅ equals the mean of the population: μ_{x̅} = μ.
This is true regardless of the shape of the original population and regardless of sample size.
Why is this true? Well, you already know that when you take a sample, usually you have some data points that are higher than the population mean and some that are lower. Usually the highs and lows come pretty close to canceling each other out, so the mean of each sample is close to μ — closer than the individual data points, that is.
When you take a distribution of sample means, the same thing happens at the second level. Some of the sample means x̅ are above μ and some are below. The highs and lows tend to cancel, so the average of the averages is pretty darn close to the population mean.
The standard deviation of the sampling distribution of x̅ has a special name: standard error of the mean or SEM; its symbol is σ_{x̅}. The standard error of the mean for sample size n equals the standard deviation of the population divided by the square root of n: SEM or σ_{x̅} = σ/√n.
This is true regardless of the shape of the original population and regardless of sample size.
Okay, the sample is n random values drawn from a population with a variance of σ². The total of those n values in the sample is a random variable with a variance of σ²n, and therefore the standard deviation of the total is √(σ²n) = σ√n. Now divide the sample total by n to get the sample mean. x̅ is a random variable with a standard deviation of (σ√n)/n = σ/√n. QED — which is Latin for “told ya so!”
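As a quick check on σ/√n, you can plug in the tune numbers from earlier in the chapter: a population standard deviation of 158.6 seconds and samples of 30 tunes. This is only a sketch of the arithmetic in Python; the variable names are mine.

```python
import math

sigma = 158.6   # population standard deviation of tune lengths, in seconds
n = 30          # size of each sample

sd_of_total = sigma * math.sqrt(n)   # SD of the sample total: sigma * sqrt(n)
sem = sd_of_total / n                # SD of the sample mean: sigma / sqrt(n)

print(round(sd_of_total, 1))
print(round(sem, 2))   # compare with the 29.0 found in the simulation
```

The result is within a few hundredths of the 29.0 seconds that the simulated sampling distribution showed.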
Summary: If the original population is normally distributed (ND), the sampling distribution of the mean is ND. If the original population is not ND, still the sampling distribution is nearly ND if sample size is ≥ 30 or so but not more than about 10% of population size.
You can probably see that if you take a bunch of samples from a ND population and compute their means, the sample means will be ND also. But why should the means of samples from a skewed population be ND as well?
The answer should be called the Fundamental Theorem of Statistics, but instead it’s called the Central Limit Theorem. (The name was given by Richard Martin Edler von Mises in a 1919 article, but the theorem itself is due to the Marquis de Laplace, in his Théorie analytique des probabilités [1812].) The CLT is the only theorem in this whole course. There is a mathematical way to state and prove it, but we’ll go for just a conceptual understanding.
The sampling distribution of the mean approaches the normal distribution, and does so more closely at larger sample sizes.
An equivalent form of the theorem says that if you take a selection of independent random variables, and add up their values, the more independent variables there are, the closer their sum will be to a ND.
The second form of the theorem explains why so many real-life distributions are bell curves: Most things don’t have a single cause, but many independent causes.
Example: Lots of independent variables affect when you leave the house and your travel time every day. That means that any person’s commute times are roughly ND, and so are people’s arrival times at an event. The same sorts of variables affect when buses arrive, so wait times are roughly ND. Most things in nature have their growth rate affected by a lot of independent variables, so many things in nature are approximately ND.
But it’s the first form of the theorem that we’ll use in this chapter. If samples are randomly chosen, or chosen by another valid sampling technique, then they will be independent and the Central Limit Theorem will apply.
The further the population is from a ND, the bigger the sample you need to take advantage of the CLT. Be careful! It’s size of each sample that matters, not number of samples. The number of samples is always large but unspecified, since the sampling distribution is just a construct in our heads. As a rule of thumb, n=30 is enough for most populations in real life. And if the population is close to normal (symmetric, with most data near the middle), you can get away with smaller samples.
On the other hand, the sample can’t be too large. For samples drawn without replacement (which is most samples), the sample shouldn’t be more than about 10% of the population. In symbols, n ≤ 0.1N. Suppose you don’t know the population size, N? Multiply left and right by 10 and rewrite the requirement as 10n ≤ N. You always know the sample size, and if you can make a case that the population is at least ten times that size then you’re good to go.
You’ll remember that the population of tune times was highly skewed, but the sampling distribution for n=30 was pretty nearly bell shaped. To show how larger sample size moves the sampling distribution closer to normal, I ran some simulations of 1000 samples for some other sample sizes. Remember that the sampling distribution is an indefinitely large number of samples; you’re still seeing some lumpiness because I ran only 1000 samples in each simulation.
The means of 3-tune samples are still fairly well skewed, though the range is less than the population range. When sample size increases to 10, the skew is already much less. 20-tune samples are pretty close to a bell curve except for the extreme right-hand tail. Finally, with a sample size of 100, we’re darn close to a bell curve. Yes, there’s still some lumpiness, but that’s because the histogram contains only 1000 sample means.
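You can watch the same effect in numbers rather than pictures. The sketch below is an illustration under assumptions, not the chapter’s simulation: the population is a hypothetical exponential distribution with a mean of 300 seconds (strongly skewed right), and the skewness helper is a simple moment-based measure that is near 0 for a symmetric distribution.

```python
import random
import statistics

random.seed(93)  # fixed seed so the run is repeatable

def skewness(values):
    """Moment-based skewness: about 0 for a symmetric distribution."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_mean_skew(n, num_samples=5000):
    """Skewness of the means of num_samples samples, each of size n."""
    means = [
        # Hypothetical skewed population: exponential, mean 300 seconds.
        statistics.mean([random.expovariate(1 / 300) for _ in range(n)])
        for _ in range(num_samples)
    ]
    return skewness(means)

skews = {n: sample_mean_skew(n) for n in (3, 10, 20, 100)}
for n, sk in skews.items():
    print(f"n = {n:3d}: skewness of sample means = {sk:.2f}")
```

The skewness should shrink toward 0 as n grows, which is the Central Limit Theorem at work.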
The requirements mentioned in this chapter will be your “ticket of admission” to everything you do in the rest of the course. If you don’t check the requirements, the calculator will happily calculate numbers for you, they’ll be completely bogus, and your conclusions will be wrong but you won’t know it. Always check the requirements for any type of inference before you perform the calculations.
I talk about “requirements”. By now you’ve probably noticed that I think very highly of De Veaux, Velleman, and Bock’s Intro Stats (2009) [see “Sources Used” at end of book]. They test the same things in practice, but they talk about “assumptions” and “conditions”. Assumptions are things that must be true for inference to work, and conditions are ways that you test those assumptions in practice.
You might like their approach better. It’s the same content, just a different way of looking at it. And sampling distributions are so weird and abstract that the more ways you can look at them the better! Following De Veaux pages 591–593, here’s another way to think about the requirements.
Independence Assumption: Always look at the overall situation and try to see if there’s any way that different members of the sample can affect each other. If they seem to be independent, you’ll then test these conditions:
These conditions must always be met, but they’re a supplement to the Independence Assumption, not a substitute for it. If you can see any way in which individuals are not independent, it’s game over regardless of the conditions.
Normal Population Assumption: For numeric data, the sampling distribution must be ND or you’re dead in the water. There are two conditions to check this:
The Normal Population Assumption and the Nearly Normal Condition or Large Sample Condition are for numeric data and only numeric data. We’ll have a separate set of requirements, assumptions, and conditions for binomial data later in this chapter.
See also: Is That an Assumption or a Condition? is a very nice summary by Bock [see “Sources Used” at end of book] of all assumptions and conditions. It puts all of our requirements for all procedures into context. (Just ignore the language directed at instructors.)
Ultimately, you’ll use sampling distributions to estimate the population mean or proportion from one sample, or to test claims about a population. That’s the next four chapters, covering confidence intervals and hypothesis tests. But before that, you can still do some useful computations.
For all problems involving sampling distributions and probability of samples, follow these steps:
Describe the sampling distribution: its center, spread, and shape.
Make a sketch of the distribution.
Compute the probability with normalcdf. Caution! Don’t use rounded numbers in this calculation.
You are auditing a bank. The bank managers have told you that the average cash deposit is $200.00, with standard deviation $45.00. You plan to take a random sample of 50 cash deposits. (a) Describe the distribution of sample means for n = 50. (b) Assuming the given parameters are correct, how likely is a sample mean of $189.56 or below?
Solution (a): Recall that describing a distribution means giving its center, its spread, and its shape.
Solution (b): Please refer to How to Work Problems, above. You’ve already described the distribution, so the next step is to make the sketch. You may be tempted to skip this step, but it’s an important reality check on the numerical answer you get from your calculator.
The sketch for this problem is shown at right. Please observe these key points when sketching sampling distribution problems:
Next, compute the probability on your calculator.
Press [2nd] [VARS makes DISTR] [2] to select normalcdf. Fill in the arguments, either on the wizard interface or in the function itself. Either way, you need four arguments, in this order: left boundary, right boundary, mean, and standard error. normalcdf calculations are particularly sensitive to rounding errors, especially when one or both boundaries are out in the tails, so use the exact value for the standard error: 45/√50.
With the “wizard” interface: The wizard prompts you for a standard deviation σ. Don’t enter the SD of the population. Do enter the SD of the sampling distribution, which is the standard error. After entering the standard error, press [
With the classic interface: After entering the standard error, press [)] [ENTER]. You’ll have two closing parentheses, one for the square root and one for normalcdf.

Always show your work. There’s no need to write down all your keystrokes, but do write down the function and its arguments:
normalcdf(−10^99, 189.56, 200, 45/√50)
Answer: P(x̅ ≤ 189.56) = 0.0505
Comment: Here you see the power of sampling. With a standard deviation of $45.00, an individual deposit of $189.56 or lower can be expected almost 41% of the time. But a sample mean under $189.56 with n=50 is much less likely, only a little over 5%.
This is one reason you should take the trouble to make your sketch reasonably close to scale. If you enter the standard deviation, 45, instead of the standard error, 45/√50 — a common mistake — you’ll get 0.4083. A glance at your sketch will tell you that can’t possibly be right, so you then know to find and fix your mistake.
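If you have Python handy instead of a TI calculator, the bank-deposit computation can be sketched with NormalDist from the standard statistics module. This is an alternative tool, not the calculator procedure described above; cdf(x) gives the area to the left of x, so it plays the role of normalcdf with a left boundary of −10^99.

```python
import math
from statistics import NormalDist

mu, sigma, n = 200.00, 45.00, 50
sem = sigma / math.sqrt(n)   # standard error, kept unrounded

# Right way: use the standard error for the distribution of sample means.
p_sample_mean = NormalDist(mu, sem).cdf(189.56)

# Common mistake: using the population SD describes individual deposits.
p_individual = NormalDist(mu, sigma).cdf(189.56)

print(f"P(x-bar <= 189.56)     = {p_sample_mean:.4f}")   # compare with 0.0505
print(f"P(one deposit <= 189.56) = {p_individual:.4f}")  # compare with 0.4083
```

The two results differ by a factor of about eight, which is exactly the gap your sketch is meant to catch.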
US women’s heights are normally distributed (ND), with mean 65.5″ and standard deviation 2.5″. You visit a small club on a Thursday evening, and 25 women are there. (Let’s assume they are a representative sample.) Your pickup line is that you’re a statistics student and you need to measure their heights for class. Amazingly, this works, and you get all 25 heights. How likely is it that the average height is between 65″ and 66″?
Solution: First, get the characteristics of the sampling distribution:
If the SEM is 0.5″, then 65″ and 66″ equal the mean ± one standard error. The Empirical Rule (68–95–99.7 Rule) tells you that about 68% of the data fall between those bounds. In this problem, the sketch is a really good guide to the answer you expect.
This is the distribution of sample means, so you expect 68% of them to fall between those bounds. But do the computation anyway, because the Empirical Rule is approximate and now you’re able to be precise. Also, the SEM of 0.5″ is an exact number, but still I put the whole computation into the calculator just to be consistent.
The chance that the sample mean is between 65″ and 66″ is
P(65 ≤ x̅ ≤ 66) = 0.6827
Remember the difference between the distribution of sample means and the distribution of individual heights. From the computation at the right, you expect to see under 16% of women’s heights between 65″ and 66″, versus over 68% of sample mean heights (for n=25) between 65″ and 66″. That’s the whole point of this chapter: sample means stick much closer to the population mean.
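The contrast between sample means and individual heights can be checked in a few lines of Python. This is a sketch that assumes the standard statistics module in place of the calculator; the area between two boundaries is the difference of two cdf values.

```python
import math
from statistics import NormalDist

mu, sigma, n = 65.5, 2.5, 25
sem = sigma / math.sqrt(n)         # 2.5 / 5 = 0.5 inch

means = NormalDist(mu, sem)        # model for sample means with n = 25
heights = NormalDist(mu, sigma)    # model for individual women's heights

p_mean = means.cdf(66) - means.cdf(65)
p_individual = heights.cdf(66) - heights.cdf(65)

print(f"P(65 <= x-bar <= 66) = {p_mean:.4f}")
print(f"P(65 <= x <= 66)     = {p_individual:.4f}")
```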
Suppose hotel guests who take elevators weigh on average 150 pounds with standard deviation of 35 pounds. An engineer is designing a large elevator, to lift 50 people. If she designs it to lift 4 tons (8000 pounds), what is the chance a random group of 50 people will overload it?
Need a hint? This is a problem in sample total. You haven’t studied that kind of problem, but you have studied problems in sample means. In math, when you have an unfamiliar type of problem, it’s always good to ask: Can I change this into some type of problem I do know how to solve? In this case, how do you change a problem about the total number of pounds in a sample (∑x) into a problem about the average number of pounds per person (x̅)?
Please stop and think about that before you read further.
Solution: To convert a problem in sums into a problem in averages, divide by the sample size. If the total weight of a sample of 50 people is 8000 lb, then the average weight of the 50 people in the sample is 8000/50 = 160 lb. So the desired probability is P(x̅ > 160):
P(∑x > 8000 for n = 50) = P(x̅ > 160)
And you know how to find the second one.
What does the sampling distribution of the mean look like for μ = 150, σ = 35, n = 50? The mean is μ_{x̅} = 150 lb, and the standard error is 35/√50 ≈ 4.9 lb. That’s all you need to draw the sketch at right. Samples are random, 10×50 = 500 is less than the number of people (or potential hotel guests) in the world, and n = 50 ≥ 30, so the sampling distribution follows a normal model.
Now make your calculation. This time the left boundary is a definite number and the right boundary is pseudo infinity, 10^99. And again, you want the standard error, not the SD of the original population.
With the “wizard” interface: After entering the standard error, press [
With the classic interface: After entering the standard error, press [)] [ENTER].

Show your work: normalcdf(160, 10^99, 150, 35/√50).
There’s a 0.0217 chance that any given load of 50 people will overload the elevator. That’s not 2% of all loads, but 2% of loads of 50 people. Still, it’s an unacceptable level of risk.
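To see that the sample-total problem and the sample-mean problem really are the same problem, you can compute the probability both ways. The sketch below assumes Python’s standard statistics module rather than the calculator. The second route uses the fact, from the standard-error derivation earlier in the chapter, that the total of n values has standard deviation σ√n; its mean is n times μ.

```python
import math
from statistics import NormalDist

mu, sigma, n = 150, 35, 50
capacity = 8000   # pounds

# Route 1: the average weight per person exceeds capacity / n = 160 lb.
sem = sigma / math.sqrt(n)
p_via_mean = 1 - NormalDist(mu, sem).cdf(capacity / n)

# Route 2: the total weight exceeds 8000 lb.
# The total has mean n * mu and standard deviation sigma * sqrt(n).
p_via_total = 1 - NormalDist(n * mu, sigma * math.sqrt(n)).cdf(capacity)

print(f"via sample mean:  {p_via_mean:.4f}")
print(f"via sample total: {p_via_total:.4f}")
```

Both routes give the same answer, because dividing everything by 50 changes the scale but not the area in the tail.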
Is there an inconsistency here? Back in Chapter 5, I said that an unusual event was one that had a low probability of occurring, typically under 5%. Since 2% is less than 5%, doesn’t that mean that an overloaded elevator is an unusual event, and therefore it can be ignored?
Yes, it’s unusual. But no, fifty people plunging to a terrible death can’t be ignored. The issue is acceptable risk. Yes, there’s some risk any time you step in an elevator that it will be your last journey. But it’s a small risk, and it’s one you’re willing to accept. (The risk is much greater every time you get into a car.) Without knowing exact figures, you can be sure it’s much, much less than 2%; otherwise every big city would see many elevator deaths every day.
In Chapter 10, you’ll meet the significance level, which is essentially the risk of being wrong that you can live with. The worse the consequences of being wrong, the lower the acceptable risk. With an elevator, 5% is much too risky — you want crashes to be a lot more unusual than that.
Binomial data are yes/no or success/failure data. Each sample yields a count of successes. (A reminder: “success” isn’t necessarily good; it’s just the name for the condition or response that you’re counting, and the other one is called “failure”.)
Need a refresher on the binomial model? Please refer back to Chapter 6.
The summary statistic or parameter is a proportion, rather than a mean. In fact, the proportion of success (p) is all there is to know about a binomial population.
In Chapter 6 you computed probabilities of specific numbers of successes. Now you’ll look more at the proportions of success in all possible samples from a binomial population, using the normal distribution (ND) as an approximation.
Here’s a reminder of the symbols used with binomial data:
p  The proportion in the population. Example: If 83% of US households have at least one cell phone, then p = 0.83. Remember “proportion of all equals probability of one”, so p is also the probability that any randomly selected response from the population will be a success.
q  q = 1−p is therefore the proportion of failure, or the chance that any given response will be a failure.
n  The sample size.
x  The number of successes in the sample. Example: if 45 households in your sample have at least one cell phone, then x = 45.
p̂  “p-hat”, the proportion in the sample, equal to x/n. Example: If you survey 60 households and 45 of them have at least one cell phone, then p̂ = 45/60 = 0.75 or 75%.
The sampling distribution of the proportion is the same idea as the sampling distribution of the mean, and there are a lot of parallels between the two. (A table at the end of this chapter summarizes them.)
As before, n is the size of each sample, not the number of samples. There’s no symbol for the number of samples, because it’s indefinitely large.
One change from the sampling distribution of x̅ is that the sampling distribution of p̂ is a different data type from the population. The original data are nonnumeric (yeses and noes), but the distribution of p̂ is numeric because the p̂’s are numbers. Each p̂ says “so many percent of this sample were successes.”
The mean of the sampling distribution of p̂ equals the proportion of the population: μ_{p̂} = p (“mu sub p-hat equals p”).
This is true regardless of the proportion in the original population and regardless of sample size.
Why is this true? The reasons are similar to the reasons in Center of the Sampling Distribution of x̅. p̂ for a given sample may be higher or lower than p of the population, but if you take a whole lot of samples then the high and low p̂’s will tend to cancel each other out, more or less.
The standard deviation of the sampling distribution of p̂ has a special name: standard error of the proportion or SEP; its symbol is σ_{p̂} (“sigma-sub-p-hat”). The standard error of the proportion for sample size n equals the square root of the population proportion, times 1 minus the population proportion, divided by the sample size: SEP or σ_{p̂} = √[pq/n].
This is true regardless of the proportion in the original population and regardless of sample size.
Why is this true? For a binomial distribution with sample size n, the standard deviation is √[npq]. That is the SD of the random variable x, the number of successes in a sample of size n. The sample proportion, random variable p̂, is x divided by n, and therefore the SD of p̂ is the SD of random variable x, also divided by n. In symbols, σ_{p̂} = √[npq] / n = √[npq/n²] = √[pq/n].
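A small simulation can confirm both the center and the spread of the sampling distribution of p̂. This is a sketch under assumptions: p = 0.1 and n = 100 are values picked for illustration, and 10,000 samples stand in for the indefinitely large number of samples.

```python
import math
import random
import statistics

random.seed(186)  # fixed seed so the run is repeatable

p, n = 0.1, 100
q = 1 - p
sep = math.sqrt(p * q / n)   # formula: sqrt(pq/n)

# Each p-hat is the number of successes in one sample, divided by n.
p_hats = [
    sum(random.random() < p for _ in range(n)) / n
    for _ in range(10_000)
]

print("SEP from formula:  ", round(sep, 4))
print("SD of the p-hats:  ", round(statistics.pstdev(p_hats), 4))
print("mean of the p-hats:", round(statistics.mean(p_hats), 4))
```

The mean of the simulated p̂’s should sit very near p, and their standard deviation very near √[pq/n].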
If np and nq are both ≥ about 10, and 10n ≤ N, the normal model is good enough for the sampling distribution.
Let’s look at some sampling distributions of p̂. First I’ll show you the effect of the population’s proportion of success p, and then the effect of the sample size n.
Using @RISK from Palisade Corporation, I simulated all of the sampling distributions shown here. The mathematical sampling distribution has an indefinitely large number of samples, but I stopped at 10,000.
These first three graphs show the sampling distributions for samples of size n = 4 from three populations with different proportions of successes.
Reminder: these are not graphs of the population — they’re not responses from individuals. They are graphs of the sampling distributions, showing the proportion of successes (p̂) found in a lot of samples.
How do you read these? For example, look at the first graph. This shows the sampling distribution of the proportion for a whole lot of samples, each of size 4, where the probability of success on any one individual is 0.1. You can see that about 67% of all samples have p̂ = 0 (no successes out of four), about 29% have p̂ = .25 (one success out of four), about 4% have p̂ = .50 (two successes out of four), and so on.
Why the large gaps between the bars? With n = 4, each sample can have only 0, 1, 2, 3, or 4 successes, so the only possible proportions for those samples are 0, 25%, 50%, 75%, and 100%.
But let’s not obsess over the details of these graphs. I’m more interested in the shapes of the sampling distributions.
What do you see? If you take many samples of size 4 from a population with p = 0.1 (10% successes and 90% failures), the sampling distribution of the proportion is highly skewed. Now look at the second graph. When p = .25 (25% successes and 75% failures in the population), again with n = 4 individuals in each sample, the sample proportions are still skewed, but less so. And in the third graph, where the population has p = 0.5 (success and failure equally likely), then the sampling distribution is symmetric even with these small samples.
For a given sample size n, it looks like the closer the population p is to 0.5, the closer the sampling distribution is to symmetric. And in fact that’s true. That’s your takeaway from these three graphs.
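For samples this small you don’t need a simulation: the binomial formula from Chapter 6 gives the exact probability of each possible p̂. The sketch below (the function name is mine) tabulates the three distributions for n = 4. Exact values can differ by a point or so from percentages read off a simulated graph.

```python
from math import comb

def p_hat_distribution(n, p):
    """Exact probability of each possible sample proportion x/n."""
    return {
        x / n: comb(n, x) * p**x * (1 - p) ** (n - x)
        for x in range(n + 1)
    }

for p in (0.1, 0.25, 0.5):
    dist = p_hat_distribution(4, p)
    row = "  ".join(f"{p_hat:.2f}: {prob:.3f}" for p_hat, prob in dist.items())
    print(f"p = {p}:  {row}")
```

In the p = 0.5 row the probabilities mirror each other around p̂ = 0.50, while in the p = 0.1 row nearly all the probability is piled up at the low end.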
Now let’s look at sampling distributions using different sample sizes from the same population. I’ll use a population with 10% probability of success for each individual (p = 0.1).
You’ve already seen the graph of the sampling distribution when n = 4. The three graphs here show the sampling distribution of p̂ for progressively larger samples. (Remember always that n is the number of individuals in one sample. The number of samples is indefinitely large, though in fact I took 10,000 samples for each graph.)
What do you see here? The distribution of p̂’s from samples of 50 individuals is still noticeably skewed, though a lot less than the graph for n = 4. If I take samples of size 100, the graph is starting to look nearly symmetric, though still slightly skewed. And if I take samples of 500 individuals, the distribution of p̂ looks like a bell curve.
What do you conclude from these graphs? First, even if p is far from 0.5 (if the population is quite unbalanced), with large enough samples, the sampling distribution of p̂ is very close to a normal distribution. Second, you need big samples for binomial data. Remember that 30 is usually good enough for numeric data. For binomial data, it looks like you need bigger samples.
Okay, let’s put it together. If the size of each sample is large enough, the sampling distribution is close enough to normal. How large a sample is large enough? It depends on how skewed the original population is, which means it depends on the proportion of successes in the population. The further p is from 0.5, the more unbalanced the population and the larger n must be.
How big a sample is big enough? Here’s what some authors say:
Why the disagreements? They can’t all be right, can they?
Actually, they can. The question is, what’s close enough to a ND? That’s a judgment call, and different statisticians are a little bit more or less strict about what they consider close enough. Fortunately, with samples bigger than a hundred or so, which are customary, all the conditions are usually met with room to spare.
We’ll follow De Veaux and his buddies and use np ≥ 10 and nq ≥ 10. This is easy to remember: at least ten “yes” and at least ten “no” expected in a sample. (You can compute the expected number of noes as nq = n(1−p) or simply n−np, sample size minus the expected number of yeses.)
How does this work out in practice? Look at the next-to-last graph, with n=100 and p=0.1. It’s close to a bell curve, but has just a little bit of skew. (It’s easier to see the skew if you cover the top part of the graph.)
Check the requirements: np = 100×.1 = 10, and nq = 100−10 = 90. In a sample of 100, 10 successes and 90 failures are expected, on average. This just meets requirements. And that matches the graph: you can see that it’s not a perfect bell curve, but close; but if it was a little more skewed then the normal model wouldn’t be a good enough fit.
De Veaux and friends (page 440) give a nice justification for choosing ≥ 10 yeses and noes. Briefly, the ND has tails that go out to ±infinity, but proportions are between 0 and 1. They chose their “success/failure condition”, at least ten of each, so that the mismatch between the binomial model and the normal model is only in the rare cases.
But there’s an additional condition: the individuals in the sample must be independent. This translates to a requirement that the sample can’t be too large, or drawing without replacement would break the binomial model. Big surprise (not!): Authors disagree about this too. For example, De Veaux and Johnson & Kuby say the sample can’t be bigger than 10% of the population (n ≤ 0.1N); Sullivan says 5%.
We’ll use n ≤ 0.1N, just like with numeric data. And just as before, you can think of that as 10n ≤ N when you don’t have the exact size of the population.
Example 4: You asked 300 randomly selected adult residents of Ithaca a yes-or-no question. Is the sample too large to assume independence? You may not know the population of Ithaca, but you can compute 10×300 = 3000 and be confident that there are more than 3000 adult Ithacans. Therefore your sample is not too large.
Don’t just claim 10n ≤ N. Show the computation, and identify the population you’re referring to, like this: “10n = 10×300 = 3000 ≤ number of adult Ithacans.”
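The same check can be sketched in a few lines of Python. This is only an illustration: the function name ten_percent_ok is made up, and the 3000 passed in below is not Ithaca’s real population, just the minimum that Example 4 says you can be confident of.

```python
def ten_percent_ok(n, population_at_least):
    """Return (10n, condition met?) for the 10% Condition, 10n <= N.

    Hypothetical helper. population_at_least is a number you are
    confident the population is no smaller than; you don't need
    the exact population size.
    """
    required = 10 * n
    return required, required <= population_at_least


# Example 4: n = 300, and surely at least 3000 adult Ithacans
required, ok = ten_percent_ok(300, 3000)
print(f"10n = 10 x 300 = {required}; sample small enough? {ok}")
```

Notice that the function reports 10n as well as the verdict, in the same spirit as showing the computation instead of just claiming the result.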
Remember to check your conditions: np ≥ about 10, nq ≥ about 10, and 10n ≤ N. And of course your sample must be random.
Just like with numeric data, you might find it helpful to name the requirements for binomial data. These are the same requirements that I just gave you, but presented differently. I’m following De Veaux, Velleman, and Bock (2009, 493) [see “Sources Used” at end of book].
Independence Condition, Randomization Condition, 10% Condition: These are the same for every sampling distribution and every procedure in inferential stats. I’ve already talked about them under numeric data earlier in the chapter. In practice, the 10% Condition comes into play more often for binomial data than numeric data, because binomial samples are usually much larger.
Sample Size Assumption: For binomial data, the sample is like Goldilocks and porridge: it can’t be too big and it can’t be too small. (Maybe it was beds or chairs and not porridge? And what the heck is porridge?) “Too big” is checked by the 10% Condition; “too small” is checked by the success/failure condition, at least ten yeses and ten noes expected (np ≥ 10 and nq ≥ 10).
See also: Is That an Assumption or a Condition? (Bock [see “Sources Used” at end of book]). Again, these are the same requirements you see in this textbook, just presented differently.
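Working with the sampling distribution of p̂, the technique is exactly the same as for problems involving the sampling distribution of x̅. Follow these steps: describe the sampling distribution, make a sketch and estimate the answer, and then compute the probability with normalcdf.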
Caution! Don’t use rounded numbers in this calculation.
The binomial distribution is discrete (you can have 10 successes or 11, but not 10½), while the normal distribution is continuous. Because of this, some authors apply a continuity correction to make the normal model a better fit for the binomial. This means extending the range by half a unit in each direction. For example, if n = 100 and p = 0.20, and you’re finding the probability of 10 to 15 successes, MATH200A part 3 gives a probability of 0.1262. The normal model with standard error √[.20×(1−.20)/100] = 0.04 gives
normalcdf(10/100, 15/100, .2, .04) = 0.0994
With the continuity correction, you compute the probability for 9½ to 15½ successes. Then the normal model gives a probability of
normalcdf(9.5/100, 15.5/100, .2, .04) = 0.1260
This is a better match to the exact binomial probability. Why use the normal model at all, then? Why not just compute the exact binomial probability? Because there’s only a noticeable discrepancy far from the center, and only when the sample is on the small side. (100 is a small sample for binomial data, as you’ll see in the next two chapters.) You can apply the continuity correction if you want, but many authors don’t because it usually doesn’t make enough difference to matter.
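To see the three numbers side by side without a TI calculator, here is a minimal Python sketch. It assumes Python 3.8 or later, using math.comb for the exact binomial sum and statistics.NormalDist as a stand-in for normalcdf; the variable names are my own.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.20
low, high = 10, 15                      # 10 to 15 successes

# Exact binomial probability: add up P(x = k) for k = 10..15
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k)
            for k in range(low, high + 1))

# Normal model for p-hat: mean p, standard error sqrt(pq/n) = 0.04
model = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

# Without the correction: 10/100 to 15/100
plain = model.cdf(high / n) - model.cdf(low / n)

# With the continuity correction: 9.5/100 to 15.5/100
corrected = model.cdf((high + 0.5) / n) - model.cdf((low - 0.5) / n)

print(f"exact binomial      {exact:.4f}")
print(f"normal, uncorrected {plain:.4f}")
print(f"normal, corrected   {corrected:.4f}")
```

The subtraction of two cdf values plays the role of normalcdf’s lower and upper bounds.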
1965. Talladega County, Alabama. An African American man named Robert Swain is accused of rape. The 100-man jury panel includes 8 African Americans, but through exemptions and peremptory challenges none are on the final jury. Swain is convicted and sentenced to death. (Juries are all male. 26% of men in the county are African American.)
(a) In a county that is 26% African American, is it unexpected to get a 100-man jury panel with only eight African Americans?
Solution: “Unexpected”, “unusual”, or “surprising” describes an event with low probability, typically under 5%. This problem is asking you to find the probability of getting that sample and compare that probability to 0.05.
This is binomial data: each member of the sample either is or is not African American. Putting the problem into the language of the sampling distribution, your population proportion is p = 0.26. You’re asked to find the probability of getting 8 or fewer successes in a sample of 100, so n = 100 and your sample proportion must be p̂ ≤ 8/100 or p̂ ≤ 0.08.
Why “8 or fewer”? p̂ = 8% for the questionable sample, and you think it’s too low since it’s below the expected 26%. Therefore, in determining how likely or unlikely such a sample is, you’ll compute the probability of p̂ ≤ 0.08.
First, describe the sampling distribution of the proportion:
Therefore the sampling distribution of the proportion is a ND.
Next, make the sketch and estimate the answer. I’ve numbered the key points in the sketch at right, but if you need a refresher please refer back to the sketch under Example 1.
From this sketch, you’d expect the probability to be very small, and indeed it is.
Compute the probability using normalcdf as before. Be careful with the last argument, which is the standard deviation of the sampling distribution. Don’t use a rounded number for the standard error, because it can make a large difference in the probability.
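How large a difference? Here is a small Python sketch, assuming Python 3.8 or later and using statistics.NormalDist in place of normalcdf. Rounding the standard error to two decimal places is just one example of careless rounding, chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.26, 100          # population proportion and sample size
p_hat = 0.08              # right boundary: 8 out of 100

se_exact = sqrt(p * (1 - p) / n)   # unrounded standard error
se_rounded = round(se_exact, 2)    # rounded to two decimal places

# P(p-hat <= 0.08) under each version of the normal model
prob_exact = NormalDist(p, se_exact).cdf(p_hat)
prob_rounded = NormalDist(p, se_rounded).cdf(p_hat)

print(f"unrounded SE {se_exact:.6f} gives {prob_exact:.2e}")
print(f"rounded SE   {se_rounded:.2f}     gives {prob_rounded:.2e}")
print(f"ratio: {prob_exact / prob_rounded:.1f}")
```

Out in the tail, a small change in the standard error moves the z-score enough to change the probability several times over.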
With the “wizard” interface:  With the classic interface: 

The standard error expression, √(.26*(1−.26)/100), scrolls off the screen as you type it in, so be extra careful!
Press [ 
Press [)] [ENTER] after entering the standard error. You’ll have two closing parentheses, one for the square root and one for normalcdf.

Always show your work — not keystrokes but the function and its arguments:
normalcdf(−10^99, .08, .26, √(.26*(1−.26)/100))
The SEP is a nasty expression, and you have to enter it twice in every problem. You might like to save some keystrokes by computing it once and then storing it in a variable, as I did at the right. When you’re drawing the sketch and need the standard error, compute it as usual but before pressing [ENTER] press [STO→] [x,T,θ,n]. Then when you need the standard error in normalcdf, in the wizard or the classic interface, just press the [x,T,θ,n] key instead of re-entering the whole SEP expression.
The probability is naturally the same whether you use the shortcut or not.
P(p̂ ≤ 0.08) = 2.0×10^{−5}, or P(x ≤ 8) = 0.000 020. There are only 20 chances in a million of getting a 100-man jury pool with so few African Americans by random selection from that county’s population. This is highly unexpected, so unlikely that it raises the gravest doubts about the county’s claim that jury pools were selected without racial bias.
You might remember that in Chapter 6 you computed this binomial probability as 0.000 005, five chances in a million. If the ND is a good approximation, why does it give a probability that’s four times the correct probability? Answer: The normal approximation gets a little dicier as you move further out the tails, and this sample is pretty far out (z = −4.10). But is the approximation really that bad? Sure, the relative error is large, but the absolute error is only 0.000 015, 15 chances in a million. Either way, the message is “This is extremely unlikely to be the product of random chance.”
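Here is that comparison as a minimal Python sketch, assuming Python 3.8 or later. math.comb builds the exact binomial sum, and statistics.NormalDist stands in for normalcdf; the variable names are mine.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, x = 100, 0.26, 8     # 8 or fewer African Americans out of 100

# Exact binomial: P(x <= 8) is the sum of P(x = k) for k = 0..8
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

# Normal approximation: P(p-hat <= 0.08) with the unrounded SEP
se = sqrt(p * (1 - p) / n)
approx = NormalDist(p, se).cdf(x / n)
z = (x / n - p) / se

print(f"z = {z:.2f}")
print(f"exact binomial  {exact:.6f}")
print(f"normal model    {approx:.6f}")
print(f"absolute error  {approx - exact:.6f}")
print(f"relative size   {approx / exact:.1f} times the exact value")
```

The relative error looks alarming and the absolute error is tiny, which is the point of the paragraph above.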
(b) From 1950 to 1965, as cited in the Supreme Court’s decision, every 100-man jury pool in the county had 15 or fewer African Americans. How likely is that, if they were randomly selected?
Solution: 15 out of 100 is 15%. You know how to compute the probability that one jury pool would be ≤15% African American, so start with that. You’ve already described the sampling distribution, so all you have to do is make the sketch and then the calculation. Everything’s the same, except your right boundary is 0.15 instead of 0.08.
If you use my little shortcut: normalcdf(−10^99, .15, .26, X)
Otherwise: normalcdf(−10^99, .15, .26, √(.26*(1−.26)/100))
Either way, P(p̂ ≤ 0.15) = 0.0061. The Talladega County jury panels are multiple samples with n = 100 in each, so the “proportion of all” interpretation makes sense: In the long run, you expect 0.61% of jury panels to have 15% or fewer African Americans, if they’re randomly selected.
But actually 100% of those jury panels had 15% or fewer African Americans. How unlikely is that? Well, we don’t know how many juries there were in the county in those 16 years, but surely it must have been at least one a year, or a total of 16 or more. The probability that 16 independent jury pools would all have 15% or fewer African Americans, just by chance, is 0.0060745591^{16} ≈ 3E−36, effectively zip. And if there was more than one jury a year, as there probably was, the probability would be even lower. Something is definitely fishy.
The binomial probability is 0.0061 also. This is still pretty far out in the left-hand tail (z = −2.51), but the normal approximation is excellent. The message here is that the normal approximation is pretty darn close except where the probabilities are so small that exactness isn’t needed anyway.
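Part (b) can be checked the same way. This is a minimal Python sketch, assuming Python 3.8 or later, with statistics.NormalDist standing in for normalcdf. The count of 16 jury pools is the text’s own conservative guess of one per year, not a known figure.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.26
boundary = 15                       # 15 or fewer out of 100

# Normal model: P(p-hat <= 0.15) for one jury pool
se = sqrt(p * (1 - p) / n)
one_pool = NormalDist(p, se).cdf(boundary / n)

# Exact binomial for comparison: P(x <= 15)
exact_one_pool = sum(comb(n, k) * p**k * (1 - p)**(n - k)
                     for k in range(boundary + 1))

# Independent pools multiply: assume at least 16 pools, one per year
pools = 16
all_pools = one_pool ** pools

print(f"one pool, normal model: {one_pool:.4f}")
print(f"one pool, exact:        {exact_one_pool:.4f}")
print(f"all {pools} pools:           {all_pools:.0e}")
```

Raising the single-pool probability to the 16th power is legitimate only because the pools are treated as independent random samples.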
Here’s a side-by-side summary of sampling distributions of the mean (numeric data) and sampling distributions of the proportion (binomial data). Always check requirements for the type of data you actually have!
Item  Numeric Data  Binomial Data
Each individual in sample provides ...  a number  a success or failure, and you count successes
Statistic of one sample  mean x̅ = ∑x/n  proportion p̂ = x/n
Parameter of population  mean μ  proportion p
Sampling distribution of the ...  mean (sampling distribution of x̅)  proportion (sampling distribution of p̂)
Mean of sampling distribution  μ_{x̅} = μ  μ_{p̂} = p
Standard deviation of sampling distribution  SEM = standard error of the mean, σ_{x̅} = σ/√n  SEP = standard error of the proportion, σ_{p̂} = √[pq/n]
Sampling distribution is close enough to normal if ... 


NOTE: n is number of individuals per sample. Number of samples is indefinitely large and has no symbol. 
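The two standard-error formulas in the table can be written as a pair of small Python functions. This is only a sketch: the names sem and sep simply echo the abbreviations in the table, and the numbers fed to sem below are made up for illustration.

```python
from math import sqrt


def sem(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


def sep(p, n):
    """Standard error of the proportion: sqrt(p*q/n), where q = 1 - p."""
    return sqrt(p * (1 - p) / n)


# Binomial data: the two cases worked in this chapter
print(f"SEP for p = 0.20, n = 100: {sep(0.20, 100):.4f}")
print(f"SEP for p = 0.26, n = 100: {sep(0.26, 100):.6f}")

# Numeric data: hypothetical sigma = 50 and n = 25
print(f"SEM for sigma = 50, n = 25: {sem(50, 25):.1f}")
```

In both functions n is the number of individuals per sample, matching the note under the table.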
Key ideas:
Use normalcdf to compute probability. In normalcdf, the fourth argument is the unrounded standard error, not the population standard deviation.
Write out your solutions to these exercises, making a sketch and showing your work for all computations. Then check your solutions against the solutions page and get help with anything you don’t understand.
Caution! If you don’t see how to start a problem, don’t peek at the solution — you won’t learn anything that way. Ask your instructor or a tutor for a hint. Or just leave it and go on to a different problem for now. You may find when you return to that “impossible” problem that you see how to do it after all.
(a) If the manufacturer’s claim is true, is a sample mean of 780 hours surprising? (Hint: Think about whether you need the probability of x̅ ≤ 780 or x̅ ≥ 780.)
(b) Would you accept the manufacturer’s claim?
(a) Describe the sampling distribution of the proportion who believe in angels in samples of 500 Americans.
(b) Use the normal approximation to compute the probability of finding that 350 to 370 in a sample of 500 believe in angels. Reminder: You can’t use the sample counts directly; you have to convert them to sample proportions.
(a) One way beginners play is to bet on red or black. If the ball comes up that color, they double their money; if it comes up any other color, they lose their money. Construct a probability model for the outcome of a $10 bet on red from the player’s point of view.
(b) Find the mean and SD for the outcome of $10 bets on red, and write a sentence interpreting the mean.
(c) Now take the casino’s point of view. A large casino can have hundreds of thousands of bets placed in a day. Obviously they won’t all be the same, but it doesn’t take many days to see a whole lot of any given bet. Describe the sampling distribution of the mean for a sample of 10,000 $10 bets on red.
(d) How much does the casino expect to earn on 10,000 $10 bets on red?
(e) What’s the chance that the casino will lose money on those 10,000 $10 bets on red?
(f) What’s the casino’s chance of making at least $2000 on those 10,000 $10 bets?
What is the probability of this happening, if the day’s mean is 5.00 pounds and SD 0.05 pounds?
(a) If you randomly pick one cabbage, what is the probability that its weight is more than 43.0 ounces?
(b) If you randomly pick 14 cabbages, what is the probability that their average weight is more than 43.0 ounces?
Group  Heart Attack  No Attack  Total  p̂
Placebo  189  10845  11034  1.71%
Aspirin  104  10933  11037  0.94%
The heart attack rate among aspirin takers was 0.94%, which looks like an impressive difference. Is there any chance that aspirin makes no difference, and this was just the result of random selection? In other words, how likely is it for that second sample to have p̂ = 0.94% if the true proportion of heart attacks in adult male aspirin takers is actually 1.71%, no different from adult males who don’t take aspirin?
(Hint: The 5% is the tails, the part of the sampling distribution that is not in the middle 95%.)
In a population where 45% have an unfavorable view of the Tea Party, how likely is a sample of 1504 where 737 or more have an unfavorable view? Can you draw any conclusions from that probability?
Updates and new info: https://BrownMath.com/swt/