
# Stats without Tears: 8. How Samples Vary

Updated 15 Nov 2021

Intro:

Inferential statistics says, “I’ve got this sample. What does it tell me about the population it came from?” Eventually, you’ll estimate a population mean or proportion from a sample and use a sample to test a claim about a population. In essence, you’re reasoning backward from known sample to unknown population. But how? This chapter lays the groundwork.

First you have to reason forward. Way back in Chapter 1, you learned that samples vary because no one sample perfectly represents its population. In this chapter, you’ll put some numbers on that variation. You’ll learn about sampling distributions, and you’ll calculate the likelihood of getting a particular sample from a known population. That will be the basis for all your inferential statistics, starting in Chapter 9.

Acknowledgements: The approach I take to this material was suggested by What Is a p-Value Anyway? (Vickers 2010, ch 10 [see “Sources Used” at end of book]), though of course any problems with this chapter are my responsibility and not Vickers’.

The software used to prepare most of the graphs and all of the simulations for this chapter is @RISK from Palisade Corporation.

## 8A.  Numeric Data / Means of Samples

### 8A1.  One Sample and Its Mean

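Lengths of 30 Tunes

| mm:ss | seconds |
|---|---|
| 2:00 | 120 |
| 3:39 | 219 |
| 4:02 | 242 |
| 2:14 | 134 |
| 2:09 | 129 |
| 1:45 | 105 |
| 4:35 | 275 |
| 1:16 | 76 |
| 6:52 | 412 |
| 4:28 | 268 |
| 8:06 | 486 |
| 3:19 | 199 |
| 10:51 | 651 |
| 4:51 | 291 |
| 2:06 | 126 |
| 3:30 | 210 |
| 2:31 | 151 |
| 1:38 | 98 |
| 1:40 | 100 |
| 1:32 | 92 |
| 5:05 | 305 |
| 3:51 | 231 |
| 12:14 | 734 |
| 7:48 | 468 |
| 6:50 | 410 |
| 5:13 | 313 |
| 10:44 | 644 |
| 1:57 | 117 |
| 7:31 | 451 |
| 6:15 | 375 |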

Having time on my hands, I was curious about the lengths of tunes in the Apple Store. Being lazy, I decided to look instead at the lengths of tunes in my iTunes library. There are 10113 of them, and I’m going to assume that they are representative. (That’s my story, and I’m sticking to it.)

I set Shuffle to Songs and then took the first 30, which gave me the times listed above for a random sample of size 30.

Here is a histogram of the data. The tune times are moderately skewed right. That makes sense: most tunes run around two to five minutes, but a few are longer. The mean of this sample is 280.9 seconds, and the standard deviation is 181.7 seconds. But you know that there’s always sampling error. No sample can represent the population perfectly, so if you take another sample from the same population you’d expect to see a different mean, but not very different. This chapter is all about what differences you should expect.

First, ask yourself: Why should you expect the mean of a second sample to be “different, but not very different” from the mean of the first sample? The samples are independent, so why should they relate to each other at all?

Answer: because they come from the same population. In a given sample, you would naturally expect some data points below the population mean μ, and others above μ. You’d expect that the points below μ and the points above μ would more or less cancel each other out, so that the mean of a sample should be in the neighborhood of μ, the mean of the population.

And if you think a little further about it, you’ll probably imagine that this canceling effect works better for larger samples. If you have a sample of four data points, you wouldn’t be much surprised if they’re all above μ or all below μ. If you have a sample of 100 data points, having them all on one side of μ would surprise you as much as flipping a coin 100 times and getting 100 heads. So you expect that the means of large samples tend to stick closer to μ than the means of small samples do. That’s absolutely true, as you’ll find out in this chapter.

To get a handle on “different, but not very different”, take a look at a second sample of 30 from the same population. This one has x̅ = 349.1, s = 204.2 seconds. From its histogram, you can see it’s a bit more strongly skewed than the first sample. The two sample means differ by 349.14−280.93 ≈ 68.2 seconds. That might seem like a lot, but it’s only about a quarter of the first sample mean and under a fifth of the second sample mean. Also, it’s a lot less than the standard deviations of the two samples, meaning that the difference between samples is much less than the variability within samples.

There’s an element of hand waving in that paragraph. Sure, it seems plausible that the two sample means are “different, but not very different”; but you could just as well construct an argument in words that the two means are different. Without numbers to go on, how much of a difference is reasonable? In statistics, we like to use numbers to decide whether a thing is reasonable or not. How can we make a numerical argument about the difference between samples? Well, put on your thinking cap, because I’m about to blow your mind.

### 8A2.  Meet the Sampling Distribution of x̅

The key to sample variability is the sampling distribution.

Definition: Imagine you take a whole lot of samples, each sample with n data points, and you compute the sample mean x̅ of each of them. All those x̅’s form a new data set, which can be called the distribution of sample means, or the sampling distribution of the mean, or the sampling distribution of x̅, for sample size n.

Notice that n is the size of each sample, not the number of samples. There’s no symbol for the number of samples, because it’s indefinitely large.

The sampling distribution is a new level of abstraction. It exists only in our minds: nobody ever takes a whole lot of samples of the same size from a given population. You can think of the sampling distribution as a “what if?” — if you took a whole lot of samples of a given size from the same population, and computed the means of all those samples, and then took those means as a new set of data for a histogram, what would that distribution look like?

Why ask such an abstract question? Simply this: if you know how samples from a known population are distributed, you can work backward from a single sample to make some estimates about an unknown population. In this chapter, I work from a population of tunes with known mean and standard deviation, and I ask what distribution of sample means I can expect to get. In the following chapters, I’ll turn that around: looking at one sample, we’ll ask what that says about the mean and standard deviation of the population that the sample came from.

What does a sampling distribution look like? Well, I used a computer simulation with @RISK from Palisade Corporation to take a thousand samples of 30 tunes each (the same n as before), and this is what I got:

“Big whoop!” I hear you say. I agree, it’s not too impressive at first glance. But let’s compare this distribution of sample means to the population those samples come from.

(In real life, you wouldn’t know what the population looks like. But in this chapter I work from a known population to explore what the distribution of its samples looks like. Starting in the next chapter, I’ll turn that around and use one sample to explore what the population probably looks like.)

Look at the two histograms below. The left-hand plot shows the individual lengths of all the tunes in the population — it’s a histogram of the original population. The right-hand plot shows the means of a whole lot of samples, 30 tunes per sample — it’s a histogram of the sampling distribution of the mean. That right-hand plot is the same as the plot I showed you a couple of paragraphs above, just rescaled to match the left-hand plot for easier comparison.  Now, what can you see?

• Shape: The original population is skewed strongly to the right, but the sampling distribution is nearly a bell curve. (The shape is easier to see if you look at the first picture of the sampling distribution. Remember, the right-hand plot and the earlier plot are the same plot, just drawn on different scales.)
• Center: The mean of the sampling distribution is 296.9 seconds, the same as the mean of the population.
• Spread: Individual tune lengths (original population, left graph) vary quite a lot, but means of 30-tune samples (sampling distribution of x̅, right graph) vary much less. You can say that most individual tune lengths are a lot shorter or longer than the population average, but most mean lengths in samples of 30 are very close to the population average. Compare these measures of spread from the two graphs:

| Population (indiv. tunes) | Sampling Distribution |
|---|---|
| 50 to 1000s (*) | 200 to 400 |
| 98.0 to 696.3 | 244.6 to 359.1 |
| 158.6 | 29.0 |

(*) I cut off the right tail of the population graph to save space.
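If you’d like to try the “whole lot of samples” idea yourself, here is a minimal Python sketch. It is not the @RISK simulation behind the graphs: the population is a made-up, right-skewed stand-in (lognormal), because the real tune library isn’t available here, and names like `population` and `sample_means` are only for illustration.

```python
import random
import statistics

random.seed(8)  # fixed seed so the run is repeatable

# Hypothetical stand-in for the tune library: 10,000 right-skewed "lengths" in seconds.
population = [random.lognormvariate(5.5, 0.5) for _ in range(10_000)]

n = 30              # size of each sample
num_samples = 1000  # how many samples to simulate

# Draw each sample without replacement and record its mean.
sample_means = [statistics.mean(random.sample(population, n))
                for _ in range(num_samples)]

pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)
means_mean = statistics.mean(sample_means)
means_sd = statistics.stdev(sample_means)

print(f"population:   mean {pop_mean:.1f}, SD {pop_sd:.1f}")
print(f"sample means: mean {means_mean:.1f}, SD {means_sd:.1f}")
```

With any reasonably skewed population you should see the same pattern as in the comparison above: the two means nearly agree, while the SD of the sample means is only a fraction of the population SD.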

At this point, you’re probably wondering if similar things are true for other numeric populations. The answer is a resounding YES.

### 8A3.  Properties of the Sampling Distribution of x̅

When you describe a distribution of continuous data, you give the center, spread, and shape. Let’s look at those in some detail, because this will be key to everything you do in inferential statistics.

#### There’s an App for That

Before I get into the properties of the sampling distribution, I’d like to tell you about two Web apps that let you play with sampling distributions in real time. (I’m grateful to Benjamin Kirk for suggesting these.)

• Sampling Distributions, part of the Rice Virtual Lab in Statistics. This app lets you sample from symmetric and skewed distributions, at various sample sizes, and see how the sampling distribution builds up. The app plots the sampling distribution and calculates its mean and SD, so you can compare them to the original population and also to the expected center, spread, and shape described below.
• CentLimApplet. This shows you why “sample size at least 30 or so” is a good rule of thumb for numeric data. Try setting the number of samples to the maximum, then increase the sample size one unit at a time, and you’ll see how the sampling distribution gets closer and closer to a ND.

If you possibly can, try out these apps, especially the second one. Sampling distributions are new and strange to you, and playing with them in real time will really help you to understand the text that follows.

#### Center of the Sampling Distribution of x̅

Summary:

The mean of the sampling distribution of x̅ equals the mean of the population: $\mu_{\bar{x}} = \mu$.

This is true regardless of the shape of the original population and regardless of sample size.

Why is this true? Well, you already know that when you take a sample, usually you have some data points that are higher than the population mean and some that are lower. Usually the highs and lows come pretty close to canceling each other out, so the mean of each sample is close to μ — closer than the individual data points, that is.

When you take a distribution of sample means, the same thing happens at the second level. Some of the sample means are above μ and some are below. The highs and lows tend to cancel, so the average of the averages is pretty darn close to the population mean.

#### Spread of the Sampling Distribution of x̅

Summary:

The standard deviation of the sampling distribution of x̅ has a special name: standard error of the mean or SEM; its symbol is $\sigma_{\bar{x}}$. The standard error of the mean for sample size n equals the standard deviation of the population divided by the square root of n: SEM or $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

This is true regardless of the shape of the original population and regardless of sample size.

Why is this true? Each member of the sample is a random variable, all drawn from the same population with a SD of σ and therefore a variance of σ². If you combine random variables — independent random variables — their variances add.

Okay, the sample is n random values drawn from a population with a variance of σ². The total of those n values in the sample is a random variable with a variance of σ²n, and therefore the standard deviation of the total is √(σ²n) = σ√n. Now divide the sample total by n to get the sample mean. x̅ is a random variable with a standard deviation of (σ√n)/n = σ/√n. QED, which is Latin for “told ya so!”
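You can check both the center and the spread formulas exactly, without any simulation, on a population small enough to list every possible sample. The sketch below is only an illustration: the six-value population (the faces of a die) and the sample size of 3 are my own choices, and sampling is done with replacement so that the draws are independent, as the argument above requires.

```python
import itertools
import math
import statistics

population = [1, 2, 3, 4, 5, 6]   # hypothetical tiny population: faces of a die
n = 3                              # size of each sample

mu = statistics.mean(population)
sigma = statistics.pstdev(population)

# Every possible ordered sample of size n, drawn with replacement (6**3 = 216 samples).
all_means = [statistics.mean(s) for s in itertools.product(population, repeat=n)]

mean_of_means = statistics.mean(all_means)
sd_of_means = statistics.pstdev(all_means)

print(mean_of_means, mu)                     # center: these two agree
print(sd_of_means, sigma / math.sqrt(n))     # spread: these two agree
```

Because the list contains every possible sample, not just a lot of them, the agreement is exact apart from floating-point rounding.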

#### Shape of the Sampling Distribution of x̅

Summary: If the original population is normally distributed (ND), the sampling distribution of the mean is ND. If the original population is not ND, still the sampling distribution is nearly ND if sample size is ≥ 30 or so but not more than about 10% of population size.

You can probably see that if you take a bunch of samples from a ND population and compute their means, the sample means will be ND also. But why should the means of samples from a skewed population be ND as well?

The answer should be called the Fundamental Theorem of Statistics, but instead it’s called the Central Limit Theorem. (The name was given by Richard Martin Edler von Mises in a 1919 article, but the theorem itself is due to the Marquis de Laplace, in his Théorie analytique des probabilités.) The CLT is the only theorem in this whole course. There is a mathematical way to state and prove it, but we’ll go for just a conceptual understanding.

Central Limit Theorem:

The sampling distribution of the mean approaches the normal distribution, and does so more closely at larger sample sizes.

An equivalent form of the theorem says that if you take a selection of independent random variables, and add up their values, the more independent variables there are, the closer their sum will be to a ND.

The second form of the theorem explains why so many real-life distributions are bell curves: Most things don’t have a single cause, but many independent causes.

Example: Lots of independent variables affect when you leave the house and your travel time every day. That means that any person’s commute times are ND, and so are people’s arrival times at an event. The same sorts of variables affect when buses arrive, so wait times are ND. Most things in nature have their growth rate affected by a lot of independent variables, so most things in nature are ND.

But it’s the first form of the theorem that we’ll use in this chapter. If samples are randomly chosen, or chosen by another valid sampling technique, then they will be independent and the Central Limit Theorem will apply.

The further the population is from a ND, the bigger the sample you need to take advantage of the CLT. Be careful! It’s size of each sample that matters, not number of samples. The number of samples is always large but unspecified, since the sampling distribution is just a construct in our heads. As a rule of thumb, n=30 is enough for most populations in real life. And if the population is close to normal (symmetric, with most data near the middle), you can get away with smaller samples.

On the other hand, the sample can’t be too large. For samples drawn without replacement (which is most samples), the sample shouldn’t be more than about 10% of the population. In symbols, n ≤ 0.1N. Suppose you don’t know the population size, N? Multiply left and right by 10 and rewrite the requirement as 10n ≤ N. You always know the sample size, and if you can make a case that the population is at least ten times that size then you’re good to go.

You’ll remember that the population of tune times was highly skewed, but the sampling distribution for n=30 was pretty nearly bell shaped. To show how larger sample size moves the sampling distribution closer to normal, I ran some simulations of 1000 samples for some other sample sizes. Remember that the sampling distribution is an indefinitely large number of samples; you’re still seeing some lumpiness because I ran only 1000 samples in each simulation.

The means of 3-tune samples are still fairly well skewed, though the range is less than the population range. Increasing sample size to 10, the skew is already much less. 20-tune samples are pretty close to a bell curve except for the extreme right-hand tail. Finally, with a sample size of 100, we’re darn close to a bell curve. Yes, there’s still some lumpiness, but that’s because the histogram contains only 1000 sample means.
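To put a number on “closer to a bell curve”, you can measure the skewness of the simulated sample means at each sample size. This is a rough sketch under my own assumptions, not the simulation that produced the histograms: the population is a hypothetical exponential distribution with mean 300 seconds, I use 2000 samples per size, and `skewness` and `simulate_sample_means` are helper names I made up.

```python
import random
import statistics

random.seed(161)  # fixed seed so the run is repeatable

def skewness(data):
    """Average cubed z-score: about 0 for a symmetric distribution, positive for right skew."""
    m = statistics.mean(data)
    s = statistics.pstdev(data)
    return sum(((x - m) / s) ** 3 for x in data) / len(data)

def simulate_sample_means(n, num_samples=2000):
    """Means of num_samples samples, each of size n, from a right-skewed population."""
    return [statistics.mean([random.expovariate(1 / 300) for _ in range(n)])
            for _ in range(num_samples)]

# Same sample sizes as the histograms discussed above.
skew_by_n = {n: skewness(simulate_sample_means(n)) for n in (3, 10, 20, 100)}

for n, skew in skew_by_n.items():
    print(f"n = {n:3d}: skewness of sample means = {skew:.2f}")
```

The skewness should shrink toward 0 as n grows. As in the text, it is the size of each sample that does the work; raising `num_samples` only smooths out the lumpiness.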

#### Requirements, Assumptions, and Conditions

The requirements mentioned in this chapter will be your “ticket of admission” to everything you do in the rest of the course. If you don’t check the requirements, the calculator will happily calculate numbers for you, they’ll be completely bogus, and your conclusions will be wrong but you won’t know it. Always check the requirements for any type of inference before you perform the calculations.

I talk about “requirements”. By now you’ve probably noticed that I think very highly of DeVeaux, Velleman, and Bock’s Intro Stats (2009) [see “Sources Used” at end of book]. They test the same things in practice, but they talk about “assumptions” and “conditions”. Assumptions are things that must be true for inference to work, and conditions are ways that you test those assumptions in practice.

You might like their approach better. It’s the same content, just a different way of looking at it. And sampling distributions are so weird and abstract that the more ways you can look at them the better! Following DeVeaux pages 591–593, here’s another way to think about the requirements.

Independence Assumption: Always look at the overall situation and try to see if there’s any way that different members of the sample can affect each other. If they seem to be independent, you’ll then test these conditions:

• Randomization Condition: Was the sample randomly selected? (A proper systematic sample counts as random.) Later, when you do inference on two samples in Chapter 11, you’ll ask instead whether the participants were randomly assigned to treatments.
• 10% Condition: If the population is small, a decent-sized sample may be too large. Remember, back in Chapter 5, you learned that sampling without replacement changes the mix of what’s left? In practice, if the sample is less than about 10% of the population, the effect is not serious enough to worry about.

These conditions must always be met, but they’re a supplement to the Independence Assumption, not a substitute for it. If you can see any way in which individuals are not independent, it’s game over regardless of the conditions.

Normal Population Assumption: For numeric data, the sampling distribution must be ND or you’re dead in the water. There are two conditions to check this:

• Nearly Normal Condition: If the sample is small, check for normality as you learned in Chapter 7. This matters because skewed data and outliers can distort the sample mean and SD.
• Large Sample Condition: But if the sample is larger, more than about 30, outliers and skew have less effect on the mean and SD, and you don’t have to worry about the Nearly Normal Condition.

The Normal Population Assumption and the Nearly Normal Condition or Large Sample Condition are for numeric data and only numeric data. We’ll have a separate set of requirements, assumptions, and conditions for binomial data later in this chapter.
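The conditions for numeric data can be written down as a small checklist. The function below is a hypothetical helper of my own (the name `check_mean_requirements` and its parameters are not from any library), using the rules of thumb from this chapter: random sample, 10n ≤ N, and either n ≥ 30 or a nearly normal population. It cannot judge the Independence Assumption for you; that still takes thought about the overall situation.

```python
def check_mean_requirements(n, population_size=None, random_sample=True,
                            population_nearly_normal=False):
    """Return a list of unmet conditions; an empty list means all checks passed.

    population_size=None means you can't state N but can argue that it is
    at least ten times the sample size.
    """
    problems = []
    if not random_sample:
        problems.append("sample is not random")
    if population_size is not None and 10 * n > population_size:
        problems.append("sample is more than 10% of the population")
    if n < 30 and not population_nearly_normal:
        problems.append("small sample from a population not known to be nearly normal")
    return problems

# 30 tunes from the library of 10113: every condition is met.
print(check_mean_requirements(n=30, population_size=10113))   # → []

# 25 items from a made-up skewed population of 200: two conditions fail.
print(check_mean_requirements(n=25, population_size=200))
```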

See also: Is That an Assumption or a Condition? is a very nice summary by Bock [see “Sources Used” at end of book] of all assumptions and conditions. It puts all of our requirements for all procedures into context. (Just ignore the language directed at instructors.)

### 8A4.  Applications


Ultimately, you’ll use sampling distributions to estimate the population mean or proportion from one sample, or to test claims about a population. That’s the next four chapters, covering confidence intervals and hypothesis tests. But before that, you can still do some useful computations.

#### How to Work Problems

For all problems involving sampling distributions and probability of samples, follow these steps:

1. Determine center, spread, and shape of the sampling distribution, even if you’re not explicitly asked to describe the distribution.
2. If you can’t show that the sampling distribution is ND, stop!
3. Sketch the curve, and estimate the answer. (See examples below.)
4. Compute the probability (area) using `normalcdf`. Caution! Don’t use rounded numbers in this calculation.

#### Example 1: Bank Deposits

You are auditing a bank. The bank managers have told you that the average cash deposit is \$200.00, with standard deviation \$45.00. You plan to take a random sample of 50 cash deposits. (a) Describe the distribution of sample means for n = 50. (b) Assuming the given parameters are correct, how likely is a sample mean of \$189.56 or below?

Solution (a): Recall that describing a distribution means giving its center, its spread, and its shape.

• Center: The mean of the sampling distribution equals the mean of the original population: $\mu_{\bar{x}} = \mu$, so $\mu_{\bar{x}}$ = \$200.00. This does not depend on whether the sampling distribution is normal.
• Spread: The standard deviation of the sampling distribution of the mean, better known as the standard error of the mean, is $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 45/\sqrt{50}$, so $\sigma_{\bar{x}}$ ≈ \$6.36. This does not depend on whether the sampling distribution is normal.
• Shape: The sample was random, and 10n = 10×50 = 500 is obviously less than the number of cash deposits at any bank. Sample size 50 is ≥30, so the sampling distribution of the mean is near enough to a normal model. (If n were much under 30, you would be unable to say anything about the shape and you would be unable to solve part (b).)

Solution (b): Please refer to How to Work Problems, above. You’ve already described the distribution, so the next step is to make the sketch. You may be tempted to skip this step, but it’s an important reality check on the numerical answer you get from your calculator. The sketch for this problem is shown at right. Please observe these key points when sketching sampling distribution problems:

1. Draw the axis line.
2. Label the axis, x̅ or p̂ as appropriate.
3. Draw a vertical line in the middle of the distribution and show the numerical value of the mean. Caution! This is the mean of the sampling distribution, equal to the population mean, not the sample mean.
4. Draw a horizontal line at about the right spot and show the numerical value of the SEM, not σ of the original population. (For Binomial Data, below, you’ll use the SEP instead of the SEM.)
5. Draw a line and show the value for each boundary.
6. Shade the area you’re trying to find, and estimate it. (From the sketch for this problem, I estimated a few percent, definitely under 10%.)
7. (optional) After you find the area, show its value.

Next, compute the probability on your calculator.

Press [`2nd` `VARS` makes `DISTR`] [`2`] to select `normalcdf`. Fill in the arguments, either on the wizard interface or in the function itself. Either way, you need four arguments, in this order:

• Left boundary. In this case, there is no left boundary because the problem specifies ≤\$189.56. Conceptually, the boundary is −∞, but your calculator doesn’t have an infinity key, so use (−)10^99 instead. (Don’t use 0. Yes, 0 is the lower limit for a deposit, but you’re using the normal model for the sampling distribution, so the tails go on forever.)
• Right boundary. For this problem, 189.56 is the right boundary.
• Mean. 200 in this problem.
• Standard error. You computed it earlier as \$6.36, but that’s an approximate number. Never use rounded numbers in further calculations. `normalcdf` calculations are particularly sensitive to rounding errors, especially when one or both boundaries are out in the tails, so use the exact value: 45/√50.

With the “wizard” interface: The wizard prompts you for a standard deviation σ. Don’t enter the SD of the population. Do enter the SD of the sampling distribution, which is the standard error. After entering the standard error, press [`ENTER`] twice and your screen will look like the one at right.

With the classic interface: After entering the standard error, press [`)`] [`ENTER`]. You’ll have two closing parentheses, one for the square root and one for `normalcdf`.

Always show your work. There’s no need to write down all your keystrokes, but do write down the function and its arguments:

normalcdf(−10^99, 189.56, 200, 45/√50)

Answer: P(x̅ ≤ 189.56) = 0.0505

Comment: Here you see the power of sampling. With a standard deviation of \$45.00, an individual deposit of \$189.56 or lower can be expected almost 41% of the time. But a sample mean under \$189.56 with n=50 is much less likely, only a little over 5%.

This is one reason you should take the trouble to make your sketch reasonably close to scale. If you enter the standard deviation, 45, instead of the standard error, 45/√50 — a common mistake — you’ll get 0.4083. A glance at your sketch will tell you that can’t possibly be right, so you then know to find and fix your mistake.
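If you’re working on a computer rather than a TI calculator, the same calculation can be sketched in Python with the standard library’s `statistics.NormalDist`. The `normalcdf` wrapper here is my own imitation of the calculator function, not a built-in; it takes the same four arguments in the same order, and I use −1e99 as pseudo-infinity just as on the calculator.

```python
from math import sqrt
from statistics import NormalDist

def normalcdf(lower, upper, mean, sd):
    """Area under a normal curve between lower and upper (mimics the calculator)."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

mu, sigma, n = 200, 45, 50
sem = sigma / sqrt(n)   # standard error of the mean, kept unrounded

# Correct: use the standard error for the distribution of sample means.
p_sample_mean = normalcdf(-1e99, 189.56, mu, sem)

# The common mistake: using the population SD answers a question about ONE deposit.
p_individual = normalcdf(-1e99, 189.56, mu, sigma)

print(f"P(sample mean <= 189.56) = {p_sample_mean:.4f}")
print(f"P(one deposit <= 189.56) = {p_individual:.4f}")
```

The two results should match the 0.0505 and 0.4083 quoted above, which makes the gap between “one deposit” and “mean of 50 deposits” easy to see side by side.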

#### Example 2: Women’s Heights

US women’s heights are normally distributed (ND), with mean 65.5″ and standard deviation 2.5″. You visit a small club on a Thursday evening, and 25 women are there. (Let’s assume they are a representative sample.) Your pickup line is that you’re a statistics student and you need to measure their heights for class. Amazingly, this works, and you get all 25 heights. How likely is it that the average height is between 65″ and 66″?

Solution: First, get the characteristics of the sampling distribution:

• Center: The mean of the sampling distribution is 65.5″, the same as the mean of the original population.
• Spread: The standard deviation of the sampling distribution (standard error of the mean or SEM) is σ/√n = 2.5/√25 = 0.5″.
• Shape: The sample is representative of all women (we assume), and 10n = 10×25 = 250 is less than the total number of women. The sample size is under 30, but the original population is a ND and therefore the sampling distribution is also ND.

If the SEM is 0.5″, then 65″ and 66″ equal the mean ± one standard error. The Empirical Rule (68–95–99.7 Rule) tells you that about 68% of the data fall between those bounds. In this problem, the sketch is a really good guide to the answer you expect. This is the distribution of sample means, so you expect 68% of them to fall between those bounds. But do the computation anyway, because the Empirical Rule is approximate and now you’re able to be precise. Also, the SEM of 0.5″ is an exact number, but still I put the whole computation into the calculator just to be consistent.

The chance that the sample mean is between 65″ and 66″ is

P(65 ≤ x̅ ≤ 66) = 0.6827

Remember the difference between the distribution of sample means and the distribution of individual heights. From the computation at the right, you expect to see under 16% of women’s heights between 65″ and 66″, versus over 68% of sample mean heights (for n=25) between 65″ and 66″. That’s the whole point of this chapter: sample means stick much closer to the population mean.

#### Example 3: Elevator Load Limit

Suppose hotel guests who take elevators weigh on average 150 pounds with standard deviation of 35 pounds. An engineer is designing a large elevator, to lift 50 people. If she designs it to lift 4 tons (8000 pounds), what is the chance a random group of 50 people will overload it?

Need a hint? This is a problem in sample total. You haven’t studied that kind of problem, but you have studied problems in sample means. In math, when you have an unfamiliar type of problem, it’s always good to ask: Can I change this into some type of problem I do know how to solve? In this case, how do you change a problem about the total number of pounds in a sample (∑x) into a problem about the average number of pounds per person (x̅)?

Solution: To convert a problem in sums into a problem in averages, divide by the sample size. If the total weight of a sample of 50 people is 8000 lb, then the average weight of the 50 people in the sample is 8000/50 = 160 lb. So the desired probability is P(x̅ > 160):

P(∑x > 8000 for n = 50) = P(x̅ > 160)

And you know how to find the second one.

What does the sampling distribution of the mean look like for μ = 150, σ = 35, n = 50? The mean is μ = 150 lb, and the standard error is 35/√50 ≈ 4.9 lb. That’s all you need to draw the sketch at right. Samples are random, 10×50 = 500 is less than the number of people (or potential hotel guests) in the world, and n = 50 ≥ 30, so the sampling distribution follows a normal model.

Now make your calculation. This time the left boundary is a definite number and the right boundary is pseudo infinity, 10^99. And again, you want the standard error, not the SD of the original population.

With the “wizard” interface: After entering the standard error, press [`ENTER`] twice, and your screen will look like the one at right.

With the classic interface: After entering the standard error, press [`)`] [`ENTER`].

Show your work: normalcdf(160, 10^99, 150, 35/√50).

There’s a 0.0217 chance that any given load of 50 people will overload the elevator. That’s not 2% of all loads, but 2% of loads of 50 people. Still, it’s an unacceptable level of risk.
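As a cross-check on the “convert the total to a mean” trick, you can also work the problem directly on the total. From the spread derivation earlier in the chapter, the total of n independent values has standard deviation σ√n, and its mean is nμ. The sketch below, using Python’s `statistics.NormalDist` and variable names of my own choosing, computes the overload probability both ways.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 150, 35, 50
limit = 8000   # design load in pounds

# Route 1 (the method used above): distribution of the sample mean.
mean_model = NormalDist(mu, sigma / sqrt(n))
p_via_mean = 1 - mean_model.cdf(limit / n)

# Route 2: distribution of the sample total, with mean n*mu and SD sigma*sqrt(n).
total_model = NormalDist(n * mu, sigma * sqrt(n))
p_via_total = 1 - total_model.cdf(limit)

print(f"via mean:  {p_via_mean:.4f}")
print(f"via total: {p_via_total:.4f}")
```

Both routes give the same probability, because dividing the total, its mean, and its SD by n leaves the z-score unchanged.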

Is there an inconsistency here? Back in Chapter 5, I said that an unusual event was one that had a low probability of occurring, typically under 5%. Since 2% is less than 5%, doesn’t that mean that an overloaded elevator is an unusual event, and therefore it can be ignored?

Yes, it’s unusual. But no, fifty people plunging to a terrible death can’t be ignored. The issue is acceptable risk. Yes, there’s some risk any time you step in an elevator that it will be your last journey. But it’s a small risk, and it’s one you’re willing to accept. (The risk is much greater every time you get into a car.) Without knowing exact figures, you can be sure it’s much, much less than 2%; otherwise every big city would see many elevator deaths every day.

In Chapter 10, you’ll meet the significance level, which is essentially the risk of being wrong that you can live with. The worse the consequences of being wrong, the lower the acceptable risk. With an elevator, 5% is much too risky — you want crashes to be a lot more unusual than that.

## 8B.  Binomial Data / Proportions of Samples

Binomial data are yes/no or success/failure data. Each sample yields a count of successes. (A reminder: “success” isn’t necessarily good; it’s just the name for the condition or response that you’re counting, and the other one is called “failure”.)

Need a refresher on the binomial model? Please refer back to Chapter 6.

The summary statistic or parameter is a proportion, rather than a mean. In fact, the proportion of success (p) is all there is to know about a binomial population.

In Chapter 6 you computed probabilities of specific numbers of successes. Now you’ll look more at the proportions of success in all possible samples from a binomial population, using the normal distribution (ND) as an approximation.

Here’s a reminder of the symbols used with binomial data:

• p: The proportion in the population. Example: If 83% of US households have at least one cell phone, then p = 0.83. Remember “proportion of all equals probability of one”, so p is also the probability that any randomly selected response from the population will be a success.
• q = 1−p is therefore the proportion of failure or the chance that any given response will be a failure.
• n: The sample size.
• x: The number of successes in the sample. Example: if 45 households in your sample have at least one cell phone, then x = 45.
• p̂: “p-hat”, the proportion in the sample, equal to x/n. Example: If you survey 60 households and 45 of them have at least one cell phone, then p̂ = 45/60 = 0.75 or 75%.

### 8B1.  Sampling Distribution of p̂

The sampling distribution of the proportion is the same idea as the sampling distribution of the mean, and there are a lot of parallels between the two. (A table at the end of this chapter summarizes them.)

Definition: Imagine you take a whole lot of samples from the same population. Each sample has n success/failure data points, and you compute the sample proportion p̂ of each of them. All those p̂’s form a new data set, which can be called the distribution of sample proportions, or the sampling distribution of the proportion, or the sampling distribution of p̂, for sample size n.

As before, n is the size of each sample, not the number of samples. There’s no symbol for the number of samples, because it’s indefinitely large.

One change from the sampling distribution of x̅ is that the sampling distribution of p̂ is a different data type from the population. The original data are non-numeric (yeses and noes), but the distribution of p̂ is numeric because the p̂’s are numbers. Each p̂ says “so many percent of this sample were successes.”

#### Center of the Sampling Distribution of p̂

Summary:

The mean of the sampling distribution of p̂ equals the proportion of the population: μp̂ = p (“mu sub p-hat equals p”).

This is true regardless of the proportion in the original population and regardless of sample size.

Why is this true? The reasons are similar to the reasons in Center of the Sampling Distribution of x̅. The p̂ for a given sample may be higher or lower than p of the population, but if you take a whole lot of samples then the high and low p̂’s will tend to cancel each other out, more or less.

#### Spread of the Sampling Distribution of p̂

Summary:

The standard deviation of the sampling distribution of p̂ has a special name: standard error of the proportion or SEP; its symbol is σp̂ (“sigma-sub-p-hat”). The standard error of the proportion for sample size n equals the square root of the population proportion, times 1 minus the population proportion, divided by the sample size: SEP or σp̂ = √(pq/n).

This is true regardless of the proportion in the original population and regardless of sample size.

Why is this true? For a binomial distribution with sample size n, the standard deviation is √(npq). That is the SD of the random variable x, the number of successes in a sample of size n. The sample proportion, random variable p̂, is x divided by n, and therefore the SD of p̂ is the SD of random variable x, also divided by n. In symbols, σp̂ = √(npq) / n = √(npq/n²) = √(pq/n).
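If you'd like to see the center and spread rules in action, here is a minimal Python sketch (not the @RISK simulation used for this chapter). The function name `simulate_phats`, the choice of p = 0.1 and n = 50, the 10,000 samples, and the fixed seed are all assumptions made just for illustration.

```python
import math
import random
import statistics

def simulate_phats(p, n, num_samples, seed=0):
    """Return num_samples simulated sample proportions, each from a sample of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    phats = []
    for _ in range(num_samples):
        # Each individual is a success with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        phats.append(successes / n)
    return phats

p, n = 0.1, 50  # illustrative values, not from any real data set
phats = simulate_phats(p, n, num_samples=10_000)

sim_mean = statistics.fmean(phats)   # center of the simulated p-hats
sim_sd = statistics.pstdev(phats)    # spread of the simulated p-hats
sep = math.sqrt(p * (1 - p) / n)     # standard error of the proportion

print(f"mean of p-hats = {sim_mean:.4f}   (theory: p = {p})")
print(f"SD of p-hats   = {sim_sd:.4f}   (theory: SEP = {sep:.4f})")
```

The simulated mean and SD won't match the formulas exactly, because 10,000 samples is not "indefinitely large", but they should land very close.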

#### Shape of the Sampling Distribution of p̂

Summary:

If np and nq are both ≥ about 10, and 10n ≤ N, the normal model is good enough for the sampling distribution.

Let’s look at some sampling distributions of p̂. First I’ll show you the effect of the population’s proportion of success p, and then the effect of the sample size n.

Using @RISK from Palisade Corporation, I simulated all of the sampling distributions shown here. The mathematical sampling distribution has an indefinitely large number of samples, but I stopped at 10,000.

These first three graphs show the sampling distributions for samples of size n = 4 from three populations with different proportions of successes.

Reminder: these are not graphs of the population; they’re not responses from individuals. They are graphs of the sampling distributions, showing the proportion of successes (p̂) found in a lot of samples.

How do you read these? For example, look at the first graph. This shows the sampling distribution of the proportion for a whole lot of samples, each of size 4, where the probability of success on any one individual is 0.1. You can see that about 67% of all samples have p̂ = 0 (no successes out of four), about 29% have p̂ = .25 (one success out of four), about 4% have p̂ = .50 (two successes out of four), and so on.

Why the large gaps between the bars? With n = 4, each sample can have only 0, 1, 2, 3, or 4 successes, so the only possible proportions for those samples are 0, 25%, 50%, 75%, and 100%.
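For comparison with the simulated bars, the exact probabilities for n = 4 and p = 0.1 come straight from the binomial formula of Chapter 6. This is a small Python sketch; the helper name `phat_distribution` is made up for this illustration.

```python
from math import comb

def phat_distribution(n, p):
    """Map each possible sample proportion x/n to its exact binomial probability."""
    q = 1 - p
    return {x / n: comb(n, x) * p**x * q**(n - x) for x in range(n + 1)}

dist = phat_distribution(4, 0.1)
for phat, prob in dist.items():
    print(f"p-hat = {phat:.2f}: probability {prob:.4f}")
```

The exact values are close to, but not identical with, the percentages read off the simulated graph, as you'd expect when the simulation stops at 10,000 samples.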

But let’s not obsess over the details of these graphs. I’m more interested in the shapes of the sampling distributions.

What do you see? If you take many samples of size 4 from a population with p = 0.1 (10% successes and 90% failures), the sampling distribution of the proportion is highly skewed. Now look at the second graph. When p = .25 (25% successes and 75% failures in the population), again with n = 4 individuals in each sample, the sample proportions are still skewed, but less so. And in the third graph, where the population has p = 0.5 (success and failure equally likely), then the sampling distribution is symmetric even with these small samples.

For a given sample size n, it looks like the closer the population p is to 0.5, the closer the sampling distribution is to symmetric. And in fact that’s true. That’s your take-away from these three graphs.

Now let’s look at sampling distributions using different sample sizes from the same population. I’ll use a population with 10% probability of success for each individual (p = 0.1).

You’ve already seen the graph of the sampling distribution when n = 4. The three graphs here show the sampling distribution of p̂ for progressively larger samples. (Remember always that n is the number of individuals in one sample. The number of samples is indefinitely large, though in fact I took 10,000 samples for each graph.)

What do you see here? The distribution of p̂’s from samples of 50 individuals is still noticeably skewed, though a lot less than the graph for n = 4. If I take samples of size 100, the graph is starting to look nearly symmetric, though still slightly skewed. And if I take samples of 500 individuals, the distribution of p̂ looks like a bell curve.

What do you conclude from these graphs? First, even if p is far from 0.5 (if the population is quite unbalanced), with large enough samples, the sampling distribution of p̂ is a normal distribution. Second, you need big samples for binomial data. Remember that 30 is usually good enough for numeric data. For binomial data, it looks like you need bigger samples.

Okay, let’s put it together. If the size of each sample is large enough, the sampling distribution is close enough to normal. How large a sample is large enough? It depends on how skewed the original population is, which means it depends on the proportion of successes in the population. The further p is from 0.5, the more unbalanced the population and the larger n must be.

How big a sample is big enough? Here’s what some authors say:

Why the disagreements? They can’t all be right, can they?

Actually, they can. The question is, what’s close enough to a ND? That’s a judgment call, and different statisticians are a little bit more or less strict about what they consider close enough. Fortunately, with samples bigger than a hundred or so, which are customary, all the conditions are usually met with room to spare.

We’ll follow DeVeaux and his buddies and use np ≥ 10 and nq ≥ 10. This is easy to remember: at least ten “yes” and at least ten “no” expected in a sample. (You can compute the expected number of noes as nq = n(1−p) or simply n−np, sample size minus the expected number of yeses.)

How does this work out in practice? Look at the next-to-last graph, with n=100 and p=0.1. It’s close to a bell curve, but has just a little bit of skew. (It’s easier to see the skew if you cover the top part of the graph.)

Check the requirements: np = 100×.1 = 10, and nq = 100−10 = 90. In a sample of 100, 10 successes and 90 failures are expected, on average. This just meets requirements. And that matches the graph: you can see that it’s not a perfect bell curve, but close; but if it was a little more skewed then the normal model wouldn’t be a good enough fit.

De Veaux and friends (page 440) give a nice justification for choosing ≥ 10 yeses and noes. Briefly, the ND has tails that go out to ±infinity, but proportions are between 0 and 1. They chose their “success/failure condition”, at least ten of each, so that the mismatch between the binomial model and the normal model is only in the rare cases.

But there’s an additional condition: the individuals in the sample must be independent. This translates to a requirement that the sample can’t be too large, or drawing without replacement would break the binomial model. Big surprise (not!): Authors disagree about this too. For example, De Veaux and Johnson & Kuby say sample can’t be bigger than 10% of population (n ≤ 0.1N); Sullivan says 5%.

We’ll use n ≤ 0.1N, just like with numeric data. And just as before, you can think of that as 10n ≤ N when you don’t have the exact size of the population.

Example 4: You asked 300 randomly selected adult residents of Ithaca a yes-or-no question. Is the sample too large to assume independence? You may not know the population of Ithaca, but you can compute 10×300 = 3000 and be confident that there are more than 3000 adult Ithacans. Therefore your sample is not too large.

Don’t just claim 10n ≤ N. Show the computation, and identify the population you’re referring to, like this: “10n = 10×300 = 3000 ≤ number of adult Ithacans.”

Remember to check your conditions: np ≥ about 10, nq ≥ about 10, and 10n ≤ N. And of course your sample must be random.
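The numeric part of that checklist is easy to sketch in Python. The function name `normal_model_ok` and its parameters are hypothetical, and the code can't tell you whether your sample is random; that part is still up to you. If you don't pass a population size, this sketch simply assumes the population is large enough, so you still need to justify 10n ≤ N in words.

```python
def normal_model_ok(n, p, population_size=None):
    """Check np >= 10, nq >= 10, and (when N is known) 10n <= N.

    Assumption: if population_size is None, the 10% condition is taken
    as met and must be argued separately.
    """
    q = 1 - p
    enough_successes = n * p >= 10   # expected number of yeses
    enough_failures = n * q >= 10    # expected number of noes
    not_too_large = population_size is None or 10 * n <= population_size
    return enough_successes and enough_failures and not_too_large

print(normal_model_ok(4, 0.1))      # tiny samples from a lopsided population
print(normal_model_ok(500, 0.1))    # large samples from the same population
print(normal_model_ok(300, 0.5, population_size=2000))  # sample is over 10% of N
```

The rule of thumb says “about 10”, so treat a borderline result as a judgment call rather than a hard pass or fail.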

#### Requirements, Assumptions, and Conditions

Just like with numeric data, you might find it helpful to name the requirements for binomial data. These are the same requirements that I just gave you, but presented differently. I’m following DeVeaux, Velleman, Bock (2009, 493) [see “Sources Used” at end of book].

Independence Condition, Randomization Condition, 10% Condition: These are the same for every sampling distribution and every procedure in inferential stats. I’ve already talked about them under numeric data earlier in the chapter. In practice, the 10% Condition comes into play more often for binomial data than numeric data, because binomial samples are usually much larger.

Sample Size Assumption: For binomial data, the sample is like Goldilocks and porridge — it can’t be too big and it can’t be too small. (Maybe it was beds or chairs and not porridge? And what the heck is porridge?) “Too big” is checked by the 10% Condition; “too small” is checked by the

• Success/Failure Condition: The more lopsided the population is — the further the population proportion is from 50% — the larger sample you need for the sampling distribution to be close enough to a normal model. Our rule of thumb is that your sample needs to be big enough that you expect ≥ 10 successes and ≥ 10 failures based on the population proportion p.

See also: Is That an Assumption or a Condition? (Bock [see “Sources Used” at end of book]). Again, these are the same requirements you see in this textbook, just presented differently.

### 8B2.  Applications

#### How to Work Problems

Working with the sampling distribution of p̂, the technique is exactly the same as for problems involving the sampling distribution of x̅. Follow these steps:

1. Determine center, spread, and shape of the sampling distribution, even if you’re not explicitly asked to describe the distribution.
2. If you can’t show that the sampling distribution is ND, stop!
3. Sketch the curve, and estimate the answer. (See example below.)
4. Compute the probability (area) using `normalcdf`. Caution! Don’t use rounded numbers in this calculation.

The ND is continuous and goes out to ±infinity, but the binomial distribution is discrete and bounded by 0 and n. If the requirements are met (at least 10 successes and 10 failures expected), the normal model is a good fit near the middle of the distribution. The fit is usually good enough in the tails, but not as good as it is in the middle.

Because of this, some authors apply a continuity correction to make the normal model a better fit for the binomial. This means extending the range by half a unit in each direction. For example, if n = 100 and p = 0.20, and you’re finding the probability of 10 to 15 successes, MATH200A part 3 gives a probability of 0.1262. The normal model with standard error √(.20×(1−.20)/100) = 0.04 gives

normalcdf(10/100, 15/100, .2, .04) = 0.0994

With the continuity correction, you compute the probability for 9½ to 15½ successes. Then the normal model gives a probability of

normalcdf(9.5/100, 15.5/100, .2, .04) = 0.1260

This is a better match to the exact binomial probability. Why use the normal model at all, then? Why not just compute the exact binomial probability? Because there’s only a noticeable discrepancy far from the center, and only when the sample is on the small side. (100 is a small sample for binomial data, as you’ll see in the next two chapters.) You can apply the continuity correction if you want, but many authors don’t because it usually doesn’t make enough difference to matter.
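If you have Python handy instead of a TI-83/84, the three numbers in this comparison can be reproduced with the standard library. This is only a sketch: `NormalDist(...).cdf` stands in for `normalcdf`, and the variable names are my own.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.20
q = 1 - p

# Exact binomial probability of 10 through 15 successes.
exact = sum(comb(n, x) * p**x * q**(n - x) for x in range(10, 16))

# Normal model for p-hat: mean p, standard error sqrt(pq/n), unrounded.
model = NormalDist(mu=p, sigma=sqrt(p * q / n))

# Without the continuity correction: 10/100 to 15/100.
plain = model.cdf(15 / n) - model.cdf(10 / n)

# With the continuity correction: 9.5/100 to 15.5/100.
corrected = model.cdf(15.5 / n) - model.cdf(9.5 / n)

print(f"exact binomial:        {exact:.4f}")
print(f"normal, no correction: {plain:.4f}")
print(f"normal, corrected:     {corrected:.4f}")
```

The printed values should agree with the 0.1262, 0.0994, and 0.1260 quoted above, which shows how much the half-unit extension helps for a sample this small.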

#### Example 5: Swain v. Alabama

1965. Talladega County, Alabama. An African American man named Robert Swain is accused of rape. The 100-man jury panel includes 8 African Americans, but through exemptions and peremptory challenges none are on the final jury. Swain is convicted and sentenced to death. (Juries are all male. 26% of men in the county are African American.)

(a) In a county that is 26% African American, is it unexpected to get a 100-man jury panel with only eight African Americans?

Solution: “Unexpected”, “unusual”, or “surprising” describes an event with low probability, typically under 5%. This problem is asking you to find the probability of getting that sample and compare that probability to 0.05.

This is binomial data: each member of the sample either is or is not African American. Putting the problem into the language of the sampling distribution, your population proportion is p = 0.26. You’re asked to find the probability of getting 8 or fewer successes in a sample of 100, so n = 100 and your sample proportion must be p̂ ≤ 8/100 or p̂ ≤ 0.08.

Why “8 or fewer”? p̂ = 8% for the questionable sample, and you think it’s too low since it’s below the expected 26%. Therefore, in determining how likely or unlikely such a sample is, you’ll compute the probability of p̂ ≤ 0.08.

First, describe the sampling distribution of the proportion:

• Center: μp̂ = p = 0.26
• Spread: σp̂ = √(pq/n) = √(.26(1−.26)/100) = 0.044
• Shape: For the sampling distribution to be normal, you have four requirements to check:
  • Random sample? That’s the county’s claim. Check.
  • Sample not too large? 10n = 10×100 = 1000. We don’t know how many men are in the county, but it must be more than a thousand. Check.
  • Expected number of successes? np = 100×.26 = 26 ≥ 10. Check. (Use 0.26, not 0.08. The sampling distribution is based on the population proportion p, not on any particular sample proportion.)
  • Expected number of failures? nq = 100−26 = 74 ≥ 10. Check.

Therefore the sampling distribution of the proportion is a ND. Next, make the sketch and estimate the answer. I’ve numbered the key points in the sketch at right, but if you need a refresher please refer back to the sketch under Example 1.

From this sketch, you’d expect the probability to be very small, and indeed it is.

Compute the probability using `normalcdf` as before. Be careful with the last argument, which is the standard deviation of the sampling distribution. Don’t use a rounded number for the standard error, because it can make a large difference in the probability.

With the “wizard” interface: The standard error expression, √(.26*(1−.26)/100), scrolls off the screen as you type it in, so be extra careful! Press [`ENTER`] twice, and your screen will look like the one at right.

With the classic interface: Press [`)`] [`ENTER`] after entering the standard error. You’ll have two closing parentheses, one for the square root and one for `normalcdf`.

Always show your work: not keystrokes but the function and its arguments:

normalcdf(−10^99, .08, .26, √(.26*(1−.26)/100))

The SEP is a nasty expression, and you have to enter it twice in every problem. You might like to save some keystrokes by computing it once and then storing it in a variable, as I did at the right. When you’re drawing the sketch and need the standard error, compute it as usual but before pressing [`ENTER`] press [`STO→`] [x,T,θ,n]. Then when you need the standard error in `normalcdf`, in the wizard or the classic interface, just press the [x,T,θ,n] key instead of re-entering the whole SEP expression. The probability is naturally the same whether you use the shortcut or not.

P(p̂ ≤ 0.08) = 2.0×10^−5, or P(x ≤ 8) = 0.000 020. There are only 20 chances in a million of getting a 100-man jury pool with so few African Americans by random selection from that county’s population. This is highly unexpected, so unlikely that it raises the gravest doubts about the county’s claim that jury pools were selected without racial bias.
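Here is the same calculation sketched in Python, in case you want to check your calculator work. `NormalDist(...).cdf(0.08)` plays the role of `normalcdf(−10^99, .08, …)`, and the exact binomial sum is included only for comparison; the variable names are my own.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.26
q = 1 - p

sep = sqrt(p * q / n)          # standard error of the proportion, unrounded
z = (0.08 - p) / sep           # how many standard errors below the mean

# Normal model: P(p-hat <= 0.08)
normal_prob = NormalDist(mu=p, sigma=sep).cdf(0.08)

# Exact binomial: P(x <= 8)
exact_prob = sum(comb(n, x) * p**x * q**(n - x) for x in range(0, 9))

print(f"SEP = {sep:.4f}, z = {z:.2f}")
print(f"normal approximation: {normal_prob:.2e}")
print(f"exact binomial:       {exact_prob:.2e}")
```

Both probabilities are a few chances in a million or less, so the conclusion doesn't depend on which model you use.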

You might remember that in Chapter 6 you computed this binomial probability as 0.000 005, five chances in a million. If the ND is a good approximation, why does it give a probability that’s four times the correct probability? Answer: The normal approximation gets a little dicier as you move further out the tails, and this sample is pretty far out (z = −4.10). But is the approximation really that bad? Sure, the relative error is large, but the absolute error is only 0.000 015, 15 chances in a million. Either way, the message is “This is extremely unlikely to be the product of random chance.”

(b) From 1950 to 1965, as cited in the Supreme Court’s decision, every 100-man jury pool in the county had 15 or fewer African Americans. How likely is that, if they were randomly selected?

Solution: 15 out of 100 is 15%. You know how to compute the probability that one jury pool would be ≤15% African American, so start with that. You’ve already described the sampling distribution, so all you have to do is make the sketch and then the calculation. Everything’s the same, except your right boundary is 0.15 instead of 0.08.

Whether or not you use my little shortcut, P(p̂ ≤ 0.15) = 0.0061. The Talladega County jury panels are multiple samples with n = 100 in each, so the “proportion of all” interpretation makes sense: In the long run, you expect 0.61% of jury panels to have 15% or fewer African Americans, if they’re randomly selected.

But actually 100% of those jury panels had 15% or fewer African Americans. How unlikely is that? Well, we don’t know how many juries there were in the county in those 16 years, but surely it must have been at least one a year, or a total of 16 or more. The probability that 16 independent jury pools would all have 15% or fewer African Americans, just by chance, is 0.0060745591^16 ≈ 3E-36, effectively zip. And if there was more than one jury a year, as there probably was, the probability would be even lower. Something is definitely fishy.

The binomial probability is 0.0061 also. This is still pretty far out in the left-hand tail (z = −2.51), but the normal approximation is excellent. The message here is that the normal approximation is pretty darn close except where the probabilities are so small that exactness isn’t needed anyway.
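Part (b) can be sketched the same way in Python. The count of 16 jury pools is the conservative minimum assumed in the solution above, not a known figure, and the variable names are my own.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.26
q = 1 - p
pools = 16  # assumed minimum: at least one jury pool per year, 1950 to 1965

# Normal model: P(p-hat <= 0.15) for one jury pool.
one_pool = NormalDist(mu=p, sigma=sqrt(p * q / n)).cdf(0.15)

# Independent pools: multiply the probabilities, i.e. raise to the 16th power.
all_pools = one_pool ** pools

# Exact binomial P(x <= 15) for one pool, for comparison.
exact_one_pool = sum(comb(n, x) * p**x * q**(n - x) for x in range(0, 16))

print(f"one pool, normal model:   {one_pool:.4f}")
print(f"one pool, exact binomial: {exact_one_pool:.4f}")
print(f"all {pools} pools:             {all_pools:.1e}")
```

Raising to the 16th power relies on the pools being independent of one another, which is exactly what random selection would give you.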

## 8C.  Summary of Sampling Distributions

Here’s a side-by-side summary of sampling distributions of the mean (numeric data) and sampling distributions of the proportion (binomial data). Always check requirements for the type of data you actually have!

| Numeric Data | Binomial Data |
|---|---|
| Each individual in sample provides a number. | Each individual in sample provides a success or failure, and you count successes. |
| mean x̅ = ∑x/n | proportion p̂ = x/n |
| mean μ | proportion p |
| Sampling distribution of the mean (sampling distribution of x̅) | Sampling distribution of the proportion (sampling distribution of p̂) |
| μx̅ = μ | μp̂ = p |
| SEM = standard error of the mean | SEP = standard error of the proportion |
| σx̅ = σ/√n | σp̂ = √(pq/n) |
| Random sample; 10n ≤ N; population is ND or n ≥ about 30 | Random sample; 10n ≤ N; np ≥ about 10 and nq ≥ about 10 |

NOTE: n is number of individuals per sample. Number of samples is indefinitely large and has no symbol.

## What Have You Learned?

Key ideas:

• The sampling distribution of the mean (sampling distribution of x̅) and the sampling distribution of the proportion (sampling distribution of p̂) are concepts, not something you ever construct in reality.
• x̅ or p̂ is a random variable, varying with each sample.
• The size of each sample is n. The distribution contains an indefinitely large number of samples of size n. (There’s no symbol for the number of samples in the distribution.)
• Treat all those sample means or sample proportions as a new data set and mentally draw a histogram. This is a picture of the sampling distribution.
• The Central Limit Theorem: the closer to normal the original population, and the larger the sample, the closer the sampling distribution will be to a ND. For numeric data, n ≥ 30 is almost always good enough. For binomial data, it’s more complicated.
• Describing a sampling distribution means giving its center, spread, and shape.
• For any data type, the sampling distribution describes random samples that aren’t too large, not more than 10% of population.
• For numeric data, the sampling distribution of x̅ (sampling distribution of the mean) has these properties:
  • The center of the sampling distribution (mu sub x-bar, μx̅) always equals the mean of the population (μ).
  • The standard deviation of the sampling distribution (sigma sub x-bar, σx̅, also known as the standard error of the mean or SEM) always equals the standard deviation of the population divided by the square root of the sample size, σ/√n.
  • If n ≥ about 30, or population is ND, then the shape of the sampling distribution is close enough to normal. If requirements are not met, you generally can’t say anything useful about the shape, and you can’t use the normal model.
• For binomial data, the sampling distribution of p̂ (sampling distribution of the proportion) has these properties:
  • The center of the sampling distribution (mu sub p-hat, μp̂) always equals the proportion in the population (p).
  • The standard deviation of the sampling distribution (sigma sub p-hat, σp̂, also known as the standard error of the proportion or SEP) always equals √(pq/n).
  • If there are at least 10 successes and 10 failures expected per sample, that is, if np ≥ 10 and nq ≥ 10, then the shape of the sampling distribution is close enough to normal. If requirements are not met, you generally can’t say anything useful about the shape, and you can’t use the normal model.
• Given μ and σ of a numeric population, or p of a binomial population, find the probability of a specified sample. To do this,
  1. Check requirements. If they’re not met, stop.
  2. Compute the standard error and make a sketch to scale.
  3. Use `normalcdf` to compute probability. In `normalcdf`, the fourth argument is the unrounded standard error, not the population standard deviation.

## Exercises for Chapter 8

Write out your solutions to these exercises, making a sketch and showing your work for all computations. Then check your solutions against the solutions page and get help with anything you don’t understand.

Caution! If you don’t see how to start a problem, don’t peek at the solution — you won’t learn anything that way. Ask your instructor or a tutor for a hint. Or just leave it and go on to a different problem for now. You may find when you return to that “impossible” problem that you see how to do it after all.

1 Household incomes in the country Freedonia are a skewed distribution with mean \$48,000 and standard deviation (SD) \$2,000. You take a random sample of size 64 and compute the mean of the sample. That is one sample mean out of the distribution of all possible sample means. Describe the sampling distribution of the mean, including all symbols and formulas.
2 A manufacturer of light bulbs claims a mean life of 800 hours with SD 50 hours. You take a random sample of 100 bulbs and find a sample mean of 780 hours.

(a) If the manufacturer’s claim is true, is a sample mean of 780 hours surprising? (Hint: Think about whether you need the probability of x̅ ≤ 780 or x̅ ≥ 780.)

(b) Would you accept the manufacturer’s claim?

3 Suppose 72% of Americans believe in angels, and you take a simple random sample of 500 Americans.

(a) Describe the sampling distribution of the proportion who believe in angels in samples of 500 Americans.

(b) Use the normal approximation to compute the probability of finding that 350 to 370 in a sample of 500 believe in angels. Reminder: You can’t use the sample counts directly; you have to convert them to sample proportions.

4 In a town with 100,000 households, the last census showed a mean income of \$32,400 with SD \$19,000. The city manager believes that average income has fallen since the census. Students at the local community college randomly survey 1000 households and find a sample mean income of \$31,000. What’s the chance of getting a sample mean ≤\$31,000 if the true mean and SD are still what the census found?
5 Roulette is a popular casino game. The croupier spins the wheel in one direction and spins a white ball in the other direction along the rim, and the ball drops into one of the slots. In the US, roulette wheels have 38 slots: 18 red, 18 black, and 2 green. (In Monte Carlo, the wheels have 37 slots because there’s only one green.)

(a) One way beginners play is to bet on red or black. If the ball comes up that color, they double their money; if it comes up any other color, they lose their money. Construct a probability model for the outcome of a \$10 bet on red from the player’s point of view.

(b) Find the mean and SD for the outcome of \$10 bets on red, and write a sentence interpreting the mean.

(c) Now take the casino’s point of view. A large casino can have hundreds of thousands of bets placed in a day. Obviously they won’t all be same, but it doesn’t take many days to see a whole lot of any given bet. Describe the sampling distribution of the mean for a sample of 10,000 \$10 bets on red.

(d) How much does the casino expect to earn on 10,000 \$10 bets on red?

(e) What’s the chance that the casino will lose money on those 10,000 \$10 bets on red?

(f) What’s the casino’s chance of making at least \$2000 on those 10,000 \$10 bets?

6 A sugar company packages sugar in 5-pound bags. The amount of sugar per bag varies according to a normal distribution. A random sample of 15 bags is selected from the day’s production. If the total weight of the sample is more than 75.6 pounds, the machine is packing too much per bag and must be adjusted.

What is the probability of this happening, if the day’s mean is 5.00 pounds and SD 0.05 pounds?

7 The weights of cabbages in a shipment are normally distributed, with a mean of 38.0 ounces and SD of 5.1 ounces.

(a) If you randomly pick one cabbage, what is the probability that its weight is more than 43.0 ounces?

(b) If you randomly pick 14 cabbages, what is the probability that their average weight is more than 43.0 ounces?

8 Suppose the average household consumes 12.5 KW of electric power at peak time, with SD 3.5 KW. A particular substation in a typical neighborhood serves 1000 households and has a capacity of 12,778 KW at peak time. (That’s 12 thousand and some, not 12 point something.) Find the probability that the substation will fail to supply enough power.
9 In the Physicians’ Health Study, about 22,000 male doctors were randomly assigned to take aspirin daily or a placebo daily. (Of course the study was double blind.) In the placebo group, 1.71% of doctors had heart attacks over the course of the study. Let’s take 1.71% as the rate of heart attacks in an adult male population that doesn’t take aspirin.

| Group | HeartAttack | NoAttack | Total | p̂ |
|---|---|---|---|---|
| Placebo | 189 | 10845 | 11034 | 1.71% |
| Aspirin | 104 | 10933 | 11037 | 0.94% |

The heart attack rate among aspirin takers was 0.94%, which looks like an impressive difference. Is there any chance that aspirin makes no difference, and this was just the result of random selection? In other words, how likely is it for that second sample to have p̂ = 0.94% if the true proportion of heart attacks in adult male aspirin takers is actually 1.71%, no different from adult males who don’t take aspirin?

10 Men’s heights are normally distributed with mean 69.3″ and SD 2.92″. If a random sample of 16 men is taken, what values of the sample mean would be surprising? In other words, what values of x̅ are in the 5% of the sampling distribution furthest away from the population mean?

(Hint: The 5% is the tails, the part of the sampling distribution that is not in the middle 95%.)

11 In June 2013, the Pew Research Center found that 45% of Americans had an unfavorable view of the Tea Party. In the second week of October 2013, according to Tea Party’s Image Turns More Negative (Pew Research Center 2013e [see “Sources Used” at end of book]), 737 adults in a random sample of 1504 had an unfavorable view of the Tea Party.

In a population where 45% have an unfavorable view of the Tea Party, how likely is a sample of 1504 where 737 or more have an unfavorable view? Can you draw any conclusions from that probability?

## What’s New?

• 15 Nov 2021: Updated several external links.
• 29 Oct 2020: Converted page from HTML 4.01 to HTML5, and improved the formatting of radicals.
• 25 June 2015: Corrected a typo, ≤ for ≥, thanks to Darlene Huff.
• 1 Apr 2015: Modified the exercise on belief in angels to include converting counts to proportions.
• (intervening changes suppressed)
• 24 Mar 2013: New document.