Stats without Tears
7. Normal Distributions
Updated 1 Aug 2019
(What’s New?)
Copyright © 2013–2024 by Stan Brown, BrownMath.com
You met random variables back in Chapter 6. Any random variable has a single numerical value, determined by chance, for each outcome of a procedure. Discrete random variables are limited to specified values, usually whole numbers. But a continuous random variable can take any value at all, within some interval or across all the real numbers.
Just as discrete probability models are used to model discrete variables, continuous probability models are used to model continuous variables. Of course, because a continuous random variable has infinitely many possible values, you can’t make a table of values and probabilities as you could do for a discrete distribution. Instead, either there’s an equation, or just a density curve (below).
A probability model is often called a distribution, so you can say that a variable “is normally distributed” (ND), that it “is a normal distribution” (also ND), or that it “follows a normal probability model”.
There are lots of specialized continuous distributions, but the normal distribution is most important by a wide margin. Many, many real-life processes follow the normal model, and the ND is also the key to most of our work in inferential statistics.
This section will give you some concepts that are common to all continuous distributions, and the rest of the chapter will talk about special properties of the normal distribution and applications. In Chapter 8, you’ll apply the normal distribution to get a handle on the variation from one sample to the next.
In Chapter 2, you learned to graph continuous data by grouping the data in classes and making a histogram, like the one below left. This one shows wait times in a fast-food drive-through, with time in minutes: not whole minutes, which would make a discrete distribution, but minutes and fractional minutes.
Any sample you might take has a finite number of data points, so you set up classes, place the data points in the classes, and then draw a histogram. The height of each bar is proportional to the frequency or relative frequency of that class.
But when you come to consider all the possible values of a continuous variable, you have an infinite number of data points. If you tried to assign them to classes, it would take you forever —literally! Instead, you draw a smooth curve, called a density curve, to show the possible values and how likely they are to occur. An example is shown above right.
The density curve is a picture of a continuous probability model. It doesn’t just represent the data in a particular sample, but all possible data for that variable — along with the probabilities of their occurrence, as you’ll see next.
Up to now, the height of a bar in a histogram has been the number of data points in that class, or the relative frequency of that class. But how do you interpret the height of a density curve?
Answer: you don’t! The height of the curve above any particular point on the x axis just doesn’t lend itself to a simple interpretation. You might think it would be the probability of that value occurring. But with infinitely many possible values, “what’s the likelihood of a wait time of exactly 4 minutes?” just isn’t a meaningful question, because what about 3.99997 minutes or 4.002 minutes?
What is meaningful is the probability within an interval, which equals the area under the curve within that interval. For example, in this illustration, the probability of a wait time of 6.4 to 9.5 minutes is 29.4%. In symbols,
P(6.4 ≤ x ≤ 9.5) = 29.4%
or
P(6.4 < x < 9.5) = 29.4%
That’s right — the probability is the same whether you include or exclude the endpoints of the interval.
This explains why the probability is the same whether you include or exclude either endpoint of the interval. The difference is the area of a “rectangle” whose height is the height of the density curve and whose width is the distance from a to a — which is zero. Thus the area of the “rectangle” is zero, and the probability of the random variable taking any particular value, exactly, is zero.
Since area equals probability, and total probability must be 1, total area must be 1. Every pdf — the height of every density curve — is scaled so that the integral from −∞ to +∞ is 1.
You can also have the probability for an interval with one boundary, < or ≤ some value like the picture at right, or > or ≥ some value. For example, 3.33 minutes is about 3 minutes and 20 seconds, so the probability of waiting up to 3 minutes and 20 seconds is 20.6%: P(x ≤ 3.33) = 20.6%.
The total area under any density curve equals the probability that the random variable will take any one of its possible values, which of course is 1, or 100%. So you can use the complement to say that the probability of waiting 3 minutes and 20 seconds or more (or, more than 3 minutes and 20 seconds) is 100% − 20.6% = 79.4%.
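You remember from Interpreting Probability Statements in Chapter 5 that every probability can be interpreted as a probability of one or a proportion of all. For example, P(x > 3.33) = 79.4% can equally well be interpreted in two ways:
Probability of one: there’s a 79.4% chance that a randomly selected customer will wait more than 3 minutes and 20 seconds.
Proportion of all: 79.4% of customers wait more than 3 minutes and 20 seconds.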
Which interpretation you use in a given situation depends on what seems simplest and most natural in the situation. Here, the “proportion of all” interpretation seems simpler. But you’re always free to switch to the other interpretation if it helps you in thinking about a situation.
Area = Probability of One = Proportion of All
Why study the normal distribution?
First, it’s useful on its own. Lots and lots of real-life distributions match the normal model: body temperature or blood pressure of healthy people, scores on most standardized tests, commute times on a given route, lifetimes of batteries or light bulbs, heights of men or women, weights of apples of a particular variety, measurement errors (in many situations), and on and on.
Second, through sampling, even non-ND populations follow a normal model. You’ll use this model in inferential statistics to make statements about a whole population based on just one sample. Look forward to learning this neat trick in Chapter 8.
Why is the ND so common? In real life, very few events have just one cause; most things are the result of many factors operating independently. It turns out that if you take a lot of independent random variables and add them up, their sum is ND. For example, your IQ score results from multiple genetic factors, countless occurrences in your education and your family life, even transient factors like how well you slept the night before the test. Most of these are independent of each other, so the result of adding them is a ND.
The normal distribution (ND) has the properties of other continuous distributions as listed earlier. In particular, area = probability, and the total area under the density curve is the total probability, which is 1. The ND also has these special properties:
A ND is completely described by its mean and SD. The mean locates the center of the curve, but has no effect on the shape. For example, here are three normal curves with μ = 0, 2, and 5 and σ = 4.
The standard deviation determines the shape of the curve, but has no effect on the location. Smaller SD means the data stick closer to the mean, so the peak is higher and the tails are shorter and fatter. Larger SD means the data vary more, so they spread out from the mean: the peak is lower and the tails are longer and thinner. The second picture shows three normal curves with μ = 2 and σ = 2, 4, and 6. (The vertical scale is different from the first picture.)
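If you’d like to see those two effects in numbers rather than pictures, here is a minimal Python sketch. It assumes Python 3.8 or later, where the standard library offers `statistics.NormalDist`; the helper name `peak_height` is made up for this illustration and is not part of the book.

```python
from statistics import NormalDist

def peak_height(mu, sigma):
    """Height of the normal density curve at its center (the mean)."""
    return NormalDist(mu, sigma).pdf(mu)

# Same mean (2), different SDs: the larger the SD, the lower the peak.
peaks = {sigma: peak_height(2, sigma) for sigma in (2, 4, 6)}
for sigma, height in peaks.items():
    print(f"sigma = {sigma}: peak height = {height:.4f}")

# Same SD (4), different means: the curve slides sideways,
# but the peak height does not change.
same_shape = [peak_height(mu, 4) for mu in (0, 2, 5)]
print(same_shape)
```

The first loop should show the peak dropping as σ grows, and the second list should contain three identical heights, which is just another way of saying that the mean moves the curve without reshaping it.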
All of this is the theoretical normal distribution. In fact, nothing in real life is perfectly ND, because nothing in real life has an infinite number of data points. When we say something is ND, we mean it’s a close match, not a perfect match. “Normally distributed” (or ND) is short for “using a normal distribution to model this data set, the calculations will come out close enough to reality.”
This is a lot like what you did in Chapter 3, when you computed the statistics of a grouped distribution. The statistics were only approximate, because of the simplification you introduced by grouping, but the approximation was good enough.
Now let’s get to some applications! There are two main categories: “forward” problems, where you have the boundaries and you have to find the area or probability, and “backward” problems, where you have a probability or area and you have to find the boundaries.
Wikipedia has a decent short summary of the history. In Jenny Kenkle’s talk on the normal distribution, slide 18 shows de Moivre’s approximation to the binomial distribution for large n, and how to get from that to the ND. And if you want an exhaustive treatment of the history, see Saul Stahl’s The Evolution of the Normal Distribution, originally published in Mathematics magazine, April 2006.
The name of Carl Friedrich Gauss is permanently coupled to the normal distribution — literally. Although Sir Francis Galton coined the term normal distribution in 1889, Karl Pearson called it the Gaussian distribution in 1905, and that’s still a recognized synonym.
The cdf, the area to the left of a given x, is the integral of the pdf, just the same as finding the area under any curve to the left of a given x: $F(x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(t-\mu)^2/(2\sigma^2)}\,dt$. This integral doesn’t have a “closed form”, a finite sequence of basic algebraic operations, so it must be found by successive approximations. That’s what your calculator does with normalcdf and Excel does with NORM.DIST.
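To get a feel for what “successive approximations” can mean, here is a rough Python sketch that adds up thin slices under the standard normal curve using Simpson’s rule and compares the total with the library’s own cdf. This is only one possible approximation scheme, not necessarily the one your calculator or Excel uses; the function names and the choice to start 10 SD below the mean in place of −∞ are assumptions made for the illustration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Height of the normal density curve at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def area_between(a, b, mu, sigma, steps=10_000):
    """Approximate area under the curve from a to b (Simpson's rule).

    steps must be even.
    """
    h = (b - a) / steps
    total = normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * normal_pdf(a + i * h, mu, sigma)
    return total * h / 3

mu, sigma = 0, 1
# Assumption: 10 SD below the mean stands in for minus infinity;
# the area beyond that point is negligible.
approx = area_between(mu - 10 * sigma, 1, mu, sigma)
library = NormalDist(mu, sigma).cdf(1)
print(round(approx, 6), round(library, 6))
```

The two printed numbers should agree to many decimal places, which is the whole point: the area is found by adding up slices, not by a tidy formula.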
Summary: Make a sketch, estimate the probability (area), then compute it.
TI-83/84/89: Use normalcdf(left bound, right bound, mean, SD). I’ll walk you through the TI-83/84 keystrokes in the first example below. If you have a TI-89, press [CATALOG] [F3] [plain 6 makes N] [ENTER].
Excel: In Excel 2010 or later, use (deep breath here) =NORM.DIST(right bound, mean, SD, TRUE) − NORM.DIST(left bound, mean, SD, TRUE). In Excel 2007 or earlier, it’s NORMDIST rather than NORM.DIST.
Example 1: Heights of human children of a given age and sex are ND. One study found that three-year-old girls’ heights have a mean of 38.72″ and SD of 3.17″. What percentage of three-year-old girls are 35″ to 40″ tall?
Solution: Take the time to make a sketch. It doesn’t have to be beautiful, but you should make it as accurate as you reasonably can. It’s an important safeguard against making boneheaded mistakes. Here’s what should be on your sketch:
Important: When you marked the SD, you set the scale for the sketch. Now you have to honor that and place your boundaries in proportion. For instance, in this problem the mean is 38.72 and the left boundary is 35, which is 3.72 below the mean. Your left boundary therefore needs to be a bit more than one SD (3.17) left of the mean. The right bound is 40, which is 1.28 above the mean, so your line needs to be just over a third of a SD to the right of the mean.
(Students often put in more numbers and lines, like the values of 1, 2, and 3 SD above and below the mean. That’s not wrong, but it’s usually not helpful, and it definitely clutters up the sketch.)
From my sketch, I estimate an area of 50%–60%. If it’s 45% or 70% I won’t be terribly surprised, but if it’s 5% or 99% I’ll know something is wrong.
If you wish, add that number to your sketch — not below the axis, please. Write it within the shaded area, if there’s room, or as a callout to the left or right of the diagram, the way I did here.
On a TI-83 or TI-84, press [2nd VARS makes DISTR] [2] to select normalcdf. Enter the left boundary (35), right boundary (40), mean (38.72), and SD (3.17).
(If you have a TI89 or you’re using Excel, see above.)
With the “wizard” interface:  With the classic interface: 

Press [ 
After entering the standard deviation, press [)] [ENTER] to get the answer.

You always need to show your work, so write down normalcdf(35,40,38.72,3.17) before you proceed to the answer. (There’s no need to write down the keystrokes you used.)
In this book, I round probabilities to four decimal places, or two decimal places if expressed as a percentage. The probability is
P(35 ≤ x ≤ 40) = 0.5365
That number matches my estimate of 50%–60%.
But the problem asked for a percentage. (Always, always, always look back at the problem and make sure you’re answering the question that was actually asked.) The answer: 53.65% of three-year-old girls are 35″ to 40″ tall.
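If you have Python handy instead of a TI or Excel, the same computation can be sketched with the standard library’s `statistics.NormalDist` (Python 3.8 or later). The wrapper below is named `normalcdf` only to mirror the calculator; it is a hypothetical helper written for this illustration, not a built-in Python function.

```python
from statistics import NormalDist

def normalcdf(left, right, mean, sd):
    """Area under the normal curve between left and right."""
    dist = NormalDist(mean, sd)
    # Area to the left of the right bound, minus area to the left of the left bound.
    return dist.cdf(right) - dist.cdf(left)

p = normalcdf(35, 40, 38.72, 3.17)
print(f"P(35 <= x <= 40) = {p:.4f}")
```

This is the same subtraction that the Excel formula spells out, and the printed value should agree with the 0.5365 found above.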
Example 2: A three-year-old girl is randomly chosen. Would it be unusual (unexpected, surprising) if she’s over 45″ tall?
In Chapter 5 you learned to call a low-probability event unusual (a/k/a surprising or unexpected). The standard definition of unusual events is a probability below 0.05, so really this problem is just asking you to find the probability and compare it to 0.05.
Solution: The sketch is at right, and obviously the probability should be small. The left boundary is 45, but what’s the right boundary? The normal distribution never quite ends, so the right boundary is ∞ (infinity). TI-89s have a key for ∞, but TI-83s and TI-84s don’t and Excel doesn’t, so use 10^99 instead. (That’s 10 to the 99th power; the [^] key on your TI calculator is between [CLEAR] and [÷].)
Show your work:
P(x > 45) = normalcdf(45,10^99,38.72,3.17) = 0.0238
That’s rounded from 0.0237914986, and it’s in line with my estimate of “small”. Now answer the question: There’s only a 2.38% chance that a randomly selected three-year-old girl will be over 45″ tall, so that would be unusual.
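In software that gives you the area to the left directly, you don’t need a stand-in for ∞: the right-hand tail is just the complement. Here is a small Python sketch, again assuming `statistics.NormalDist` from the standard library; the variable names are invented for the illustration.

```python
from statistics import NormalDist

heights = NormalDist(38.72, 3.17)

# Area to the right of 45 = 1 minus the area to the left of 45.
p_over_45 = 1 - heights.cdf(45)

# Unusual means a probability below 0.05.
unusual = p_over_45 < 0.05
print(f"P(x > 45) = {p_over_45:.4f}, unusual: {unusual}")
```

The result should match the 0.0238 above, and the comparison to 0.05 is the whole “unusual” test.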
Example 3: For the same population, find and interpret P(x < 33).
Solution: The sketch is at right, and again the expected probability is small. The right boundary is 33, but what’s the left boundary? You might want to use 0, since no one can be under 0″ tall, but you could make the same argument for 1″ or 5″, so that can’t be right.
To locate the left boundary, remember that you’re using a normal model to approximate the data, and the normal distribution runs right out to ±∞. Therefore, the left boundary is minus ∞ on a TI-89, or minus 10^99 on a TI-83/84. (Use the [(−)] key, not the [−] subtraction key.)
P(x < 33) = normalcdf(−10^99,33,38.72,3.17) = 0.0356
The proportion of three-year-old girls under 33″ tall is 0.0356 or 3.56%; or, 3.56% of three-year-old girls are under 33″ tall. The other interpretation is that the chance that a randomly selected three-year-old girl is under 33″ tall is 0.0356 or 3.56%.
Example 4: What’s the percentile rank of a three-year-old girl who is 33″ tall?
Solution: Long ago, in a galaxy called Numbers about Numbers, you learned the definition of percentiles. The percentile rank of a data point is the percentage of the data set that is ≤ that data point. So you need P(x ≤ 33). But that’s exactly what you computed in the previous example: 3.56%. So the 33″-tall girl is between the third and fourth percentiles for her age group.
“That was P(x < 33), and for a percentile I need P(x ≤ 33)!” I hear you yell. But those two are equal. When we talked about density curves, near the beginning of this chapter, you learned that the area and probability are the same whether you include or exclude the boundary.
And this is why it doesn’t make much difference whether you define a percentile rank in terms of < or ≤, because the probability in a continuous distribution is the same either way.
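Since a percentile rank in a normal model is just the area to the left expressed as a percentage, it can be sketched in a couple of lines of Python. The function name `percentile_rank` is hypothetical, and the sketch assumes `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

def percentile_rank(x, mean, sd):
    """Percentage of a normal population at or below x."""
    return 100 * NormalDist(mean, sd).cdf(x)

rank = percentile_rank(33, 38.72, 3.17)
print(f"A 33-inch girl is at about the {rank:.2f} percentile")
```

Notice that the code never has to choose between < and ≤; the cdf gives one area, and that area serves for both.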
Summary: Make a sketch, estimate the value(s), then compute the value(s).
TI-83/84/89: Use invNorm(area to left, mean, SD). I’ll walk you through the TI-83/84 keystrokes in the first example below. If you have a TI-89, press [CATALOG] [F3] [plain 9 makes I] [▼ 3 times] [ENTER].
Excel: In Excel 2010 or later, use =NORM.INV(area to left, mean, SD). In Excel 2007 or earlier, it’s NORMINV rather than NORM.INV.
Example 5: Blood pressure is stated as two numbers, systolic over diastolic. The World Health Organization’s MONICA Project (Kuulasmaa 1998 [see “Sources Used” at end of book]) reported these parameters for the US:
Systolic: μ = 120, σ = 15
Diastolic: μ = 75, σ = 11
Blood pressure in the population is normally distributed. The lowest 5% is considered “hypotensive”, according to Kuzma and Bohnenblust (2005, 103) [see “Sources Used” at end of book]. What systolic blood pressure would be considered hypotensive?
Solution: Always make a sketch for these problems. Your sketch is similar to the ones you made for the first group of problems, except that you use a symbol like x_{1} or “?” for the unknown boundary, and you write in the known area.
Always estimate your answer to guard against at least some errors. In the sketch, x_{1} looks like it’s not quite two SD left of the mean, so I’ll estimate a pressure of 95 to 100. (Okay, I cheated by using my calculator to make my “sketch”. But even with a real pencil-and-paper sketch, you ought to be in the right ballpark.)
Now you’re ready to calculate.
TI-89 or Excel users, please see the instructions above. On your TI-83 or TI-84, press [2nd VARS makes DISTR] [3] to select invNorm. Enter the area to the left of the point you’re interested in (.05), the mean (120), and the SD (15).
With the “wizard” interface:  With the classic interface: 

Press [ 
After the standard deviation, press [) ] [ENTER ] to get the answer.

Show your work! Write down invNorm(.05,120,15) before you proceed to the answer. (There’s no need to write down the keystrokes you used.)
Answer: Systolic blood pressure (first number) under 95 would be considered hypotensive.
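For readers working in Python, the “backward” computation can be sketched with the `inv_cdf` method of `statistics.NormalDist`. The wrapper is called `invNorm` here only to echo the calculator; it is a hypothetical helper for this illustration.

```python
from statistics import NormalDist

def invNorm(area_to_left, mean, sd):
    """Boundary that has the given area (probability) to its left."""
    return NormalDist(mean, sd).inv_cdf(area_to_left)

x1 = invNorm(0.05, 120, 15)
print(f"x1 = {x1:.2f}")

# Sanity check: going "forward" from x1 should give back the 5% area.
area_check = NormalDist(120, 15).cdf(x1)
print(f"area to left of x1 = {area_check:.4f}")
```

The boundary should land a bit above 95, in line with the estimate of 95 to 100 from the sketch.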
Example 6: The same source considers the top 5% “hypertensive”. What is the minimum systolic blood pressure that is hypertensive?
Solution: My “sketch” is at right. It’s mostly straightforward — the x_{1} boundary is between the 5% tail and the rest of the distribution.
But what’s up with the 1−0.05? The problem asks you about the upper 5%, which is the area to the right of the unknown boundary. But invNorm on the calculator, and NORM.INV in Excel, need area to left of the desired boundary. The area to the left is the probability of “not hypertensive”, and area is probability, so the area to left is 1 minus the area to right, in this case 1−0.05.
Could you just write down 0.95? Sure, that would be correct. But if the area to right was 0.1627 you’d probably make the calculator compute 1 minus that for you, so why not be consistent?
x_{1} = invNorm(1−.05,120,15) = 144.6728044 → 145
(That’s actually a little liberal. Several sources that I’ve seen give 140 as the threshold.)
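The “1 minus area to right” step is easy to bake into a small helper so you can’t forget it. This is a sketch only; `boundary_for_right_tail` is a name invented for the illustration, and it relies on `statistics.NormalDist` from Python’s standard library.

```python
from statistics import NormalDist

def boundary_for_right_tail(area_to_right, mean, sd):
    """Boundary that cuts off the given area in the right-hand tail."""
    # inv_cdf wants area to the left, so convert first.
    area_to_left = 1 - area_to_right
    return NormalDist(mean, sd).inv_cdf(area_to_left)

x1 = boundary_for_right_tail(0.05, 120, 15)
print(f"x1 = {x1:.4f}")
```

The value should round to 145, matching the invNorm(1−.05,120,15) result above.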
Example 7: Kuzma and Bohnenblust describe the middle 80% as “normal”. What is that range of systolic blood pressure?
This problem wants you to find two boundaries, lower and upper. You have to convert the 80% middle into two areas to left. Here’s how. If the middle is 80%, then the two tails combined must be 100% − 80% = 20%. But the curve is symmetric, so each tail must be 20%/2 = 10%. Strictly speaking, I probably should have written that computation on the diagram, instead of just a laconic “0.1”, but it would take up a lot of space and the computation was easy enough. You’ll probably do the same; just be careful.
Once you have the areas squared away, the computation is simple enough:
x_{1} = invNorm(.1,120,15) = 100.7767265 → 101
x_{2} = invNorm(1−.1,120,15) = 139.2232735 → 139
Check: The boundaries of the middle 80% (or the middle any percent) should be equal distances from the mean. (100.7767265+139.2232735)/2 = 120, so at least it’s consistent. Answer: Systolic b.p. of 101 to 139 is considered normal.
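The tail arithmetic and the symmetry check can both be captured in a short Python sketch. The function name `middle_interval` is made up for this illustration; the only library call is `statistics.NormalDist.inv_cdf`.

```python
from statistics import NormalDist

def middle_interval(middle, mean, sd):
    """Lower and upper boundaries of the middle fraction of a normal model."""
    tail = (1 - middle) / 2          # each tail gets half of what's left over
    dist = NormalDist(mean, sd)
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)

low, high = middle_interval(0.80, 120, 15)
print(f"middle 80%: {low:.4f} to {high:.4f}")

# Check: the boundaries should be equal distances from the mean.
midpoint = (low + high) / 2
print(f"midpoint = {midpoint:.4f}")
```

The boundaries should round to 101 and 139, and the midpoint should come out at the mean, 120.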
Example 8: What’s the 40th percentile for systolic blood pressure?
Sometimes the gods smile on us. The kth percentile is the value that is ≥ k% of the population, so k% is exactly the area to left that you need.
P_{40} = invNorm(.4,120,15) = 116.1997935 → 116
Definition: The standard normal distribution is a normal distribution with a mean of 0 and standard deviation of 1, sometimes written N(0,1).
The standard normal distribution is a picture of z-scores of any possible real-world ND (more about that later).
The standard normal distribution lets you make computations that apply to all normal models, not just a particular model. You’ll see some examples shortly, but first —
The main point about the standard normal distribution is that it’s a stand-in for every ND from real life. How does this work? Well, if you take any real data set and subtract the mean from every data point, the mean of the new data set is 0. And if you then divide that data set by the standard deviation (which doesn’t change when you subtract a constant from every data point), then the SD of the new-new data set is 1.
But all you did with those manipulations was replace the numbers with z-scores. Remember the formula: $z = (x-\mu)/\sigma$. The standard normal distribution is what you get when you convert any normal model to z-scores.
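You can watch the subtract-then-divide recipe work on actual numbers. The sketch below borrows the ten vehicle weights used later in this chapter and treats them as a complete data set, so it uses the population SD (`statistics.pstdev`); that choice is an assumption made to keep the arithmetic exact.

```python
from statistics import mean, pstdev

weights = [2500, 3250, 4000, 3500, 2900, 4500, 3800, 3000, 5000, 2200]

m = mean(weights)
s = pstdev(weights)      # population SD of this data set

# Subtract the mean, then divide by the SD: that's the z-score formula.
z_scores = [(x - m) / s for x in weights]

z_mean = mean(z_scores)
z_sd = pstdev(z_scores)
print(f"mean of z-scores = {z_mean:.6f}, SD of z-scores = {z_sd:.6f}")
```

Whatever data set you start with, the z-scores should come out with mean 0 and SD 1 (apart from tiny rounding errors).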
The need to do normal computations the hard way has gone the way of the dinosaurs, but I think this history is why many stats books still use tables to do their computations. Inertia is a powerful force in textbooks!
The pdf and cdf functions for the standard normal distribution are what you get when you set μ=0 and σ=1 in the general equations for the ND: $\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$ and $\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt$. Again, the integral must be found by successive approximations. That’s where the tables in books come from, and it’s what your calculator does with normalcdf and Excel does with NORM.DIST.
I said above that the standard normal distribution lets you make statements about all normal models. What sort of statements? Well, the Empirical Rule for one.
Example 9: The Empirical Rule says that 68% of the population in a normal model lies within one SD of the mean. How good is the rule? In other words, what’s the actual proportion?
Solution: As usual, you start with a sketch. This is the standard ND, so the axis is z, not x. There’s no need to mark the mean or SD, because the z label identifies this as a standard normal distribution and therefore μ = 0 and σ = 1. Just label the boundaries.
Compute the probability the same way you’ve already learned. (Both Excel and the TIs have special procedures available for the standard normal distribution, but it’s not worth taking brain cells to learn them, when the regular procedures for the ND work just fine with N(0,1).)
P(−1 ≤ z ≤ 1) = normalcdf(−1,1,0,1) = .6826894809 → 68.27%
The Empirical Rule says 68% of the data are within z = ±1. Actually it’s about 68¼%, close enough.
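The same check works for the other two parts of the Empirical Rule. Here is a brief Python sketch using the standard normal distribution from `statistics.NormalDist`; the dictionary name `within` is just a label chosen for the illustration.

```python
from statistics import NormalDist

std = NormalDist(0, 1)   # the standard normal distribution, N(0,1)

# Proportion of the population within k SD of the mean, for k = 1, 2, 3.
within = {k: std.cdf(k) - std.cdf(-k) for k in (1, 2, 3)}
for k, proportion in within.items():
    print(f"within {k} SD: {100 * proportion:.2f}%")
```

Compare the printed percentages with the rule’s 68%, 95%, and 99.7% to judge for yourself how good the rule is.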
Example 10: How many standard deviations must you go above and below the mean to take in the middle 50% of the data in a normal model?
Solution: This is similar to finding the middle 80% of blood pressures earlier, except now you’re making a statement about all normal models, not just a particular one.
Shading the middle 50% leaves 100% − 50% = 50% in the two tails combined, so each tail is 50%/2 = 25%.
z_{1} = invNorm(.25,0,1) = −.6744897495 → −0.67
By symmetry, z_{2} must be numerically equal to z_{1} but have the opposite sign: z_{2} = 0.67.
50% of the data in any normal model are within about 2/3 of a SD of the mean. Since the bounds of the middle 50% of the data are Q1 and Q3, the IQR of any normal distribution is twice that, about one and a third standard deviations. More precisely, the IQR is 2×0.674 ≈ 1.35 times the SD.
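The quartile computation takes only a few lines in Python. This sketch assumes `statistics.NormalDist`, whose default is the standard normal distribution; the variable names are invented for the illustration.

```python
from statistics import NormalDist

std = NormalDist()                 # defaults to mean 0, SD 1

q1 = std.inv_cdf(0.25)             # z-score with 25% of the area to its left
q3 = std.inv_cdf(0.75)             # z-score with 75% of the area to its left
iqr_in_sds = q3 - q1               # IQR of any ND, measured in SDs

print(f"Q1 = {q1:.4f}, Q3 = {q3:.4f}, IQR = {iqr_in_sds:.3f} SD")
```

Because these are z-scores, the answer applies to every normal model: multiply the printed IQR by the SD of your model to get its IQR in real units.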
There’s one special notation you’ll use when you compute confidence intervals in Chapter 9.
Definition: z_{area} or z(area), also known as critical z, is the z-score that divides the standard normal distribution such that the right-hand tail has the indicated area.
This may seem a little weird, but really it’s just a recipe to specify a number. Compare with the square root of 48. That is the positive number such that, if you multiply it by itself, you get 48. Or consider π: the number that you get when you divide the circumference of a perfect circle by its diameter. Math is full of numbers that are specified as recipes. An example will make things clearer.
Example 11: Find z_{0.025}.
Solution: The problem is diagrammed at right. Caution! 0.025 is an area, not a z-score, so you don’t write 0.025 on the number line (the z axis). z_{0.025} is a z-score (though you don’t know its value yet), so it goes on the number line.
Once you have your sketch, the computation is straightforward.
Have area (probability), compute boundary. The area is 0.025, but it’s an area to right, and invNorm needs an area to left, so you subtract from 1 as usual:
z_{0.025} = invNorm(1−.025, 0, 1) = 1.959963986 → 1.96
Caution! You’re computing a boundary for the right-hand tail. If you get a negative number, that can’t possibly be right.
z_{0.025} = 1.96 makes sense, if you think about it. If you also shaded in the left-hand tail with an area of 0.025, the two tails together would total 5%, leaving 95% in the middle. The Empirical Rule says that 95% of data are within 2 SD above and below the mean, and 1.96 is approximately 2.
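The critical-z recipe translates directly into a one-line function. In this Python sketch, `critical_z` is a hypothetical name, and the computation leans on `statistics.NormalDist().inv_cdf`, which expects area to the left.

```python
from statistics import NormalDist

def critical_z(area_to_right):
    """z-score that leaves the given area in the right-hand tail of N(0,1)."""
    return NormalDist().inv_cdf(1 - area_to_right)

z_025 = critical_z(0.025)
z_05 = critical_z(0.05)
print(f"z(0.025) = {z_025:.2f}, z(0.05) = {z_05:.3f}")
```

As the caution above says, any right-hand tail area below 0.5 should give a positive result; a negative number would be a sign that the subtraction from 1 was skipped.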
How do you know whether a normal model is appropriate? How do you know whether your data are normally distributed? A histogram can rule out skewed data, or data with more than one peak.
But what if your data are unimodal and not obviously skewed? Is that enough to justify a normal model? No, it’s not. You need to perform a test called a normal probability plot. You’ll need this procedure in Chapters 8 through 11, whenever you have a small sample of numeric data.
That’s the bare outline, and you’ll get a little bit more with the examples. For those who want the full theory, it’s marked optional at the end of this section.
Testing for normality can be automated partly or completely, depending on what technology you have:
Example 12: Consider these vehicle weights (in pounds):
2500, 3250, 4000, 3500, 2900, 4500, 3800, 3000, 5000, 2200
Do they fit a normal model?
Solution:
Put the data in any statistics list, then press [PRGM], scroll down to MATH200A, and press [ENTER] twice. Select Normality chk.
The program makes the plot, and you can look at the points to determine whether they seem to be pretty much on a straight line. At least, that’s the theory. In practice, most data sets are a lot less clear cut than this one. It can be hard to tell whether the points fit a line, particularly if you have only a few of them. The plot takes up the whole screen, so deviations can look bigger than they really are.
Fortunately, there’s a test for whether points lie on a straight line. As you know from Chapter 4, the closer the correlation coefficient r is to 1, the closer the points are to a straight line.
The program computes r for you, and it also computes a critical value★ to help you determine if the points are close enough to a straight line. (For technical reasons, the critical value is different from the decision points of Chapter 4.) If r≥crit, it’s close enough to 1, the points are close enough to a straight line, and you can use a normal model. If r<crit, it’s too far from 1, the points are too far from a straight line, and you can’t use a normal model.
For this data set, r > crit, and therefore these vehicle weights fit the normal model.
★The “classic TI-83” (non-“Plus” model) doesn’t compute the critical value, so you have to do it yourself. See the formula in item 4 in the next section.
Example 13: Here’s a random sample of the lengths (in seconds) of tunes in my iTunes library:
120 219 242 134 129 105 275 76 412 268 486 199 651 291 126 210 151 98 100 92 305 231 734 468 410 313 644 117 451 375
Do they fit a normal model?
Solution: I entered them in a statistics list and then ran MATH200A Program part 4. The result was the plot at the right.
You can see that the plot is curved. This is reinforced by comparing r=0.9473 to crit=0.9639. r < crit. The points diverge too far from a straight line, and therefore I cannot use a normal model for the lengths of my iTunes songs.
The basic idea isn’t too bad. You make an xy scatterplot where the x’s are the data points, sorted in ascending order, and the y’s are the expected z-scores for a normal distribution.
Why would you expect that to be a straight line? Recall the formula for a z-score: z = (x−x̅)/s. Breaking the one fraction into two, you have z = x/s−x̅/s. That’s just a linear equation, with slope 1/s and intercept −x̅/s. So an xz plot of any theoretical ND, plotting each data point’s z-score against the actual data value, would be a straight line.
Further, if your actual data points are ND, then their actual z-scores will match their expected-for-a-normal-distribution z-scores, and therefore a scatterplot of expected z-scores against actual data values will also be a straight line.
Now, in real life no data set is ever exactly a ND, so you won’t ever see a perfectly straight line. Instead, you say that the closer the points are to a straight line, the closer the data set is to normal. If the data points are too far from a straight line — if their correlation coefficient r is lower than some critical value — then you reject the idea that the data set is ND.
Okay, so you have to plot the data points against what their z-scores should be if this is a ND, and specifically for a sample of n points from a ND, where n is your sample size. This must be built up in a sequence of steps:
The expected z-score of the i-th sorted data point is invNorm of (i−.375)/(n+.25).
The critical value for r is
1.0071 − 0.1371/√n − 0.3682/n + 0.7780/n² at α=0.10
0.9963 − 0.0211/√n − 1.4106/n + 3.1791/n² at α=0.01
The closer the points are to a straight line, the closer the data set is to fitting a normal model. In other words, a larger r indicates a ND, and a smaller r indicates a nonND. You can draw one of two conclusions:
(If you haven’t studied hypothesis testing yet, another way to say it is that you’re pretty sure the data set doesn’t fit the normal model because there’s less than a 5% probability that it does.)
This doesn’t mean you are certain it does, merely that you can’t rule it out. Technically you don’t know either way, but practically it doesn’t matter. Remember (or you will learn later) that inferential statistics procedures like t tests are robust, meaning that they still work even if the data are moderately nonnormal. But if your data were extremely nonnormal, r would be less than the critical value. When r is greater than the critical value, you don’t know whether the data set comes from normal data or moderately nonnormal data, but either way your inferential statistics procedures are okay.
So the bottom line is, if r > CRIT, treat the data as normal, and if r < CRIT, don’t.
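Here is a rough Python sketch of the whole procedure, applied to the vehicle weights of Example 12. It is not the MATH200A program, just an illustration under stated assumptions: the function names are invented, and the critical-value coefficients for α = 0.05 (1.0063, 0.1288, 0.6118, 1.3505) are the commonly published Ryan-Joiner values, assumed here because they reproduce the crit = 0.9639 quoted for the n = 30 sample in Example 13.

```python
from math import sqrt
from statistics import NormalDist, mean

def expected_z_scores(n):
    """Expected z-scores for a sorted sample of n points from a ND."""
    std = NormalDist()
    return [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def pearson_r(xs, ys):
    """Correlation coefficient of paired lists xs and ys."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def critical_r(n):
    """Critical value at alpha = 0.05 (ASSUMED coefficients, see lead-in)."""
    return 1.0063 - 0.1288 / sqrt(n) - 0.6118 / n + 1.3505 / n ** 2

weights = [2500, 3250, 4000, 3500, 2900, 4500, 3800, 3000, 5000, 2200]
xs = sorted(weights)                       # data points in ascending order
r = pearson_r(xs, expected_z_scores(len(xs)))
crit = critical_r(len(xs))

verdict = "treat as normal" if r > crit else "don't use a normal model"
print(f"r = {r:.4f}, crit = {crit:.4f}: {verdict}")
```

For these ten weights r should come out well above the critical value, which agrees with the conclusion of Example 12. To test your own data, replace the `weights` list and rerun.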
The normal probability plot is just one of many possible ways to determine whether a data set fits the normal model. Another method, the D’Agostino-Pearson test, uses numerical measures of the shape of a data set called skewness and kurtosis to test for normality. For details, see Assessing Normality in Measures of Shape: Skewness and Kurtosis.
(The online book has live links to all of these.)
Have boundaries, need area (probability): use normalcdf.
Have area (probability), need boundary: use invNorm. That function needs area to left, so if the problem gives area to right you have to use 1 minus that area.
Critical z: z_{area} = invNorm(1−area).
Write out your solutions to these exercises, making a sketch and showing your work for all computations. Then check your solutions against the solutions page and get help with anything you don’t understand.
Caution! If you don’t see how to start a problem, don’t peek at the solution — you won’t learn anything that way. Ask your instructor or a tutor for a hint. Or just leave it and go on to a different problem for now. You may find when you return to that “impossible” problem that you see how to do it after all.
You’ll need this information for several of the problems:
Source: “Is Human Height Bimodal?” (Schilling 2002 [see “Sources Used” at end of book]).
78 66 98 90 74 70 70 76 72 86 62 84 66 70 68
0.3 8.8 11.5 12 12.3 12.5 13 13.5 14.8
Updates and new info: https://BrownMath.com/swt/