Updated 19 Dec 2016

Normality Check on TI-89

Copyright © 2012–2017 by Stan Brown

Summary: In inferential statistics, you need to know whether small data sets are normally distributed. Actually, no real-life data set is exactly normal, but you can use your TI-89 to test whether a data set is close enough to normally distributed. The main tool for this is a normal probability plot. The closer the data set is to normal, the closer that plot will be to a straight line.

But just looking at a plot, you may not be sure whether it’s “close enough” to a straight line, especially with smaller data sets. Most of the time, you need to make some fairly gnarly computations to answer that question.

This note shows you how to make the plot and do the computations for an example, and then in an appendix it gives the theory behind all of this.


Example — Vehicle Weights

Consider these vehicle weights (in pounds):

2950 4000 3300 3350 3500 3550 3500 2900 3250 3350

Construct a plot to decide whether these vehicle weights seem to be normally distributed.

The Procedure

Step 1: Enter the numbers in list1 and sort them.

Enter the data points. Press [] [APPS] and select the Stats/List Editor.

Cursor onto the label list1 at the top of the first column, then press [CLEAR] [ENTER] to erase the list.

Enter the data values.

Sort the data, unless they’re already in ascending order. Press [F3] [2] [1] to open the List Ops menu and select the Sort List command. In the dialog box, use the [alpha] key to type list1: [ALPHA 4 makes l] [ALPHA 9 makes i] [ALPHA 3 makes s], then plain [T] [1]. The sort order should already be ascending, but change it if necessary.

Step 2: Compute the normal z scores.

The test for normality is a normal probability plot. It’s a scatterplot, where x = the actual data and y = the z scores the data points would have if the data were perfectly normally distributed. The closer the plot is to a straight line, the closer the data set is to a normal distribution.

The calculator has a command to create a normal probability plot. But it’s really hard to look at a plot and tell whether it’s close enough to a straight line. You need a numerical test to help you in doubtful cases — and most cases are doubtful. So you end up having to duplicate some of the work the calculator would do.

The next calculations could all be combined into one very long command storing into list2, but it would be really easy to make mistakes in such a long formula. Instead, I’ve broken the calculation into two chunks.

Get the normal probabilities into list2. (The normal probabilities are the probabilities of getting each data point or a lower one by random selection, if the data points are normally distributed. The appendix gives the formula (i−0.375)/(n+0.25), where i takes the values 1, 2, 3, …, n and n is the sample size.) Cursor to the column heading list2 (not the first row under the heading). You’re going to enter a formula that fills the list:
seq((x−.375)/10.25,x,1,10)
(You can see a part of that formula in the illustration at right.) The seq function has four arguments, and basically it says to evaluate the expression for values of x from 1 to 10.
 
Here’s some help with entering the formula.
  • Use the [alpha] key to type seq: [ALPHA 3 makes s] [ALPHA ÷ makes e] [ALPHA 1 makes q]. Note the double paren after seq.
  • There are two 10s in the formula. Those are 10 this time because you have 10 data points, but when you have a different number of data points then you should use that number in both places.
  • Press [ENTER] at the end of the formula, and list2 will be filled with the normal probabilities. (Anything that was previously in list2 is erased for you.)
Find z scores that correspond to those probabilities. These are the z scores that your data would have if they were normally distributed, and they’ll also be used in the normal probability plot below. Cursor to the column heading list3. The formula you’ll be entering in list3 is another seq expression:
seq(TIStat.invNorm(list2[x]),x,1,10)
Here are some hints to help you:
  • Use the [alpha] key, as before, to type seq(. This time you have only one paren.
  • To find the invNorm function, press [CATALOG] [F3] to get to the catalog for flash apps including stats. The calculator has already “pressed” [alpha] for you, so just press [9] to move to the i’s in the catalog list. Scroll down to invNorm and press [ENTER].
  • To get list2, press [F3] [1] for list name, select list2, and press [ENTER].
  • Notice that the x after the list name is enclosed in square brackets, with a close paren following the close bracket.
  • As before, I use 10 here because the example has 10 data points. You should use the actual number of data points in each problem.

Here’s the last part of the formula, and the resulting z scores in list3.
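If you want to check the list2 and list3 values away from the calculator, here is a minimal Python sketch of the same two chunks. Python is only an illustration here, not part of the TI-89 procedure; the function name normal_scores is hypothetical, and NormalDist().inv_cdf from Python's standard library is assumed to play the role of invNorm.

```python
from statistics import NormalDist

def normal_scores(n):
    """Return (probabilities, z_scores) for a sample of size n."""
    # Same expression as the seq() formula for list2: (i - 0.375)/(n + 0.25)
    probs = [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]
    # Standard normal (mean 0, standard deviation 1); inv_cdf stands in for invNorm
    std_normal = NormalDist()
    z_scores = [std_normal.inv_cdf(p) for p in probs]
    return probs, z_scores

probs, z_scores = normal_scores(10)   # 10 data points in this example
for p, z in zip(probs, z_scores):
    print(f"{p:.4f}  {z:+.4f}")
```

The two printed columns should correspond to list2 and list3. As on the calculator, change the 10 to your actual number of data points.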

Step 3: Clear other plots.

In this step, you disable any other plots and graphs that could overlay your normal probability plot.

Press [F2] [3] to turn off all statistics plots, and [F2] [4] to turn off all function plots. (The plots aren’t deleted, just made inactive.)

Step 4: Display the normal probability plot.

In this step, you make an xy scatterplot, where the x’s are the original data points and the y’s are the z scores you just computed. Press [F2] [1] to get to the Plot Setup screen. Cursor to an empty line, or to a plot that you don’t care about keeping, and press [F1] to begin defining it.
 
The plot setup dialog is shown at right. Select plot type “scatter” and set x to list1 and y to list3. You can’t use a command to paste the list names here, so use the [alpha] key to type them.
 
Press [ENTER] to close the dialog box.
 
Press [F5], Zoom Data, to show the plot and scale it to take up the full screen. Here’s what it looks like:

If this plot is close to a straight line, then the data set is close to a normal distribution. But how close is close enough? If you’re very lucky, the plot will be an obvious straight line, or it will be very far from a straight line. Then you can declare the data normal or not normal, and stop. But usually the plot is iffy, and you could go either way just from looking at it.

What about the plot for this example? There are certainly some bumps, but is the plot too far from a straight line? It’s hard to say. This is one of those iffy cases that you usually get.

Fortunately, there’s a numerical test that helps you make a decision. The correlation coefficient answers the question “how close are these points to a straight line”, so let’s compute it.

Step 5: Compute r.

To find the correlation coefficient, the easy way is to do a linear regression. The graph took you out of Stats/List Editor, so press [] [APPS] to get back.
 
Press [F4] [3] to bring up the regression menu, then [2] to select LinReg(ax+b).
Here’s the regression dialog. The X list is list1, and the Y list is list3; use the [alpha] key to type them. Freq should already be 1, but set it if it’s not.
Press [ENTER], and the results appear. You don’t care about the other numbers, but write down r=.9599 while it’s still on your screen.
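For a cross-check of r without the calculator, here is a rough Python sketch. It assumes the usual Pearson correlation formula, which is the r that a linear regression reports; the function name normal_plot_r is hypothetical, and inv_cdf again stands in for invNorm.

```python
from math import sqrt
from statistics import NormalDist, mean

# Vehicle weights from the example, in pounds
weights = [2950, 4000, 3300, 3350, 3500, 3550, 3500, 2900, 3250, 3350]

def normal_plot_r(data):
    """Correlation between the sorted data and their expected normal z scores."""
    xs = sorted(data)                      # Step 1: sort ascending
    n = len(xs)
    # Step 2: expected z scores from the probabilities (i - 0.375)/(n + 0.25)
    zs = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    # Step 5: Pearson correlation coefficient of the (x, z) pairs
    x_bar, z_bar = mean(xs), mean(zs)
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    szz = sum((z - z_bar) ** 2 for z in zs)
    return sxz / sqrt(sxx * szz)

r = normal_plot_r(weights)
print(f"r = {r:.4f}")
```

The result should agree with the calculator’s r=.9599 to within rounding.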

Step 6: Compute CRIT and compare it to r.

The correlation coefficient of .9599 (about .96) seems pretty good, but is it good enough? To answer this question, you have to compare it to a critical value. If r < CRIT, the data set is too far from normally distributed. If r > CRIT, the data set is close enough to normally distributed.

The formula for CRIT is 1.0063−.6118/n+1.3505/n²−.1288/√n, where n is the sample size. Press [HOME], then enter that formula in your calculator. But don’t type n; use your actual number of data points. In this example, that’s 10.

Whew! r = 0.9599 and CRIT ≈ 0.9179. r > CRIT, and therefore you can say that the data set is close enough to a normal distribution.
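The same arithmetic can be sketched in Python as a check on your keystrokes. The function name crit_05 is hypothetical; the coefficients are the ones in the CRIT formula above, and r is the value written down in Step 5.

```python
from math import sqrt

def crit_05(n):
    """Critical value of r for sample size n, using the CRIT formula above."""
    return 1.0063 - 0.6118 / n + 1.3505 / n**2 - 0.1288 / sqrt(n)

n = 10        # number of data points in this example
r = 0.9599    # correlation coefficient from the regression in Step 5
crit = crit_05(n)

print(f"CRIT = {crit:.4f}")
if r > crit:
    print("r > CRIT: close enough to normally distributed")
else:
    print("r < CRIT: too far from normally distributed")
```

For n = 10 this gives CRIT ≈ 0.9179, matching the value above.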

Appendix — The Theory

The basic idea isn’t too bad. You make an xy scatterplot where the x’s are the data points, sorted in ascending order, and the y’s are the expected z scores for a normal distribution. (I’m going to abbreviate “normally distributed” or “normal distribution” as ND to save wear and tear on my keyboard and your eyes.)

Why would you expect that to be a straight line? Recall the formula for a z score: z = (x−x̄)/s, where x̄ is the mean and s is the standard deviation. Breaking the one fraction into two, you have z = x/s − x̄/s. That’s just a linear equation, with slope 1/s and intercept −x̄/s. So an xz plot of any theoretical ND, plotting each data point’s z score against the actual data value, would be a straight line.

Further, if your actual data points are ND, then their actual z scores will match their expected-for-a-normal-distribution z scores, and therefore a scatterplot of expected z scores against actual data values will also be a straight line.
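To see the slope-and-intercept form of the z score in actual numbers, here is a small Python sketch using the vehicle weights from the example. It assumes s is the sample standard deviation; the variable names are only illustrative.

```python
from statistics import mean, stdev

# Vehicle weights from the example, in pounds
weights = [2950, 4000, 3300, 3350, 3500, 3550, 3500, 2900, 3250, 3350]

x_bar = mean(weights)
s = stdev(weights)        # sample standard deviation (assumed meaning of s)

slope = 1 / s             # coefficient of x in z = x/s - x_bar/s
intercept = -x_bar / s    # constant term in that equation

for x in sorted(weights):
    z_direct = (x - x_bar) / s          # z score from the definition
    z_line = slope * x + intercept      # same z score from the linear equation
    print(f"{x:5d}  {z_direct:+.4f}  {z_line:+.4f}")
```

The last two columns agree for every data point, which is all that “z is a linear function of x” means.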

Now, in real life no data set is ever exactly a ND, so you won’t ever see a perfectly straight line. Instead, you say that the closer the points are to a straight line, the closer the data set is to normal. If the data points are too far from a straight line — if their correlation coefficient r is lower than some critical value — then you reject the idea that the data set is ND.

Okay, so you have to plot the data points against what their z-scores should be if this is a ND, and specifically for a sample of n points from a ND, where n is your sample size. This must be built up in a sequence of steps:

  1. Divide the normal curve (mentally) into n regions of equal probability and take one probability from each region. For technical reasons, the probability number you use for region i is (i−.375)/(n+.25). This formula is in many textbooks, and also in Ryan and Joiner’s paper Normal Probability Plots and Tests for Normality [full citation at http://BrownMath.com/swt/sources.htm#so_Ryan1976].
  2. Compute the expected z scores for those probabilities. Working with the calculator, that’s just invNorm of (i−.375)/(n+.25).
  3. Plot those expected z scores against the data values. This xy plot (or xz plot) has a correlation coefficient r, computed just like any other correlation coefficient.
  4. Compare the r for your data set to the critical value for the size of your data set. Ryan and Joiner determined that the critical value for sample size n, at the 0.05 significance level, is 1.0063−.1288/√n−.6118/n+1.3505/n². To make it a little easier on the calculator I rearranged it as 1.0063−.6118/n+1.3505/n²−.1288/√n.
    In the same paper, they gave formulas for critical values at other significance levels:

    1.0071−0.1371/√n−0.3682/n+0.7780/n² at α=0.10

    0.9963−0.0211/√n−1.4106/n+3.1791/n² at α=0.01
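The three critical-value formulas can be gathered into one small Python sketch. The table name RJ_COEFFS and the function name critical_r are hypothetical; the coefficients are copied from the formulas quoted above.

```python
from math import sqrt

# alpha -> (constant, coefficient of 1/sqrt(n), coefficient of 1/n, coefficient of 1/n^2)
RJ_COEFFS = {
    0.10: (1.0071, -0.1371, -0.3682, 0.7780),
    0.05: (1.0063, -0.1288, -0.6118, 1.3505),
    0.01: (0.9963, -0.0211, -1.4106, 3.1791),
}

def critical_r(n, alpha=0.05):
    """Critical value of r for sample size n at significance level alpha."""
    c0, c_sqrt, c_n, c_n2 = RJ_COEFFS[alpha]
    return c0 + c_sqrt / sqrt(n) + c_n / n + c_n2 / n**2

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: critical r = {critical_r(10, alpha):.4f}")
```

For n = 10, the critical value is highest at α=0.10 and lowest at α=0.01, so the same r can pass at one significance level and fail at another.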

The closer the points are to a straight line, the closer the data set is to fitting a normal model. In other words, a larger r indicates a ND, and a smaller r indicates a non-ND. You can draw one of two conclusions, depending on how r compares with the critical value.

So the bottom line is, if r > CRIT, treat the data as normal, and if r < CRIT, don’t.

The normal probability plot is just one of many possible ways to determine whether a data set fits the normal model. Another method, the D’Agostino-Pearson test, uses numerical measures of the shape of a data set called skewness and kurtosis to test for normality. For details, see Assessing Normality in Measures of Shape: Skewness and Kurtosis.


Updates and new info: http://BrownMath.com/ti83/
