Updated 3 Jan 2016 (What’s New?)

Inferences about Linear Regression

Copyright © 2002–2024 by Stan Brown,

Summary: When you do a linear regression, you get an equation in the form ŷ = b0 + b1x. This page shows how to estimate or test the slope of the regression line, and also how to predict the response value for a particular x.

See also:

Sampling Distribution

Advice: This section is rather heavy going. While it’s nice to understand the background, you don’t actually need it to do the calculations. Especially on a first reading, you might want to skip down to the example.

Review: Regression on a Sample

Earlier in your course, you learned to find the least-squares regression line that best fits a set of points. (If you need a refresher, see Linked Variables.) That can be done by hand with formulas, or with much saving of labor by a TI calculator. The line of best fit for a sample has a slope and a y intercept, and it can be written in the form ŷ = ax + b, ŷ = b0 + b1x, or similar.

Because the correlation is almost never exactly ±1, your data points don’t fall exactly on the regression line. For a particular data point (xj, yj) the difference between the prediction and the actual y value is called the residual: ej = yj − ŷj. Another way to look at it is that the actual y value involves both the prediction from the regression line, and the residual ej that is the discrepancy between the regression line and the actual data point. Symbolically, yj = b0 + b1xj + ej.

(Don’t let the notation confuse you. A number subscript refers to properties of the whole sample, and a letter subscript refers to properties of a particular point. Here the point (xj, yj) includes the residual ej, and you know that the residuals of different points will be different. The intercept b0 and slope b1 are properties of the whole sample because they describe the regression line that was derived from the whole sample.)

Because it’s quite common to have multiple points in your sample with the same xj and different yj’s, a given xj can have more than one residual ej.

Naturally, if you take another random sample you expect to get a slightly different regression line. In other words, the slope b1 and intercept b0 are sample statistics.
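If you’d like to see that bookkeeping in software rather than on a TI, here’s a minimal Python sketch of regression on a sample and its residuals. (Python isn’t part of this page’s TI/Excel workflow, and the four data points are made up purely for illustration.)

```python
def linreg(xs, ys):
    """Least-squares intercept b0 and slope b1 for sample points (x, y)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b0 = (sy - b1 * sx) / n
    return b0, b1

def residuals(xs, ys, b0, b1):
    """Residual for each point: e_j = y_j - yhat_j = y_j - (b0 + b1*x_j)."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Four made-up points (hypothetical data, for illustration only)
xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.2, 7.8]
b0, b1 = linreg(xs, ys)
es = residuals(xs, ys, b0, b1)
print(round(b0, 2), round(b1, 2))   # → 0.15 1.94
print(abs(round(sum(es), 6)))       # → 0.0 (least-squares residuals sum to ~0)
```

A different random sample would give different b0 and b1, which is exactly the point: they are sample statistics.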

Regression on a Population

Just as the sample mean is a point estimate of the population mean μ, the slope and intercept you get by regression on a sample are point estimates for the true slope β1 and intercept β0 of the line that best fits the whole population:

(1) ŷ = β0 + β1x

As usual, Greek letters stand for population parameters. The Greek letter β (beta) corresponds to the Roman letter b.

Just as with the sample, the true linear correlation in the population almost certainly isn’t ±1. There is always some variability due to other factors, so the true regression line doesn’t describe the population perfectly. If you randomly select a particular xj and measure the y for that value of x, it’s not going to fall exactly on the line given by equation 1. There will be a residual ε (epsilon), so that the measured value yj will be

(2) yj = β0 + β1xj + εj

This is the least squares regression model. As with the sample, the number subscripts refer to properties of the whole population and the letter subscripts go with particular points. Just to be clear, the population is all the (x,y) pairs, both the ones you measured and the ones you didn’t. And you can have more than one εj for a given xj, because the population may contain multiple data pairs with the same x and the same or different y’s.

What can we say about the population parameters β0 and β1 that determine the regression line, equation 1? This is the standard practice of inferential statistics: use the sample to make predictions or decisions about the population. As always, begin by identifying a test statistic and finding its sampling distribution. That’s the only hard part, really.


To do our inferential procedures, we require that the y values are normally distributed around the regression line. That’s not the same as saying that the y values are normally distributed! Rather, it means that in the population, for any particular x you can measure many y’s, and the y’s for any particular x are normally distributed.

In other words, the residuals must be normally distributed. (Perfect normality isn’t required. As you’ll see later, the test statistic is Student’s t, which is robust against moderate departures from normality.)

You can check that requirement on your TI calculator by following these steps:

  1. Run your regression; see Scatterplot, Correlation, and Regression on TI-83/84, Step 1 and Step 2 only.
  2. Follow the procedure in MATH200A Program part 4 and when prompted for a data list specify LRESID. (To get LRESID, press [2nd STAT makes LIST], scroll up to RESID, and press [ENTER].)

In addition to the requirement of normality, the plot of residuals versus x should be boring: no bends, no thickening or thinning from left to right, and no outliers.

Standard Errors

Of course we don’t know all the points in the population, since our sample measured some of them and not all. As usual, our sample statistics are point estimates for the population parameters, which we don’t know.

So the standard deviation of the residuals, for all points in the population, is estimated by the standard deviation of the residuals, for just the points in our sample:

(3) se = √[ ∑ej² / (n−2) ],  or equivalently  se = √[ (∑y² − b0∑y − b1∑xy) / (n−2) ]

You’re used to the formula for standard deviation of a sample having n−1 in the denominator, so why is it n−2 here? In the standard deviation of a list of numbers, there are n−1 degrees of freedom because if you know n−1 points the last one is forced to make the mean come out right. Here, there is a different mean y for each value of x, so you have n−2 degrees of freedom.

For computational methods, please see the Example below.

The standard deviation of residuals se is used to compute confidence intervals for mean response as well as prediction intervals for individual responses. It is also a component of the formula for the standard error of the slope of the regression line, which is written sb1:

(4) sb1 = se / √SS(x),  where  SS(x) = ∑x² − (∑x)²/n

There’s no need for a √n term in the denominator — that’s already covered by equation 3’s computation of se, which forms part of the computation of sb1.

The standard error of the slope of the regression line is used to compute a confidence interval or perform a hypothesis test about the slope of the regression line. Notice that equation 4 computes sb1 not σb1 — this signifies that we have an estimate of the standard error, since the population standard deviation is unknown.
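If you want to double-check equations 3 and 4 in software, here’s a Python sketch that computes se and sb1 from the commuting data used in the example below. (Python isn’t part of this page’s TI/Excel workflow; the numbers should match the calculator results shown later.)

```python
import math

# The commuting data from the example below
xs = [3, 5, 7, 8, 10, 11, 12, 12, 13, 15, 15, 16, 18, 19, 20]
ys = [7, 20, 20, 15, 25, 17, 20, 35, 26, 25, 35, 32, 44, 37, 45]
n = len(xs)
sx, sy = sum(xs), sum(ys)                   # ∑x, ∑y
sxx = sum(x * x for x in xs)                # ∑x²
syy = sum(y * y for y in ys)                # ∑y²
sxy = sum(x * y for x, y in zip(xs, ys))    # ∑xy

# Least-squares slope and intercept for the sample
b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b0 = (sy - b1 * sx) / n

# Equation 3: standard deviation of residuals, with n-2 in the denominator
se = math.sqrt((syy - b0 * sy - b1 * sxy) / (n - 2))

# Equation 4: standard error of the slope, using SS(x) = ∑x² - (∑x)²/n
ssx = sxx - sx * sx / n
s_b1 = se / math.sqrt(ssx)

print(round(se, 4), round(s_b1, 4))   # → 5.4011 0.2851
```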

Sampling Distribution for the Slope


At long last, after all those preliminaries, we can identify the relevant distribution. The regression on any sample of size n will give you a slope b1 that depends on the sample. Slopes of samples form a Student’s t distribution about the slope β1 of a population regression:

(5) t = (b1 − β1) / sb1

where sb1 is calculated from equation 4 and equation 3.

The Example

For all the procedures on this page, we’ll use the following sample of commuting distances and times for fifteen randomly selected co-workers, adapted from Johnson & Kuby 2004 [full citation at], page 623.

Commuting Distances and Times
Person       1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Miles, x     3   5   7   8  10  11  12  12  13  15  15  16  18  19  20
Minutes, y   7  20  20  15  25  17  20  35  26  25  35  32  44  37  45

All the procedures on this page require you to perform the regression first, to obtain b0 and b1 for your sample. (They’re shown below as b0 = 3.6 and b1 = 1.9.)

Always think about the real-world meanings of these quantities. In this case, the slope b1 is the marginal cost of living further away: it tells you that adding a mile to the commute distance adds, on average, 1.9 minutes to the time. Mathematically, the intercept b0 is the commuting time for a distance of zero. Put that way it’s not meaningful, but you can also think of it as the fixed cost of commuting: 3.6 minutes of the average commute time doesn’t depend on distance. That covers starting the car, brushing off the snow, getting out of the driveway, finding a parking space in the company lot, and so forth.

The TI-83/84 or TI-89 saves you a lot of work, so this page will use it. If you don’t have one of those calculators, you can do the computations by hand, looking things up in tables where necessary, then check your results against the ones shown here.

TI Calculator Procedure / Hand Calculations
Put the x’s in list L3 and the y’s in list L4. Then [STAT] [] [] and scroll to LinRegTTest. Specify L3 and L4, with frequency of 1. The next line doesn’t matter, and you can either put Y1 in RegEQ [VARS] [] [1] [1] or leave it blank. Here are the setup screen and the results:

LinRegTTest setup screen
LinRegTTest results screen

The calculator uses a+bx as the regression equation, so the intercept b0 shows as a=3.6434 and the slope b1 shows as b=1.8932. s=5.4011 on the results screen is se, the standard deviation of residuals, which saves a lot of computation.

Begin by computing the sums that are needed in the formulas. You should get these results:

n = 15, ∑x = 184, ∑x² = 2616

∑y = 403, ∑y² = 12493, ∑xy = 5623

Now perform the regression. (Use Excel, or find formulas in Least Squares — the Gory Details.) You should get these results:

intercept b0 = 3.643387816

slope b1 = 1.89320208

As a matter of interest, here’s a plot of the points and the regression line. The dots occur every two miles horizontally and every five minutes vertically.

scatterplot and regression line

Use equation 3 to compute the standard deviation of residuals in the sample:

se = √[ (∑y² − b0∑y − b1∑xy) / (n−2) ]   (equation 3)

You should get a result of se = 5.401135301.

Now use equation 4 to find sb1, the standard error of the slope. The LinRegTTest operation stored all sorts of useful variables in the submenus of [VARS] [5:Statistics]: n is the first variable under XY, summations are under ∑, b0 is a and b1 is b under EQ, and se is s, down at position 0 under TEST.

As shown here, compute SS(x) = 358.9333333 first, then divide se by the square root of that to get sb1 = 0.2850875.

Now use equation 4 to find sb1, the standard error of the slope:

sb1 = se / √SS(x) = se / √[ ∑x² − (∑x)²/n ]   (equation 4)

Compute SS(x) = 358.9333333. Divide se = 5.401135301 by √SS(x) to get sb1 = 0.2850875.

(The “Sample Statistics” block of the accompanying Excel workbook will do these calculations for you.)

Since we’ll need some values for our later computations, let’s make them easy to refer to:

b0 = 3.6434     b1 = 1.89320     SS(x) = 358.9333333     se = 5.4011     sb1 = 0.2850875

Requirements Check

Before you can make any inference (hypothesis test or confidence interval) about correlation or regression in the population, check these requirements:

  1. The sample is random.
  2. The residuals are normally distributed, at least approximately.
  3. The plot of residuals versus x shows no bends, no thickening or thinning from left to right, and no outliers.

The problem tells you that the sample was random, but you have to do some work to verify the other two requirements.

TI Calculator Procedure / Hand Calculations
A nice byproduct of the LinRegTTest is that it computes the residuals for you and stores them in a list called RESID. You can easily use [2nd Y= makes STAT PLOT] to plot them against x. At the prompt for the y list, press [2nd STAT makes LIST] [], scroll to RESID if necessary, and press [ENTER]. Here’s the plot of the residuals against x (plot of residuals against commute distances). As you can see, there are no bends, no thickening or thinning trend, and no outliers.

If you’re working by hand, begin by computing all the residuals. For each (xj, yj) data pair, the residual is ej = yj − ŷj = yj − b0 − b1xj.

Once you’ve computed the residuals, making a scatter plot of the residuals versus x is tedious but not especially difficult. You should get something like the plot shown at left.

Next, check that the residuals are normally distributed (normal probability plot of residuals). An easy way is to use MATH200A part 4. When prompted for the data list, press [2nd STAT makes LIST] [], scroll to RESID if necessary, and press [ENTER] twice. If you don’t have the program, you can get the same effect by choosing the sixth graph type on the STAT PLOT screen and selecting the RESID list as your data list.

Caution: the correlation coefficient r=.9772 in the picture is for the normal probability plot, not the sample data. The correlation coefficient of the commuting distances and times is 0.8788, as shown in the TI output from the LinRegTTest, above.

To test that the residuals are normal, you can use an Excel workbook; see Normality Check and Finding Outliers in Excel. Some models of TI calculators can make normal probability plots; for instructions see Normality Check on TI-83/84 or Normality Check on TI-89. If all else fails, you can make the plot by hand; see the “Theory” appendix to any of those articles.
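In software, a rough Python sketch of the same idea is a normal probability plot correlation: correlate the sorted residuals with theoretical normal quantiles, and take a value near 1 as support for normality. (The Blom plotting positions below are an assumption; the TI may use slightly different positions, so don’t expect to reproduce its r = .9772 exactly.)

```python
import math
from statistics import NormalDist

# Residuals of the commuting example, using b0 and b1 from the regression
xs = [3, 5, 7, 8, 10, 11, 12, 12, 13, 15, 15, 16, 18, 19, 20]
ys = [7, 20, 20, 15, 25, 17, 20, 35, 26, 25, 35, 32, 44, 37, 45]
b0, b1 = 3.643387816, 1.89320208
resid = sorted(y - (b0 + b1 * x) for x, y in zip(xs, ys))
n = len(resid)

# Theoretical normal quantiles (Blom plotting positions, a common choice)
qs = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

# Correlation between sorted residuals and normal quantiles;
# a value near 1 supports the normality requirement
mq, me = sum(qs) / n, sum(resid) / n
num = sum((q - mq) * (e - me) for q, e in zip(qs, resid))
den = math.sqrt(sum((q - mq) ** 2 for q in qs) *
                sum((e - me) ** 2 for e in resid))
r = num / den
print(round(r, 2))
```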

Confidence Interval for Slope of the Regression Line

Now we’re ready to compute a confidence interval. The slope b1 for our sample is a point estimate for the true regression slope β1 of the population, so we can estimate β1 for any desired confidence level.

On the TI-89 and TI-84, you can use the LinRegTInt command on the STAT TESTS menu. This gives (1.2773, 2.5091) for the 95% confidence interval, and you can jump right to the Conclusion below.

(The “Confidence Interval for Slope” block of the accompanying Excel workbook will do these calculations for you, and so will MATH200B part 7.)

With the TI-83, you have to do the computations by hand. From equation 5 we see that the sampling distribution of b1 follows a Student’s t distribution with mean β1, standard error sb1, and degrees of freedom n−2. That’s all we need to see that the 100(1−α)% confidence interval for the slope of the regression line is

(6) b1 − E  ≤  β1  ≤  b1 + E  where E = tn−2,α/2 · sb1

Example: Compute the 95% confidence interval for the slope of the line in our example.

Solution: We already have b1 = 1.89320 and sb1 = 0.2850875. To compute tn−2,α/2, start with n = 15 so n−2 = 13. 1−α = 0.95, so α = 0.05 and α/2 = 0.025. That means that

E = t13,0.025 · 0.2850875

There are many methods to find t13,0.025 — including the invT function on the TI-84, and good old reference tables in a book. The value is 2.160368652, and therefore

E = 2.160368652 · 0.2850875 = 0.6158940982

1.89320 − 0.6158940982 ≤ β1 ≤ 1.89320 + 0.6158940982

1.28 ≤ β1 ≤ 2.51

Conclusion: We’re 95% confident that the slope is between 1.28 and 2.51 minutes per mile, or that each extra mile of commute costs 1.28 to 2.51 extra minutes.
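If you’d like to verify the arithmetic in software, here’s a Python sketch of equation 6, using the values computed earlier on this page (the critical value t13,0.025 is taken from the example rather than computed, to avoid needing a statistics library):

```python
# Confidence interval for the slope (equation 6), using values from above
b1 = 1.89320208          # sample slope
s_b1 = 0.2850875         # standard error of the slope (equation 4)
t_crit = 2.160368652     # t(13, 0.025), from a table or the TI invT function

E = t_crit * s_b1
lo, hi = b1 - E, b1 + E
print(round(lo, 2), round(hi, 2))   # → 1.28 2.51
```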

Hypothesis Test for Slope of the Regression Line

You can test whether the slope is positive (H1: β1 > 0), whether it’s negative (H1: β1 < 0), or whether it’s nonzero (H1: β1 ≠ 0). Since the population correlation coefficient ρ has the same sign as the slope β1, this is equivalent to testing whether there is a positive linear relation, a negative linear relation, or just any linear relation between x and y.

In all three cases, the null is that the slope is zero (H0: β1 = 0) or that there’s no linear association in the population (H0: ρ = 0).

Example: Test the hypothesis that commute time is associated with commute distance. Use α = 0.05.

Comment: This is a two-tailed test to determine if the slope is nonzero. You might think it’s pretty obvious that the slope can’t be negative. That would mean that the further you have to drive, the less time it tends to take. But maybe people who live further away take freeways, while people who live closer must take congested local streets. While it’s not likely, it’s at least possible.

With a TI-83/84 or a TI-89, use LinRegTTest. The test statistic is t0 = 6.64, and the two-tailed p-value is 0.000016. You reject H0 and accept H1: the slope is nonzero. Please see Conclusion below for the interpretation.

(The “Hypothesis Test for Slope” block of the accompanying Excel workbook will do these calculations for you.)

If you’re working by hand, use equation 5, t = (b1 − β1) / sb1. β1 = 0 according to the null hypothesis, so your test statistic is

t0 = (b1 − 0) / sb1

t0 = 1.89320 / 0.2850875 = 6.64

Now use a table, for n−2 = 13 degrees of freedom, to find the area of the right-hand tail as 8.054×10⁻⁶. This is a two-tailed test, so the p-value is twice that, 1.6×10⁻⁵ or about 0.000016. This is well under α, so you reject H0 and accept H1. The slope is nonzero.

Conclusion: At the 0.05 level of significance, there is a linear association between commuting distance in miles and commuting time in minutes, and the slope is positive: a longer distance does tend to take more time.
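Here’s a Python sketch of the hand computation, comparing the test statistic to the critical value instead of computing a p-value (which would need a t distribution function that plain Python doesn’t provide):

```python
# Hypothesis test for the slope (equation 5), with beta1 = 0 under H0
b1 = 1.89320208
s_b1 = 0.2850875
t0 = (b1 - 0) / s_b1
print(round(t0, 2))   # → 6.64

# Two-tailed test at alpha = 0.05 with n-2 = 13 degrees of freedom:
# reject H0 when |t0| exceeds the critical value t(13, 0.025)
t_crit = 2.160368652
reject_H0 = abs(t0) > t_crit
print(reject_H0)      # → True
```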

See also: p < α in Two-Tailed Test: What Does It Tell You? for one-tailed interpretation of a two-tailed test.

Confidence Interval for y Intercept of Regression Line

The y intercept β0 is simply the number you get when you set x = 0 in the regression equation. Please see below for the confidence interval procedure for mean response to any x.

MATH200B part 7 will compute a confidence interval for the y intercept.

Confidence Interval for Mean Response to a Particular x

The purpose of a regression line is to predict the response of the dependent variable y to the independent variable x. The regression line derived from the sample lets us plug in xj and compute

ŷj = b0 + b1 xj

In the population, for any given xj there can be many yj’s. ŷj above is a point estimate for the mean value of yj in the population for the given value of xj. But if we had a different sample, we’d have a different ŷj. So we’d like to know the mean value of y in the population for a particular value of x. For a particular xj, we denote the mean value of y as μy|x=xj.

Like any other population mean value, this has a confidence interval formed by taking a margin of error around the point estimate ŷj:

(7) ŷj − E  ≤  μy|x=xj  ≤  ŷj + E  where E = tn−2,α/2 · se · √[ 1/n + (xj − x̄)²/SS(x) ]

where se is the standard deviation of the residuals from equation 3.

(The “Confidence Interval for Mean Response” block of the accompanying Excel workbook will do these calculations for you, and so will MATH200B part 7.)

Example: Construct the 95% confidence interval for the mean commute time for a 10-mile one-way trip.

Solution: Compute the point estimate ŷj and the margin of error E, and combine them to make the confidence interval about μy|x=xj.

To compute ŷj, see How to Find ŷ from a Regression on TI-83/84. Here xj = 10 and ŷj = 3.6434 + 1.89320·10 = 22.5754 minutes. This is the point estimate of the average time for a 10-mile commute, based on this sample alone.

To compute E, we need some figures that we computed above:

t13,0.025 = 2.160368652

se = 5.401135301

SS(x) = 358.9333333

The last thing we need for equation 7 is x̄, the average value of x in our sample. That’s just

x̄ = (∑x)/n = 184/15

Now put together the margin of error:

E = 2.160368652 · 5.401135301 · √[1/15 + (10−184/15)²/358.9333333]

E = 3.320501208

and using ŷj = 22.5754 from a couple of paragraphs ago,

22.5754 − 3.3205 ≤ μy|x=xj ≤ 22.5754 + 3.3205

19.25 ≤ μy|x=xj ≤ 25.90

Conclusion: We’re 95% confident that the average time for all 10-mile commutes is between 19.25 and 25.90 minutes.
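The whole computation of equation 7 can be checked with a short Python sketch, using the sample figures from above:

```python
import math

# Confidence interval for mean response at x = 10 (equation 7)
b0, b1 = 3.643387816, 1.89320208
n, xbar = 15, 184 / 15
se = 5.401135301         # standard deviation of residuals (equation 3)
ssx = 358.9333333        # SS(x)
t_crit = 2.160368652     # t(13, 0.025)

xj = 10
yhat = b0 + b1 * xj      # point estimate of the mean time, ≈ 22.5754
E = t_crit * se * math.sqrt(1 / n + (xj - xbar) ** 2 / ssx)
print(round(yhat - E, 2), round(yhat + E, 2))   # → 19.25 25.9
```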

Prediction Interval for Responses to a Particular x

Maybe you’re not so concerned about the mean commute time for 10 miles. Maybe you want to predict a 95% interval for your time. In other words, you don’t want the population parameter μy|x=xj that we calculated above, you want the likely range for what your own commute time might be.

We call this a prediction interval for individual responses, to distinguish from the confidence interval for mean response. As usual, there is more variation in individual responses than there is in means, so the prediction interval is wider than the confidence interval. But the formula is strikingly similar to the equation 7 formula for the confidence interval of the mean:

(8) ŷj − E  ≤  yj  ≤  ŷj + E  where E = tn−2,α/2 · se · √[ 1 + 1/n + (xj − x̄)²/SS(x) ]

The only difference is that the standard error is larger by the addition of 1 inside the radical sign.

(The “Confidence Interval for Individual Response” block of the accompanying Excel workbook will do these calculations for you, and so will MATH200B part 7.)

Example: Let’s compute the 95% prediction interval for y(10), the y values at x = 10. We already have all the numbers we need from the previous example:

ŷj = 22.5754

t13,0.025 = 2.160368652

se = 5.401135301

SS(x) = 358.9333333

x̄ = (∑x)/n = 184/15

Now put together the margin of error:

E = 2.160368652 · 5.401135301 · √[1 + 1/15 + (10−184/15)²/358.9333333]

E = 12.13170637

and the prediction interval:

22.5754 − 12.1317 ≤ yj ≤ 22.5754 + 12.1317

10.44 ≤ yj ≤ 34.71

Conclusion: You are 95% likely to have a commute time between 10.44 and 34.71 minutes for a 10-mile commute.

The prediction interval for an individual trip of a particular distance is much wider than the confidence interval for the mean of all trips of that distance. This falls right in line with what we already know, that there is a lot less variability in means than in individual data points.
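And here’s the matching Python sketch for equation 8; note the extra 1 under the radical compared with the mean-response interval:

```python
import math

# Prediction interval for an individual response at x = 10 (equation 8);
# identical to equation 7 except for the extra "1 +" under the radical
b0, b1 = 3.643387816, 1.89320208
n, xbar = 15, 184 / 15
se = 5.401135301
ssx = 358.9333333
t_crit = 2.160368652

xj = 10
yhat = b0 + b1 * xj
E = t_crit * se * math.sqrt(1 + 1 / n + (xj - xbar) ** 2 / ssx)
print(round(yhat - E, 2), round(yhat + E, 2))   # → 10.44 34.71
```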

