Inferences about Linear Regression
Copyright © 2002–2023 by Stan Brown, BrownMath.com
Summary: When you do a linear regression, you get an equation in the form ŷ = b0 + b1x. This page shows how to estimate or test the slope of the regression line, and also how to predict the response value for a particular x.
Advice: This section is rather heavy going. While it’s nice to understand the background, you don’t actually need it to do the calculations. Especially on a first reading, you might want to skip down to the example.
Earlier in your course, you learned to find the least-squares regression line that best fits a set of points. (If you need a refresher, see Linked Variables.) That can be done by hand with formulas, or with much saving of labor by a TI calculator. The line of best fit for a sample has a slope and a y intercept, and it can be written in the form ŷ = ax + b, ŷ = b0 + b1x, or similar.
Because the correlation is almost never exactly ±1, your data points don’t fall exactly on the regression line. For a particular data point (xj, yj) the difference between the prediction and the actual y value is called the residual: ej = yj−ŷj. Another way to look at it is that the actual y value involves both the prediction from the regression line, and the residual ej that is the discrepancy between the regression line and the actual data point. Symbolically, yj = b0 + b1xj + ej.
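If you like to see the bookkeeping in code, here is a minimal Python sketch (the function and variable names are my own, not the article's) that computes each residual from a slope and intercept you have already found:

```python
# Residual of each point: e_j = y_j - (b0 + b1*x_j).
# b0 (intercept) and b1 (slope) come from a regression already performed.
def residuals(xs, ys, b0, b1):
    return [yj - (b0 + b1 * xj) for xj, yj in zip(xs, ys)]
```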
(Don’t let the notation confuse you. A number subscript refers to properties of the whole sample, and a letter subscript refers to properties of a particular point. Here the point (xj, yj) includes the residual ej, and you know that the residuals of different points will be different. The intercept b0 and slope b1 are properties of the whole sample because they describe the regression line that was derived from the whole sample.)
Because it’s quite common to have multiple points in your sample with the same xj and different yj’s, a given xj can have more than one residual ej.
Naturally, if you take another random sample you expect to get a slightly different regression line. In other words, the slope b1 and intercept b0 are sample statistics.
Just as the sample mean x̅ is a point estimate of the population mean μ, the slope and intercept you get by regression on a sample are point estimates for the true slope β1 and intercept β0 of the line that best fits the whole population:
(1) ŷ = β0 + β1x
As usual, Greek letters stand for population parameters. The Greek letter β (beta) corresponds to the Roman letter b.
Just as with the sample, the true linear correlation in the population almost certainly isn’t ±1. There is always some variability due to other factors, so the true regression line doesn’t describe the population perfectly. If you randomly select a particular xj and measure the y for that value of x, it’s not going to fall exactly on the line given by equation 1. There will be a residual ε (epsilon), so that the measured value yj will be
(2) yj = β0 + β1xj + εj
This is the least squares regression model. As with the sample, the number subscripts refer to properties of the whole population and the letter subscripts go with particular points. Just to be clear, the population is all the (x,y) pairs, both the ones you measured and the ones you didn’t. And you can have more than one εj for a given xj, because the population may contain multiple data pairs with the same x and the same or different y’s.
What can we say about the population parameters β0 and β1 that determine the regression line, equation 1? This is the standard practice of inferential statistics: use the sample to make predictions or decisions about the population. As always, begin by identifying a test statistic and finding its sampling distribution. That’s the only hard part, really.
To do our inferential procedures, we require that the y values are normally distributed around the regression line. That’s not the same as saying that the y values are normally distributed! Rather, it means that in the population, for any particular x you can measure many y’s, and the y’s for any particular x are normally distributed.
In other words, the residuals must be normally distributed. (Perfect normality isn’t required. As you’ll see later, the test statistic is Student’s t, which is robust against moderate departures from normality.)
You can check that requirement on your TI calculator by making a normal probability plot of the residuals, which the calculator stores in the list RESID after a regression. (To get RESID, press [2nd STAT makes LIST], scroll up to RESID, and press [ENTER].)

In addition to the requirement of normality, the plot of residuals versus x should be boring: no bends, no thickening or thinning from left to right, and no outliers.
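Off the calculator, you can sketch both checks in Python with matplotlib and scipy. This is an illustration under my own naming, not a transcription of the TI procedure:

```python
import matplotlib.pyplot as plt
from scipy import stats

def residual_checks(xs, es):
    """Plot the two diagnostic views of the residuals es against xs."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    # Normal probability plot: an approximately straight line
    # means the residuals are approximately normal.
    stats.probplot(es, plot=ax1)
    ax1.set_title("Normal probability plot of residuals")
    # Residuals versus x: should be "boring" -- no bends,
    # no thickening or thinning, no outliers.
    ax2.scatter(xs, es)
    ax2.axhline(0, linestyle="--")
    ax2.set_xlabel("x")
    ax2.set_ylabel("residual")
    ax2.set_title("Residuals versus x")
    plt.show()
```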
Of course we don’t know all the points in the population, since our sample measured some of them and not all. As usual, our sample statistics are point estimates for the population parameters, which we don’t know.
So the standard deviation of the residuals, for all points in the population, is estimated by the standard deviation of the residuals, for just the points in our sample:
(3) se = √[ ∑(yj−ŷj)² / (n−2) ]
You’re used to the formula for standard deviation of a sample having n−1 in the denominator, so why is it n−2 here? In the standard deviation of a list of numbers, there are n−1 degrees of freedom because if you know n−1 points the last one is forced to make the mean come out right. Here the residuals are measured from the regression line, which is determined by two estimated quantities, the intercept and the slope, so two degrees of freedom are used up and n−2 remain.
For computational methods, please see the Example below.
The standard deviation of residuals se is used to compute confidence intervals for mean response as well as prediction intervals for individual responses. It is also a component of the formula for the standard error of the slope of the regression line, which is written sb1:
(4) sb1 = se / √SS(x), where SS(x) = ∑(xj−x̅)² = ∑x² − (∑x)²/n
There’s no need for a √n term in the denominator — that’s already covered by equation 3’s computation of se, which forms part of the computation of sb1.
The standard error of the slope of the regression line is used to compute a confidence interval or perform a hypothesis test about the slope of the regression line. Notice that equation 4 computes sb1 not σb1 — this signifies that we have an estimate of the standard error, since the population standard deviation is unknown.
At long last, after all those preliminaries, we can identify the relevant distribution. The regression on any sample of size n will give you a slope b1 that depends on the sample. Slopes of samples form a Student’s t distribution about the slope β1 of a population regression:
(5) t = (b1 − β1) / sb1, with n−2 degrees of freedom
where sb1 is calculated from equation 4 and equation 3.
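If you’d rather script these formulas than key them in, here is a compact Python rendering of equations 3, 4, and 5 (a sketch with my own names, testing against a hypothesized slope beta1):

```python
import math

def regression_t(xs, ys, b0, b1, beta1=0.0):
    """Return (se, sb1, t) per equations 3, 4, and 5."""
    n = len(xs)
    # Equation 3: standard deviation of residuals, n-2 degrees of freedom.
    sse = sum((yj - (b0 + b1 * xj)) ** 2 for xj, yj in zip(xs, ys))
    se = math.sqrt(sse / (n - 2))
    # Equation 4: standard error of the slope, se / sqrt(SS(x)).
    xbar = sum(xs) / n
    ssx = sum((xj - xbar) ** 2 for xj in xs)
    sb1 = se / math.sqrt(ssx)
    # Equation 5: Student's t with n-2 degrees of freedom.
    t = (b1 - beta1) / sb1
    return se, sb1, t
```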
For all the procedures on this page, we’ll use the following sample of commuting distances and times for fifteen randomly selected co-workers, adapted from Johnson & Kuby 2004 [full citation at https://BrownMath.com/swt/sources.htm#so_Johnson2004], page 623.
Commuting Distances and Times

| Person | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Miles, x | 3 | 5 | 7 | 8 | 10 | 11 | 12 | 12 | 13 | 15 | 15 | 16 | 18 | 19 | 20 |
| Minutes, y | 7 | 20 | 20 | 15 | 25 | 17 | 20 | 35 | 26 | 25 | 35 | 32 | 44 | 37 | 45 |
All the procedures on this page require you to perform the regression first, to obtain b0 and b1 for your sample. (They’re shown below as b0 = 3.6 and b1 = 1.9.)
Always think about the real-world meanings of these quantities. In this case, the slope b1 is the marginal cost of living further away: it tells you that adding a mile to the commute distance adds, on average, 1.9 minutes to the time. Mathematically, the intercept b0 is the commuting time for a distance of zero. Put that way it’s not meaningful, but you can also think of it as the fixed cost of commuting: 3.6 minutes of the average commute time doesn’t depend on distance. That covers starting the car, brushing off the snow, getting out of the driveway, finding a parking space in the company lot, and so forth.
The TI-83/84 or TI-89 saves you a lot of work, so this page will use it. If you don’t have one of those calculators, you can do the computations by hand, looking things up in tables where necessary, then check your results against the ones shown here.
TI Calculator Procedure:

Put the x’s in list L3 and the y’s in list L4. Then press [STAT] [◄] [▲] and scroll to LinRegTTest. Specify L3 and L4, with frequency of 1. The next line doesn’t matter, and you can either put Y1 in RegEQ ([VARS] [►] [1] [1]) or leave it blank. (The setup screen and the results screen are shown in the original.)

The calculator uses a+bx as the regression equation, so the intercept b0 shows as a=3.6434 and the slope b1 shows as b=1.8932. s=5.4011 on the results screen is se, the standard deviation of residuals, which saves a lot of computation.

Now use equation 4 to find sb1, the standard error of the slope. The LinRegTTest operation stored all sorts of useful variables in the submenus of [VARS] [5:Statistics]: n is the first variable under XY, summations are under ∑, b0 is a and b1 is b under EQ, and se is s, down at position 0 under TEST.

Hand Calculations:

Begin by computing the sums that are needed in the formulas. You should get these results:

n = 15, ∑x = 184, ∑x² = 2616, ∑y = 403, ∑y² = 12493, ∑xy = 5623

Now perform the regression. (Use Excel, or find formulas in Least Squares — the Gory Details.) You should get intercept b0 = 3.643387816 and slope b1 = 1.89320208. (As a matter of interest, the original includes a plot of the points and the regression line, with gridline dots every two miles horizontally and every five minutes vertically.)

Use equation 3 to compute the standard deviation of residuals in the sample. You should get se = 5.401135301.

Now use equation 4 to find sb1, the standard error of the slope. Compute SS(x) = ∑x² − (∑x)²/n = 2616 − 184²/15 = 358.9333333, then divide se = 5.401135301 by √SS(x) to get sb1 = 0.2850875. (The “Sample Statistics” block of the accompanying Excel workbook will do these calculations for you.)
Since we’ll need some values for our later computations, let’s make them easy to refer to:
b0 = 3.6434 b1 = 1.89320 SS(x) = 358.9333333 se = 5.4011 sb1 = 0.2850875
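As a cross-check, a short Python session reproduces all of these figures from the raw data. Here numpy’s polyfit stands in for the TI’s regression command; this sketch is mine, not the article’s:

```python
import math
import numpy as np

x = np.array([3, 5, 7, 8, 10, 11, 12, 12, 13, 15, 15, 16, 18, 19, 20])
y = np.array([7, 20, 20, 15, 25, 17, 20, 35, 26, 25, 35, 32, 44, 37, 45])

b1, b0 = np.polyfit(x, y, 1)            # slope 1.89320, intercept 3.6434
resid = y - (b0 + b1 * x)
se = math.sqrt((resid ** 2).sum() / (len(x) - 2))  # 5.4011, equation 3
ssx = ((x - x.mean()) ** 2).sum()                  # SS(x) = 358.9333333
sb1 = se / math.sqrt(ssx)                          # 0.2850875, equation 4
print(b0, b1, se, ssx, sb1)
```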
Before you can make any inference (hypothesis test or confidence interval) about correlation or regression in the population, check these requirements:

1. The sample is random.
2. The residuals are normally distributed.
3. The plot of residuals versus x shows no pattern: no bends, no thickening or thinning from left to right, and no outliers.

The problem tells you that the sample was random, but you have to do some work to verify the other two requirements.
TI Calculator Procedure:

A nice byproduct of the LinRegTTest is that it computes the residuals for you and stores them in a list called RESID. Press [2nd Y= makes STAT PLOT] to plot them against x. At the prompt for the y list, press [2nd STAT makes LIST] [▲], scroll to RESID if necessary, and press [ENTER]. In the resulting plot of the residuals against x, there are no bends, no thickening or thinning trend, and no outliers.

Next, check that the residuals are normally distributed. With the MATH200A program (part 4), at the data-list prompt press [2nd STAT makes LIST] [▲], scroll to RESID if necessary, and press [ENTER] twice. If you don’t have the program, you can get the same effect by choosing the sixth graph type on the STAT PLOT screen and selecting the RESID list as your data list.

Caution: the correlation coefficient r=.9772 in the normal probability plot is for that plot, not for the sample data. The correlation coefficient of the commuting distances and times is 0.8788, as shown in the LinRegTTest output.

Hand Calculations:

Begin by computing all the residuals. For each (xj, yj) data pair, the residual is ej = yj−ŷj = yj−b0−b1xj. Once you’ve computed the residuals, making a scatter plot of the residuals versus x is tedious but not especially difficult. You should get something like the plot described above.

To test that the residuals are normal, you can use an Excel workbook; see Normality Check and Finding Outliers in Excel. Some models of TI calculators can make normal probability plots; for instructions see Normality Check on TI-83/84 or Normality Check on TI-89. If all else fails, you can make the plot by hand; see the “Theory” appendix to any of those articles.
Now we’re ready to compute a confidence interval. The slope b1 for our sample is a point estimate for the true regression slope β1 of the population, so we can estimate β1 for any desired confidence level.
On the TI-89 and TI-84, you can use the LinRegTInt command on the STAT TESTS menu. This gives (1.2773, 2.5091) for the 95% confidence interval, and you can jump right to the Conclusion below.
(The “Confidence Interval for Slope” block of the accompanying Excel workbook will do these calculations for you, and so will MATH200B part 7.)
With the TI-83, you have to do the computations by hand. From equation 5 we see that the sampling distribution of b1 follows a Student’s t distribution with mean β1, standard deviation sb1, and n−2 degrees of freedom. That’s all we need to see that the (1−α)·100% confidence interval for the slope of the regression line is
(6) b1 − E ≤ β1 ≤ b1 + E where E = tn−2,α/2 · sb1
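If you have Python handy instead of a TI, scipy’s t quantile function supplies the critical value. Here is a sketch of equation 6 (the helper name is mine):

```python
from scipy import stats

def slope_ci(b1, sb1, n, conf=0.95):
    """Equation 6: confidence interval for the population slope beta1."""
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 2)  # t(n-2, alpha/2)
    e = t_crit * sb1                                    # margin of error
    return b1 - e, b1 + e

# With the example's figures, slope_ci(1.89320, 0.2850875, 15)
# gives about (1.28, 2.51), matching the interval computed below.
```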
Example: Compute the 95% confidence interval for the slope of the line in our example.
Solution: We already have b1 = 1.89320 and sb1 = 0.2850875. To compute tn−2,α/2, start with n = 15 so n−2 = 13. 1−α = 0.95, so α = 0.05 and α/2 = 0.025. That means that
E = t13,0.025 · 0.2850875
There are many ways to find t13,0.025, including the invT function on the TI-84 and good old reference tables in a book. The value is 2.160368652, and therefore

E = 2.160368652 · 0.2850875 = 0.6158940982
1.89320 − 0.6158940982 ≤ β1 ≤ 1.89320 + 0.6158940982
1.28 ≤ β1 ≤ 2.51
Conclusion: We’re 95% confident that the slope is between 1.28 and 2.51 minutes per mile, or that each extra mile of commute costs 1.28 to 2.51 extra minutes.
You can test whether the slope is positive (H1:β1>0), whether it’s negative (H1:β1<0), or whether it’s nonzero (H1:β1≠0). Since the population correlation coefficient ρ has the same sign as the slope β1, this is equivalent to testing whether there is a positive linear relation, a negative linear relation, or just any linear relation between x and y.
In all three cases, the null is that the slope is zero (H0:β1=0) or that there’s no linear association in the population (H0:ρ=0).
Example: Test the hypothesis that commute time is associated with commute distance. Use α = 0.05.
Comment: This is a two-tailed test to determine if the slope is nonzero. You might think it’s pretty obvious that the slope can’t be negative. That would mean that the further you have to drive, the less time it tends to take. But maybe people who live further away take freeways, while people who live closer must take congested local streets. While it’s not likely, it’s at least possible.
With a TI-83/84 or a TI-89, use LinRegTTest. The test statistic is t0 = 6.64, and the two-tailed p-value is 0.000016. You reject H0 and accept H1: the slope is nonzero. Please see the Conclusion below for the interpretation.
(The “Hypothesis Test for Slope” block of the accompanying Excel workbook will do these calculations for you.)
If you’re working by hand, use equation 5. β1 = 0 according to the null hypothesis, so your test statistic is

t0 = (b1 − 0) / sb1

t0 = 1.89320 / 0.2850875 = 6.64
Now use a table, for n−2 = 13 degrees of freedom, to find the area of the right-hand tail as 8.054×10⁻⁶. This is a two-tailed test, so the p-value is twice that, 1.6×10⁻⁵ or about 0.000016. This is well under α, so you reject H0 and accept H1. The slope is nonzero.
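If you’re scripting rather than using a table, scipy’s t survival function gives the tail area (a sketch under the same assumptions as the earlier Python snippets):

```python
from scipy import stats

t0 = 1.89320 / 0.2850875           # 6.64, as computed above
p_one_tail = stats.t.sf(t0, df=13) # right-hand tail, about 8.05e-6
p_value = 2 * p_one_tail           # two-tailed, about 1.6e-5
```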
Conclusion: At the 0.05 level of significance, there is a linear association between commuting distance in miles and commuting time in minutes, and the slope is positive: a longer distance does tend to take more time.
See also: p < α in Two-Tailed Test: What Does It Tell You? for one-tailed interpretation of a two-tailed test.
The y intercept β0 is simply the number you get when you set x = 0 in the regression equation. Please see below for the confidence interval procedure for mean response to any x.
MATH200B part 7 will compute a confidence interval for the y intercept.
The purpose of a regression line is to predict the response of the dependent variable y to the independent variable x. The regression line derived from the sample lets us plug in xj and compute
ŷj = b0 + b1 xj
In the population, for any given xj there can be many yj’s. ŷj above is a point estimate for the mean value of yj in the population for the given value of xj. But if we had a different sample, we’d have a different ŷj. So we’d like to know the mean value of y in the population for a particular value of x. For a particular xj, we denote the mean value of y as μy|x=xj.
Like any other population mean value, this has a confidence interval formed by taking a margin of error around the point estimate ŷj:
(7) ŷj − E ≤ μy|x=xj ≤ ŷj + E where E = tn−2,α/2 · se · √[ 1/n + (xj−x̅)² / SS(x) ]
where se is the standard deviation of the residuals from equation 3.
(The “Confidence Interval for Mean Response” block of the accompanying Excel workbook will do these calculations for you, and so will MATH200B part 7.)
Example: Construct the 95% confidence interval for the mean commute time for a 10-mile one-way trip.
Solution: Compute the point estimate ŷj and the margin of error E, and combine them to make the confidence interval about μy|x=xj.
To compute ŷj, see How to Find ŷ from a Regression on TI-83/84. Here xj = 10 and ŷj = 3.6434 + 1.89320·10 = 22.5754 minutes. This is the point estimate of the average time for a 10-mile commute, based on this sample alone.
To compute E, we need some figures that we computed above:
t13,0.025 = 2.160368652
se = 5.401135301
SS(x) = 358.9333333
The last thing we need for equation 7 is x̅, the average value of x in our sample. That’s just
x̅ = (∑x)/n = 184/15
Now put together the margin of error:
E = 2.160368652 · 5.401135301 · √[1/15 + (10−184/15)²/358.9333333]
E = 3.320501208
and using ŷj = 22.5754 from a couple of paragraphs ago,
22.5754 − 3.3205 ≤ μy|x=xj ≤ 22.5754 + 3.3205
19.25 ≤ μy|x=xj ≤ 25.90
Conclusion: We’re 95% confident that the average time for all 10-mile commutes is between 19.25 and 25.90 minutes.
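Here is a Python sketch of equation 7 (the helper name and argument order are mine), which reproduces this interval from the raw data:

```python
import math
from scipy import stats

def mean_response_ci(xj, xs, b0, b1, se, conf=0.95):
    """Equation 7: CI for the mean of y at x = xj."""
    n = len(xs)
    xbar = sum(xs) / n
    ssx = sum((xi - xbar) ** 2 for xi in xs)
    yhat = b0 + b1 * xj                                 # point estimate
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 2)
    e = t_crit * se * math.sqrt(1 / n + (xj - xbar) ** 2 / ssx)
    return yhat - e, yhat + e

# With the example's data and xj = 10, this gives about (19.25, 25.90).
```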
Maybe you’re not so concerned about the mean commute time for 10 miles. Maybe you want to predict a 95% interval for your time. In other words, you don’t want the population parameter μy|x=xj that we calculated above, you want the likely range for what your own commute time might be.
We call this a prediction interval for individual responses, to distinguish from the confidence interval for mean response. As usual, there is more variation in individual responses than there is in means, so the prediction interval is wider than the confidence interval. But the formula is strikingly similar to the equation 7 formula for the confidence interval of the mean:
(8) ŷj − E ≤ yj ≤ ŷj + E where E = tn−2,α/2 · se · √[ 1 + 1/n + (xj−x̅)² / SS(x) ]
The only difference is that the standard error is larger by the addition of 1 inside the radical sign.
(The “Confidence Interval for Individual Response” block of the accompanying Excel workbook will do these calculations for you, and so will MATH200B part 7.)
Example: Let’s compute the 95% prediction interval for y(10), the y values at x = 10. We already have all the numbers we need from the previous example:
ŷj = 22.5754
t13,0.025 = 2.160368652
se = 5.401135301
SS(x) = 358.9333333
x̅ = (∑x)/n = 184/15
Now put together the margin of error:
E = 2.160368652 · 5.401135301 · √[1 + 1/15 + (10−184/15)²/358.9333333]
E = 12.13170637
and the prediction interval:
22.5754 − 12.1317 ≤ yj ≤ 22.5754 + 12.1317
10.44 ≤ yj ≤ 34.71
Conclusion: You are 95% likely to have a commute time between 10.44 and 34.71 minutes for a 10-mile commute.
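The same Python sketch, with the extra 1 inside the radical per equation 8, yields the prediction interval (again my own helper, not the article’s):

```python
import math
from scipy import stats

def prediction_interval(xj, xs, b0, b1, se, conf=0.95):
    """Equation 8: prediction interval for an individual y at x = xj."""
    n = len(xs)
    xbar = sum(xs) / n
    ssx = sum((xi - xbar) ** 2 for xi in xs)
    yhat = b0 + b1 * xj
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 2)
    # The "1 +" inside the radical is the only change from equation 7.
    e = t_crit * se * math.sqrt(1 + 1 / n + (xj - xbar) ** 2 / ssx)
    return yhat - e, yhat + e

# With the example's data and xj = 10, this gives about (10.44, 34.71).
```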
The prediction interval for an individual trip of a particular distance is much wider than the confidence interval for the mean of all trips of that distance. This falls right in line with what we already know, that there is a lot less variability in means than in individual data points.
Updates and new info: https://BrownMath.com/stat/