Inferences about Linear Regression
Copyright © 2002–2017 by Stan Brown
Summary: When you do a linear regression, you get an equation in the form ŷ = b_{0} + b_{1}x. This page shows how to estimate or test the slope of the regression line, and also how to predict the response value for a particular x.
Advice: This section is rather heavy going. While it’s nice to understand the background, you don’t actually need it to do the calculations. Especially on a first reading, you might want to skip down to the example.
Earlier in your course, you learned to find the least-squares regression line that best fits a set of points. (If you need a refresher, see Linked Variables.) That can be done by hand with formulas, or with much saving of labor by a TI calculator. The line of best fit for a sample has a slope and a y intercept, and it can be written in the form ŷ = ax + b, ŷ = b_{0} + b_{1}x, or similar.
Because the correlation is almost never exactly ±1, your data points don’t fall exactly on the regression line. For a particular data point (x_{j}, y_{j}) the difference between the prediction and the actual y value is called the residual: e_{j} = y_{j}−ŷ_{j}. Another way to look at it is that the actual Y value involves both the prediction from the regression line, and the residual e_{j} that is the discrepancy between the regression line and the actual data point. Symbolically, y_{j} = b_{0} + b_{1}x_{j} + e_{j}.
(Don’t let the notation confuse you. A number subscript refers to properties of the whole sample, and a letter subscript refers to properties of a particular point. Here the point (x_{j}, y_{j}) includes the residual e_{j}, and you know that the residuals of different points will be different. The intercept b_{0} and slope b_{1} are properties of the whole sample because they describe the regression line that was derived from the whole sample.)
Because it’s quite common to have multiple points in your sample with the same x_{j} and different y_{j}’s, a given x_{j} can have more than one residual e_{j}.
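To make that concrete, here is a minimal Python sketch. It borrows two points and the rounded coefficients from the commuting example later on this page; the variable names are illustrative only. Two coworkers live 12 miles away but have different commute times, so the same x gets two different residuals.

```python
# Rounded sample coefficients quoted later on this page.
b0, b1 = 3.6434, 1.8932

# Two coworkers from the commuting example, both 12 miles from work.
points = [(12, 20), (12, 35)]  # (miles, minutes)

residuals = []
for x, y in points:
    y_hat = b0 + b1 * x            # prediction from the regression line
    residuals.append(y - y_hat)    # e_j = y_j - y-hat_j
    print(f"x = {x}, y = {y}, y-hat = {y_hat:.2f}, residual = {y - y_hat:+.2f}")
```

One residual is negative (that commute is faster than predicted) and the other is positive, even though both share the same prediction.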
Naturally, if you take another random sample you expect to get a slightly different regression line. In other words, the slope b_{1} and intercept b_{0} are sample statistics.
Just as the sample mean x̅ is a point estimate of the population mean μ, the slope and intercept you get by regression on a sample are point estimates for the true slope β_{1} and intercept β_{0} of the line that best fits the whole population:
(1) ŷ = β_{0} + β_{1}x
As usual, Greek letters stand for population parameters. The Greek letter β (beta) corresponds to the Roman letter b.
Just as with the sample, the true linear correlation in the population almost certainly isn’t ±1. There is always some variability due to other factors, so the true regression line doesn’t describe the population perfectly. If you randomly select a particular x_{j} and measure the Y for that value of X, it’s not going to fall exactly on the line given by equation 1. There will be a residual ε (epsilon), so that the measured value y_{j} will be
(2) y_{j} = β_{0} + β_{1}x_{j} + ε_{j}
This is the least squares regression model. As with the sample, the number subscripts refer to properties of the whole population and the letter subscripts go with particular points. Just to be clear, the population is all the (X,Y) pairs, both the ones you measured and the ones you didn’t. And you can have more than one ε_{j} for a given x_{j}, because the population may contain multiple data pairs with the same X and the same or different Y’s.
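The idea that every sample gives a slightly different line can be tried out with a small simulation. The sketch below is only an illustration: the population values BETA0, BETA1 and SIGMA are hypothetical numbers picked to resemble the example on this page, and the residuals are assumed to be normal as in equation 2.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope for paired lists xs, ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sp_xy / ss_x
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical population parameters, chosen only for illustration.
BETA0, BETA1, SIGMA = 3.6, 1.9, 5.4
xs = [3, 5, 7, 8, 10, 11, 12, 12, 13, 15, 15, 16, 18, 19, 20]

random.seed(1)  # fixed seed so the run is repeatable
slopes = []
for _ in range(2000):
    # y_j = beta0 + beta1 * x_j + epsilon_j, with normal residuals
    ys = [BETA0 + BETA1 * x + random.gauss(0, SIGMA) for x in xs]
    slopes.append(fit_line(xs, ys)[1])

mean_slope = sum(slopes) / len(slopes)
print(f"smallest {min(slopes):.2f}, largest {max(slopes):.2f}, mean {mean_slope:.2f}")
```

The individual sample slopes scatter on both sides of the assumed β_{1}, while their average lands close to it.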
What can we say about the population parameters β_{0} and β_{1} that determine the regression line, equation 1? This is the standard practice of inferential statistics: use the sample to make predictions or decisions about the population. As always, begin by identifying a test statistic and finding its sampling distribution. That’s the only hard part, really.
To do our inferential procedures, we require that the Y values are normally distributed around the regression line. That’s not the same as saying that the Y values are normally distributed! Rather, it means that in the population, for any particular X you can measure many Y’s, and the Y’s for any particular X are normally distributed.
In other words, the residuals must be normally distributed. (Perfect normality isn’t required. As you’ll see later, the test statistic is Student’s t, which is robust against moderate departures from normality.)
You can check that requirement on your TI calculator by making a normal probability plot of the residuals, using the list LRESID as the data list. (To get LRESID, press [2nd STAT makes LIST], scroll up to RESID, and press [ENTER].)
In addition to the requirement of normality, the plot of residuals versus x should be boring: no bends, no thickening or thinning from left to right, and no outliers.
Of course we don’t know all the points in the population, since our sample measured some of them and not all. As usual, our sample statistics are point estimates for the population parameters, which we don’t know.
So the standard deviation of the residuals, for all points in the population, is estimated by the standard deviation of the residuals, for just the points in our sample:
(3) s_{e} = √[ ∑e_{j}² / (n−2) ] = √[ ∑(y_{j} − ŷ_{j})² / (n−2) ]
You’re used to the formula for standard deviation of a sample having n−1 in the denominator, so why is it n−2 here? In the standard deviation of a list of numbers, there are n−1 degrees of freedom because if you know n−1 points the last one is forced to make the mean come out right. Here, the residuals are measured from a line whose intercept and slope were both estimated from the sample, so two degrees of freedom are used up and you have n−2.
For computational methods, please see the Example below.
The standard deviation of residuals s_{e} is used to compute confidence intervals for mean response as well as prediction intervals for individual responses. It is also a component of the formula for the standard error of the slope of the regression line, which is written s_{b1}:
(4) s_{b1} = s_{e} / √SS(x), where SS(x) = ∑x² − (∑x)²/n
There’s no need for a √n term in the denominator — that’s already covered by equation 3’s computation of s_{e}, which forms part of the computation of s_{b1}.
The standard error of the slope of the regression line is used to compute a confidence interval or perform a hypothesis test about the slope of the regression line. Notice that equation 4 computes s_{b1} not σ_{b1} — this signifies that we have an estimate of the standard error, since the population standard deviation is unknown.
At long last, after all those preliminaries, we can identify the relevant distribution. The regression on any sample of size n will give you a slope b_{1} that depends on the sample. Slopes of samples form a Student’s t distribution about the slope β_{1} of a population regression:
(5) t = (b_{1} − β_{1}) / s_{b1}, with n−2 degrees of freedom
where s_{b1} is calculated from equation 4 and equation 3.
For all the procedures on this page, we’ll use the following sample of commuting distances and times for fifteen randomly selected coworkers, adapted from Johnson & Kuby 2004 [full citation at https://BrownMath.com/swt/sources.htm#so_Johnson2004], page 623.
Commuting Distances and Times  

Person  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15 
Miles, x  3  5  7  8  10  11  12  12  13  15  15  16  18  19  20 
Minutes, y  7  20  20  15  25  17  20  35  26  25  35  32  44  37  45 
All the procedures on this page require you to perform the regression first, to obtain b_{0} and b_{1} for your sample. (They’re shown below as b_{0} = 3.6 and b_{1} = 1.9.)
Always think about the real-world meanings of these quantities. In this case, the slope b_{1} is the marginal cost of living further away: it tells you that adding a mile to the commute distance adds, on average, 1.9 minutes to the time. Mathematically, the intercept b_{0} is the commuting time for a distance of zero. Put that way it’s not meaningful, but you can also think of it as the fixed cost of commuting: 3.6 minutes of the average commute time doesn’t depend on distance. That covers starting the car, brushing off the snow, getting out of the driveway, finding a parking space in the company lot, and so forth.
The TI-83/84 or TI-89 saves you a lot of work, so this page will use it. If you don’t have one of those calculators, you can do the computations by hand, looking things up in tables where necessary, then check your results against the ones shown here.
TI Calculator Procedure: Put the x’s in list L3 and the y’s in list L4. Then [STAT] [◄] [▲] and scroll to LinRegTTest. Specify L3 and L4, with frequency of 1. The next line doesn’t matter, and you can either put Y1 in RegEQ [VARS] [►] [1] [1] or leave it blank.
Here are the setup screen and the results:
The calculator uses a+bx as the regression equation, so the intercept b_{0} shows as a=3.6434 and the slope b_{1} shows as b=1.8932. s=5.4011 on the results screen is s_{e}, the standard deviation of residuals, which saves a lot of computation.

Hand Calculations: Begin by computing the sums that are needed in the formulas. You should get these results:
n = 15, ∑x = 184, ∑x² = 2616, ∑y = 403, ∑y² = 12493, ∑xy = 5623
Now perform the regression. (Use Excel, or find formulas in Least Squares — the Gory Details.) You should get these results:
intercept b_{0} = 3.643387816
slope b_{1} = 1.89320208
As a matter of interest, here’s a plot of the points and the regression line. The dots occur every two miles horizontally and every five minutes vertically.
Use equation 3 to compute the standard deviation of residuals in the sample. You should get a result of s_{e} = 5.401135301.

TI Calculator Procedure: Now use equation 4 to find s_{b1}, the standard error of the slope. The LinRegTTest operation stored all sorts of useful variables in the submenus of [VARS] [5:Statistics]: n is the first variable under XY, summations are under ∑, b_{0} is a and b_{1} is b under EQ, and s_{e} is s, down at position 0 under TEST.
As shown here, compute SS(x) = 358.9333333 first, then divide s_{e} by the square root of that to get s_{b1} = 0.2850875.

Hand Calculations: Now use equation 4 to find s_{b1}, the standard error of the slope:
Compute SS(x) = 358.9333333. Divide s_{e} = 5.401135301 by √SS(x) to get s_{b1} = 0.2850875. (The “Sample Statistics” block of the accompanying Excel workbook will do these calculations for you.)
Since we’ll need some values for our later computations, let’s make them easy to refer to:
b_{0} = 3.6434 b_{1} = 1.89320 SS(x) = 358.9333333 s_{e} = 5.4011 s_{b1} = 0.2850875
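If you have neither a TI calculator nor Excel at hand, the same quantities can be reproduced in a few lines of Python. This is a sketch using only the standard library; the variable names are mine, and the formulas are the usual sums-based least-squares formulas together with equations 3 and 4.

```python
from math import sqrt

# Commuting data from the table above.
miles   = [3, 5, 7, 8, 10, 11, 12, 12, 13, 15, 15, 16, 18, 19, 20]
minutes = [7, 20, 20, 15, 25, 17, 20, 35, 26, 25, 35, 32, 44, 37, 45]

n = len(miles)
sum_x = sum(miles)
sum_y = sum(minutes)
sum_x2 = sum(x * x for x in miles)
sum_y2 = sum(y * y for y in minutes)
sum_xy = sum(x * y for x, y in zip(miles, minutes))

# Least-squares slope and intercept.
ss_x = sum_x2 - sum_x ** 2 / n                 # SS(x)
b1 = (sum_xy - sum_x * sum_y / n) / ss_x       # slope
b0 = (sum_y - b1 * sum_x) / n                  # intercept

# Equation 3: standard deviation of residuals, with n - 2 in the denominator.
residuals = [y - (b0 + b1 * x) for x, y in zip(miles, minutes)]
s_e = sqrt(sum(e * e for e in residuals) / (n - 2))

# Equation 4: standard error of the slope.
s_b1 = s_e / sqrt(ss_x)

print(f"b0 = {b0:.4f}, b1 = {b1:.5f}, SS(x) = {ss_x:.4f}")
print(f"s_e = {s_e:.4f}, s_b1 = {s_b1:.7f}")
```

The printed values should agree with the summary line above to the digits shown there.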
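Before you can make any inference (hypothesis test or confidence interval) about correlation or regression in the population, check these requirements: the sample is random; the plot of residuals versus x shows no bends, no thickening or thinning from left to right, and no outliers; and the residuals are normally distributed.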
The problem tells you that the sample was random, but you have to do some work to verify the other two requirements.
TI Calculator Procedure: A nice byproduct of the LinRegTTest is that it computes the residuals for you and stores them in a list called RESID. You can easily use [2nd Y= makes STAT PLOT] to plot them against x. At the prompt for the Y list, press [2nd STAT makes LIST] [▲], scroll to RESID if necessary, and press [ENTER]. Here’s the plot of the residuals against x. As you can see, there are no bends, no thickening or thinning trend, and no outliers.

Hand Calculations: Begin by computing all the residuals. For each (x_{j}, y_{j}) data pair, the residual is e_{j} = y_{j}−ŷ_{j} = y_{j}−b_{0}−b_{1}x_{j}.
Once you’ve computed the residuals, making a scatter plot of the residuals versus x is tedious but not especially difficult. You should get something like the plot described under the TI procedure.

TI Calculator Procedure: Next, check that the residuals are normally distributed. An easy way is to use MATH200A part 4. When prompted for the data list, press [2nd STAT makes LIST] [▲], scroll to RESID if necessary, and press [ENTER] twice. If you don’t have the program, you can get the same effect by choosing the sixth graph type on the STAT PLOT screen and selecting the RESID list as your data list.
Caution: the correlation coefficient r=.9772 in the picture is for the normal probability plot, not the sample data. The correlation coefficient of the commuting distances and times is 0.8788, as shown in the TI output.

Hand Calculations: To test that the residuals are normal, you can use an Excel workbook; see Normality Check and Finding Outliers in Excel. Some models of TI calculators can make normal probability plots; for instructions see Normality Check on TI-83/84 or Normality Check on TI-89. If all else fails, you can make the plot by hand; see the “Theory” appendix to any of those articles.
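The residual checks can also be roughed out in Python. The sketch below computes the residuals, then the correlation between the sorted residuals and the normal scores they would be plotted against. The plotting positions (i − 0.5)/n are an assumption on my part; a calculator or workbook may use a slightly different convention, so the correlation need not match the picture’s value exactly.

```python
from math import sqrt
from statistics import NormalDist

miles   = [3, 5, 7, 8, 10, 11, 12, 12, 13, 15, 15, 16, 18, 19, 20]
minutes = [7, 20, 20, 15, 25, 17, 20, 35, 26, 25, 35, 32, 44, 37, 45]
b0, b1 = 3.643387816, 1.89320208   # regression results from this page

# e_j = y_j - b0 - b1 * x_j, sorted for the normal probability plot
residuals = sorted(y - b0 - b1 * x for x, y in zip(miles, minutes))
n = len(residuals)

# Assumed plotting positions (i - 0.5)/n, converted to z scores.
z_scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def correlation(us, vs):
    """Pearson correlation coefficient of two equal-length lists."""
    u_bar = sum(us) / len(us)
    v_bar = sum(vs) / len(vs)
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    s_uu = sum((u - u_bar) ** 2 for u in us)
    s_vv = sum((v - v_bar) ** 2 for v in vs)
    return s_uv / sqrt(s_uu * s_vv)

r_plot = correlation(residuals, z_scores)
print(f"sum of residuals = {sum(residuals):.6f}")
print(f"normal probability plot correlation = {r_plot:.4f}")
```

A correlation close to 1 means the points of the normal probability plot lie close to a straight line, which is what you hope to see. The sum of the residuals should be essentially zero, which is a quick check on the arithmetic.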
Now we’re ready to compute a confidence interval. The slope b_{1} for our sample is a point estimate for the true regression slope β_{1} of the population, so we can estimate β_{1} for any desired confidence level.
On the TI-89 and TI-84, you can use the LinRegTInt command on the STAT TESTS menu. This gives (1.2773, 2.5091) for the 95% confidence interval, and you can jump right to the Conclusion below.
(The “Confidence Interval for Slope” block of the accompanying Excel workbook will do these calculations for you, and so will MATH200B part 7.)
With the TI-83, you have to do the computations by hand. From equation 5 we see that the sampling distribution of b_{1} follows a Student’s t distribution with mean β_{1}, standard deviation s_{b1}, and degrees of freedom n−2. That’s all we need to see that the 1−α confidence interval for the slope of the regression line is
(6) b_{1} − E ≤ β_{1} ≤ b_{1} + E where E = t_{n−2,α/2} · s_{b1}
Example: Compute the 95% confidence interval for the slope of the line in our example.
Solution: We already have b_{1} = 1.89320 and s_{b1} = 0.2850875. To compute t_{n−2,α/2}, start with n = 15 so n−2 = 13. 1−α = 0.95, so α = 0.05 and α/2 = 0.025. That means that
E = t_{13,0.025} · 0.2850875
There are many methods to find t_{13,0.025}, including the invT function on the TI-84 and good old reference tables in a book. The value is 2.160368652, and therefore
E = 2.160368652 · 0.2850875 = 0.6158940982
1.89320 − 0.6158940982 ≤ β_{1} ≤ 1.89320 + 0.6158940982
1.28 ≤ β_{1} ≤ 2.51
Conclusion: We’re 95% confident that the slope is between 1.28 and 2.51 minutes per mile, or that each extra mile of commute costs 1.28 to 2.51 extra minutes.
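The arithmetic of equation 6 is short enough to script. In this sketch the critical value t_{13,0.025} is simply copied from the solution above rather than computed, and the variable names are illustrative.

```python
b1 = 1.89320            # sample slope
s_b1 = 0.2850875        # standard error of the slope (equation 4)
t_crit = 2.160368652    # t for 13 degrees of freedom, 0.025 in the upper tail

margin = t_crit * s_b1              # E in equation 6
lower, upper = b1 - margin, b1 + margin

print(f"E = {margin:.4f}")
print(f"{lower:.4f} <= beta1 <= {upper:.4f}")
```

Changing the confidence level only requires a different critical value; the rest of the computation stays the same.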
You can test whether the slope is positive (H_{1}:β_{1}>0), whether it’s negative (H_{1}:β_{1}<0), or whether it’s nonzero (H_{1}:β_{1}≠0). Since the population correlation coefficient ρ has the same sign as the slope β_{1}, this is equivalent to testing whether there is a positive linear relation, a negative linear relation, or just any linear relation between X and Y.
In all three cases, the null is that the slope is zero (H_{0}:β_{1}=0) or that there’s no linear association in the population (H_{0}:ρ=0).
Example: Test the hypothesis that commute time is associated with commute distance. Use α = 0.05.
Comment: This is a two-tailed test to determine if the slope is nonzero. You might think it’s pretty obvious that the slope can’t be negative. That would mean that the further you have to drive, the less time it tends to take. But maybe people who live further away take freeways, while people who live closer must take congested local streets. While it’s not likely, it’s at least possible.
With a TI-83/84 or a TI-89, use LinRegTTest. The test statistic is t_{0} = 6.64, and the two-tailed p-value is 0.000016. You reject H_{0} and accept H_{1}: the slope is nonzero. Please see Conclusion below for the interpretation.
(The “Hypothesis Test for Slope” block of the accompanying Excel workbook will do these calculations for you.)
If you’re working by hand, use equation 5, reproduced at right. β_{1}=0 according to the null hypothesis, so your test statistic is
t_{0} = (b_{1} − 0) / s_{b1}
t_{0} = 1.89320 / 0.2850875 = 6.64
Now use a table, for n−2 = 13 degrees of freedom, to find the area of the right-hand tail as 8.054×10^{−6}. This is a two-tailed test, so the p-value is twice that, 1.6×10^{−5} or about 0.000016. This is well under α, so you reject H_{0} and accept H_{1}. The slope is nonzero.
Conclusion: At the 0.05 level of significance, there is a linear association between commuting distance in miles and commuting time in minutes, and the slope is positive: a longer distance does tend to take more time.
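For readers without a t table that reaches this far into the tail, here is a sketch that computes the test statistic and p-value with the Python standard library only. The tail area comes from integrating the Student’s t density numerically with Simpson’s rule; that is my choice for a self-contained illustration, not the method a calculator uses, and the helper names t_pdf and upper_tail are made up for this example.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """Area to the right of t (t >= 0): 0.5 minus Simpson's rule on [0, t]."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

b1, s_b1, n = 1.89320, 0.2850875, 15
alpha = 0.05

t0 = (b1 - 0) / s_b1                   # H0 says the true slope is 0
p_value = 2 * upper_tail(t0, n - 2)    # two-tailed test
reject_h0 = p_value < alpha

print(f"t0 = {t0:.2f}, p-value = {p_value:.6f}, reject H0: {reject_h0}")
```

For a one-tailed alternative you would drop the factor of 2 and, for H_{1}: β_{1} < 0, use the left-hand tail instead.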
See also: p < α in Two-Tailed Test: What Does It Tell You? for one-tailed interpretation of a two-tailed test.
The y intercept β_{0} is simply the number you get when you set x = 0 in the regression equation. Please see below for the confidence interval procedure for mean response to any x.
MATH200B part 7 will compute a confidence interval for the y intercept.
The purpose of a regression line is to predict the response of the dependent variable Y to the independent variable X. The regression line derived from the sample lets us plug in x_{j} and compute
ŷ_{j} = b_{0} + b_{1} x_{j}
In the population, for any given x_{j} there can be many y_{j}’s. ŷ_{j} above is a point estimate for the mean value of y_{j} in the population for the given value of x_{j}. But if we had a different sample, we’d have a different ŷ_{j}. So we’d like to know the mean value of Y in the population for a particular value of X. For a particular x_{j}, we denote the mean value of Y as μ_{y|x=xj}.
Like any other population mean value, this has a confidence interval formed by taking a margin of error around the point estimate ŷ_{j}:
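(7) ŷ_{j} − E ≤ μ_{y|x=xj} ≤ ŷ_{j} + E, with margin of error E = t_{n−2,α/2} · s_{e} · √[1/n + (x_{j} − x̅)²/SS(x)]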
where s_{e} is the standard deviation of the residuals from equation 3.
(The “Confidence Interval for Mean Response” block of the accompanying Excel workbook will do these calculations for you, and so will MATH200B part 7.)
Example: Construct the 95% confidence interval for the mean commute time for a 10-mile one-way trip.
Solution: Compute the point estimate ŷ_{j} and the margin of error E, and combine them to make the confidence interval about μ_{y|x=xj}.
To compute ŷ_{j}, see Finding ŷ from a Regression on TI-83/84. Here x_{j} = 10 and ŷ_{j} = 3.6434 + 1.89320·10 = 22.5754 minutes. This is the point estimate of the average time for a 10-mile commute, based on this sample alone.
To compute E, we need some figures that we computed above:
t_{13,0.025} = 2.160368652
s_{e} = 5.401135301
SS(x) = 358.9333333
The last thing we need for equation 7 is x̅, the average value of X in our sample. That’s just
x̅ = (∑x)/n = 184/15
Now put together the margin of error:
E = 2.160368652 · 5.401135301 · √[1/15 + (10−184/15)²/358.9333333]
E = 3.320501208
and using ŷ_{j} = 22.5754 from a couple of paragraphs ago,
22.5754 − 3.3205 ≤ μ_{y|x=xj} ≤ 22.5754 + 3.3205
19.25 ≤ μ_{y|x=xj} ≤ 25.90
Conclusion: We’re 95% confident that the average time for all 10-mile commutes is between 19.25 and 25.90 minutes.
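The same computation can be wrapped in a small Python function so that you can try other distances. This is a sketch: the function name mean_response_interval is made up for the example, the numbers are the sample statistics from this page, and the critical value is copied from the text rather than computed.

```python
from math import sqrt

# Sample statistics from this page.
b0, b1 = 3.6434, 1.89320
s_e = 5.401135301        # standard deviation of residuals
ss_x = 358.9333333       # SS(x)
n = 15
x_bar = 184 / n          # mean of the x values
t_crit = 2.160368652     # t for 13 degrees of freedom, 95% confidence

def mean_response_interval(x):
    """95% confidence interval for the mean of Y at a given x."""
    y_hat = b0 + b1 * x
    margin = t_crit * s_e * sqrt(1 / n + (x - x_bar) ** 2 / ss_x)
    return y_hat - margin, y_hat + margin

low, high = mean_response_interval(10)
print(f"10 miles: {low:.2f} to {high:.2f} minutes")

low20, high20 = mean_response_interval(20)
print(f"20 miles: {low20:.2f} to {high20:.2f} minutes")
```

Because of the (x − x̅)² term, the interval is narrowest near x̅ and gets wider as x moves away from the middle of the sample, which is why the 20-mile interval comes out wider than the 10-mile one.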
Maybe you’re not so concerned about the mean commute time for 10 miles. Maybe you want to predict a 95% interval for your time. In other words, you don’t want the population parameter μ_{y|x=xj} that we calculated above, you want the likely range for what your own commute time might be.
We call this a prediction interval for individual responses, to distinguish from the confidence interval for mean response. As usual, there is more variation in individual responses than there is in means, so the prediction interval is wider than the confidence interval. But the formula is strikingly similar to the equation 7 formula for the confidence interval of the mean:
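(8) ŷ_{j} − E ≤ y_{j} ≤ ŷ_{j} + E, with margin of error E = t_{n−2,α/2} · s_{e} · √[1 + 1/n + (x_{j} − x̅)²/SS(x)]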
The only difference is that the standard error is larger by the addition of 1 inside the radical sign.
(The “Confidence Interval for Individual Response” block of the accompanying Excel workbook will do these calculations for you, and so will MATH200B part 7.)
Example: Let’s compute the 95% prediction interval for y(10), the Y values at x = 10. We already have all the numbers we need from the previous example:
ŷ_{j} = 22.5754
t_{13,0.025} = 2.160368652
s_{e} = 5.401135301
SS(x) = 358.9333333
x̅ = (∑x)/n = 184/15
Now put together the margin of error:
E = 2.160368652 · 5.401135301 · √[1 + 1/15 + (10−184/15)²/358.9333333]
E = 12.13170637
and the prediction interval:
22.5754 − 12.1317 ≤ y_{j} ≤ 22.5754 + 12.1317
10.44 ≤ y_{j} ≤ 34.71
Conclusion: You are 95% likely to have a commute time between 10.44 and 34.71 minutes for a 10-mile commute.
The prediction interval for an individual trip of a particular distance is much wider than the confidence interval for the mean of all trips of that distance. This falls right in line with what we already know, that there is a lot less variability in means than in individual data points.
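To see the two intervals side by side, here is a short Python sketch that computes both margins of error for a 10-mile commute. It assumes the sample statistics from this page and takes the critical value from the text; the variable names are illustrative.

```python
from math import sqrt

# Sample statistics from this page.
b0, b1 = 3.6434, 1.89320
s_e = 5.401135301        # standard deviation of residuals
ss_x = 358.9333333       # SS(x)
n = 15
x_bar = 184 / n          # mean of the x values
t_crit = 2.160368652     # t for 13 degrees of freedom, 95% level

x = 10
y_hat = b0 + b1 * x
spread = 1 / n + (x - x_bar) ** 2 / ss_x

# Confidence interval for the mean response: no extra 1 under the radical.
e_mean = t_crit * s_e * sqrt(spread)
# Prediction interval for an individual response: add 1 under the radical.
e_indiv = t_crit * s_e * sqrt(1 + spread)

pred_low, pred_high = y_hat - e_indiv, y_hat + e_indiv
print(f"mean response:       {y_hat - e_mean:.2f} to {y_hat + e_mean:.2f}")
print(f"individual response: {pred_low:.2f} to {pred_high:.2f}")
```

The only difference between the two margins is the 1 added inside the square root, yet in this sample it makes the prediction interval several times as wide as the confidence interval.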
Updates and new info: https://BrownMath.com/stat/