Stats without Tears
Solutions for Chapter 4

Updated 11 Oct 2014
Copyright © 2013–2017 by Stan Brown

1 64% of the variation in salary is associated with variation in age.

Common mistake: Don’t use any form of the word “correlation” in your answer. Your friend wouldn’t understand it, and it’s wrong anyway. Correlation is the interpretation of r, not R². Yes, r is related to R², but R² as such is not about correlation.

Common mistake: R² tells you how much of the variation in y is associated with variation in x, not the other way around. It’s not accurate to say 64% of variation in age is associated with variation in salary.

Common mistake: Don’t say “explained by” to non-technical people. The regression shows an association, but it does not show that growing older causes salary increases.

2 (a) We know that power boats kill manatees, so the boat registrations must be the explanatory variable (x) and the manatee power-boat kills must be the response variable (y). (Although this is an observational study, the cause of death is recorded, so we do know that the boats cause these manatee deaths.)

(b) Yes. (Figure: scatterplot.)

(c) The results of LinReg(ax+b) L1,L2,Y1 are shown at right. (Figure: regression results.) The correlation coefficient is r = 0.91.

(d) ŷ = 0.1127x − 35.1786
Note: ŷ, not y. Note: −35.1786, not +−35.1786.

(e) The slope is 0.1127. An increase of 1000 power-boat registrations is associated with an increase of about 0.11 manatee deaths, on average.
It’s every 1000 boats, not every boat, because the original table is in thousands. Always be specific: “increase”, not just “change”.

Remark: Although this is mathematically accurate, people may not respond well to 0.11 as a number of deaths, which obviously is a discrete variable. You might multiply by 100 and say that 100,000 extra registrations are associated with 11 more manatee deaths on average; or multiply by 10 and round a bit to say that 10,000 extra registrations are associated with about one more manatee death on average.
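
If you want to check the rescaling in that remark, here is a minimal sketch. The function name `deaths_per` is my own invention for illustration; the only number taken from the solution is the slope, 0.1127 deaths per 1000 registrations.

```python
# Slope from part (d): change in predicted deaths per 1 unit of x,
# where 1 unit of x is 1000 power-boat registrations.
SLOPE_PER_THOUSAND = 0.1127

def deaths_per(extra_registrations):
    """Average change in predicted deaths for a given change in registrations."""
    return SLOPE_PER_THOUSAND * extra_registrations / 1000

for boats in (1_000, 10_000, 100_000):
    print(f"{boats:>7,} extra registrations: about {deaths_per(boats):.2f} more deaths")
```

The three lines correspond to the plain slope, the "about one more death per 10,000" wording, and the "11 more deaths per 100,000" wording.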

(f) The y intercept is −35.1786. Mathematically, if there were no power boats there would be about minus 35 manatees killed by power boats. But this is not applicable because x=0 (no boats) is far outside the range of x in the data set.

(g) R² = 0.83. About 83% of variation in manatee deaths from power boats is associated with the variation in registrations of power boats.
It’s R², not r². And don’t use any form of the word “correlate” in your answer.

100% of manatee power-boat deaths come from power boats, so why isn’t the association 100%? The other 17% is lurking variables plus natural variability. For instance, maybe the weather was different in some years, so owners were more or less likely to use their boats. Maybe a campaign of awareness in some years caused some owners to lower their speeds in known manatee areas.
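
As a quick sanity check on part (g): in a linear regression with one explanatory variable, R² is the square of r. The sketch below uses the rounded r = 0.91 from part (c), so it can only be expected to agree with the calculator's R² to about two decimal places. The function name `r_squared` is just an illustrative choice.

```python
def r_squared(r):
    """Coefficient of determination for a one-variable linear regression."""
    return r * r

r = 0.91  # rounded value from part (c)
print(f"R^2 is about {r_squared(r):.2f}")
```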

(h) ŷ = 27.8 (from TRACE at x = 559)

(i) y−ŷ = 34−27.8 = 6.2
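
Parts (h) and (i) can also be checked without TRACE. This is only a sketch using the rounded coefficients from part (d), so expect agreement to about one decimal place; the names `predict`, `y_actual` and `residual` are mine, not the calculator's.

```python
# Rounded regression coefficients from part (d).
SLOPE = 0.1127
INTERCEPT = -35.1786

def predict(x):
    """Predicted manatee deaths (y-hat) for x thousand registrations."""
    return SLOPE * x + INTERCEPT

x = 559        # thousands of registrations, from part (h)
y_actual = 34  # observed deaths, from part (i)

y_hat = predict(x)
residual = y_actual - y_hat  # residual = actual minus predicted
print(f"y-hat = {y_hat:.1f}, residual = {residual:.1f}")
```

Keep the order straight: the residual is y − ŷ, so a positive residual means the actual value was above the regression line.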

(j) Remember that x is in thousands, so a million boats is x = 1000. But x=1000 is far outside the data range, so the regression can’t be used to make a prediction.

3 The decision point for n=10 is 0.632, and |r| = 0.57. |r| < d.p., and therefore you can’t reach a conclusion. From the sample data, it’s impossible to say whether there is any association between TV watching and GPA for TC3 students in general.
Note: Always state the decision point and show the comparison to r.
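
The comparison itself is simple enough to sketch in a few lines. The lookup table below holds only the two decision points quoted on this page (n = 10 and n = 12), and the function name `association_shown` is a made-up label for illustration, not anything from your calculator.

```python
# Decision points quoted in these solutions, keyed by sample size n.
DECISION_POINTS = {10: 0.632, 12: 0.576}

def association_shown(r, n):
    """True if |r| exceeds the decision point for sample size n."""
    return abs(r) > DECISION_POINTS[n]

# Problem 3: |r| = 0.57 with n = 10. Only the size of r matters, not its sign.
if association_shown(0.57, 10):
    print("There is an association in the population.")
else:
    print("Can't reach a conclusion from this sample.")
```

Notice that a False result does not mean "no association"; it means the sample doesn't let you say either way.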

4 (a) Yes. (Figure: scatterplot for deep freeze.)
The point (0,6) is hard to see behind the y axis, but it’s there.

(b) The results of LinReg(ax+b) L3,L4,Y2 are shown at right. (Figure: regression results for deep freezer.) ŷ = −3.5175x+6.4561

(c) The slope is −3.5175. Increasing the dial setting by one unit decreases temperature by about 3.5°.
Again, state whether y increases or decreases with increasing x.

(d) The y intercept is 6.4561. A dial setting of 0 corresponds to about 6.5°.

(e) r = −0.99

(f) R² = 0.98. About 98% of variation in temperature is associated with variation in dial setting.

This seems almost too good to be true, as though the data were just made up. ☺ But it’s hard to think of many lurking variables. Maybe it happened that some measurements were taken just after the compressor shut off, and others were taken just before the compressor was ready to switch on again in response to a temperature rise.

(g) ŷ = 2.9° (from TRACE at x = 1)
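
To see how the intercept in (d), the slope in (c), and the prediction in (g) all come out of the same equation, you can evaluate it at dial settings 0 and 1. This is a sketch with the rounded coefficients from part (b); `predicted_temp` is an illustrative name of my own.

```python
# Rounded regression coefficients from part (b).
FREEZER_SLOPE = -3.5175
FREEZER_INTERCEPT = 6.4561

def predicted_temp(dial):
    """Predicted temperature (y-hat) for a given dial setting."""
    return FREEZER_SLOPE * dial + FREEZER_INTERCEPT

at_zero = predicted_temp(0)  # the y intercept, part (d)
at_one = predicted_temp(1)   # the prediction in part (g)
print(f"dial 0: {at_zero:.1f}")
print(f"dial 1: {at_one:.1f}")
print(f"change per dial unit: {at_one - at_zero:.1f}")
```

The change from one dial setting to the next is just the slope, which is why a negative slope reads as "decreases".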

5 For n = 12, the decision point is 0.576. |r| = 0.85 is greater than that, so there is an association. Increased study time is associated with increased exam score for statistics students in general.

6 No. There’s a lurking variable here: age. Older pupils tend to have larger feet and also tend to have increased reading ability.

7 r, the linear correlation coefficient, would be roughly zero. Taking the plot as a whole, as x increases, y is about equally likely to increase or decrease. A straight line would be a terrible model for the data.

Clearly there is a strong correlation, but it is not a linear correlation. Probably a good model for this data set would be a quadratic regression, ŷ = ax²+bx+c. Though we study only linear regressions, your calculator can perform quadratic and many other types.
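
Here is a small sketch of how a perfect but non-linear relationship can still give r of zero. The data are hypothetical, made up for this illustration (they are not the points in the exercise's plot): y is exactly x², with x values symmetric around 0. The function `pearson_r` is a plain hand-written version of the usual formula for r, not a calculator command.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data (NOT the textbook's plot): y is determined exactly by x.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

print(f"r = {pearson_r(xs, ys):.2f}")
```

Even though y is completely determined by x here, r comes out to zero, because the falling left half of the curve cancels the rising right half.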

8 The coefficient of determination, R², answers this question. For linear correlations, R² is indeed the square of the correlation coefficient r. r = 0.30 ⇒ R² = 0.09. Therefore 9% of the variation in IQ is associated with variation in income.

Remark: Don’t say “caused by” variation in family income. Correlation is not causation. You can think of some reasons why it might be plausible that wealthier families are more likely to produce smarter children, or at least children who do better on standardized tests, but you can’t be sure without a controlled experiment.

Remark: Though it’s an interesting fact, the correlation in twins’ IQ scores is not needed for this problem. In real life, an important part of solving problems and making decisions is focusing on just the relevant information and not getting distracted.
