# Stats without Tears

Solutions for Chapter 4

Updated 10 Oct 2014
(What’s New?)

Copyright © 2013–2020 by Stan Brown

Because this textbook helps you, please donate at BrownMath.com/donate.

1
64% of the variation in salary is associated with variation in age.

**Common mistake:** Don’t use any form of the word
“correlation” in your answer. Your friend wouldn’t
understand it, but it’s wrong anyway. Correlation is the
interpretation of *r*, not *R*². Yes, *r* is related to
*R*², but *R*² as such is not about correlation.

**Common mistake:** *R*² tells you how much of the
variation in *y* is associated with variation in *x*, not
the other way around. It’s not accurate to say 64% of variation
in age is associated with variation in salary.

**Common mistake:** Don’t say “explained
by” to non-technical people. The regression shows an association,
but it does not show that growing older *causes* salary
increases.

2
(a) We know that power boats kill manatees, so the
boat registrations must be the explanatory variable
(*x*) and the
manatee power-boat kills must be the response variable
(*y*). (Although this is an observational study, the cause of
death is recorded, so we do know that the boats cause these manatee
deaths.)

(b) Yes

(c)
The results of `LinReg(ax+b) L1,L2,Y1` are shown at right. The correlation coefficient is *r* = 0.91.
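For readers without the calculator at hand, here is a minimal sketch of the least-squares arithmetic behind a command like `LinReg(ax+b)`. The function name `linreg` and the toy data are assumptions made for illustration; the manatee table itself is not reproduced here, so the numbers below are not the ones from this problem.

```python
from math import sqrt

def linreg(xs, ys):
    """Return (slope a, intercept b, correlation r) for the line y-hat = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                # slope
    b = mean_y - a * mean_x      # intercept
    r = sxy / sqrt(sxx * syy)    # linear correlation coefficient
    return a, b, r

# Toy data, NOT the manatee table: four points lying exactly on y = 2x + 1.
toy_x = [1, 2, 3, 4]
toy_y = [3, 5, 7, 9]
a, b, r = linreg(toy_x, toy_y)
print(a, b, r)  # 2.0 1.0 1.0
```

Because the toy points fall exactly on a line, the sketch returns *r* = 1; real data such as the manatee counts scatter around the line, which is why *r* there is 0.91 rather than 1.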

(d)
*ŷ* = 0.1127*x* − 35.1786

Note: *ŷ*, not *y*. Note: −35.1786, not +−35.1786.

(e)
The slope is 0.1127. An increase of 1000 power-boat registrations is associated with an increase of about 0.11 manatee deaths, on average.

It’s every 1000 boats, not every boat, because the original
table is in thousands. Always be specific: “increase”, not
just “change”.

**Remark:**
Although this is mathematically accurate, people may
not respond well to 0.11 as a number of deaths, which obviously is a
discrete variable. You might multiply by 100 and say that 100,000
extra registrations are associated with 11 more manatee deaths on
average; or multiply by 10 and round a bit to say that 10,000 extra
registrations are associated with about one more manatee death on
average.
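The rescaling in the remark is plain multiplication, and a short sketch may make it concrete. The slope comes from part (d); the function name `extra_deaths` is an assumption made for illustration.

```python
# Slope from part (d): deaths per 1000 registrations (x is in thousands).
slope_per_thousand = 0.1127

def extra_deaths(extra_registrations):
    """Average increase in deaths associated with this many extra registrations."""
    return slope_per_thousand * extra_registrations / 1000

for boats in (1_000, 10_000, 100_000):
    print(boats, round(extra_deaths(boats), 2))
```

The three lines printed are 0.11, 1.13, and 11.27 deaths, matching the "about 0.11", "about one", and "11" wordings above.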

(f) The *y* intercept is −35.1786. Mathematically, if there were no power boats there would be about minus 35 manatees killed by power boats. But this is not applicable because *x* = 0 (no boats) is far outside the range of *x* in the data set.

(g)
*R*² = 0.83. About 83% of variation in manatee deaths from power boats is associated with the variation in registrations of power boats.

It’s *R*², not *r*². And don’t use any form of
the word “correlate” in your answer.

100% of manatee power-boat deaths come from power boats, so why isn’t the association 100%? The other 17% is lurking variables plus natural variability. For instance, maybe the weather was different in some years, so owners were more or less likely to use their boats. Maybe a campaign of awareness in some years caused some owners to lower their speeds in known manatee areas.

(h)
*ŷ* = 27.8

(i)
y−*ŷ* = 34−27.8 = 6.2
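As a quick check of the residual arithmetic, here is a small sketch using the two values quoted in parts (h) and (i). The variable names are illustrative only.

```python
observed = 34      # actual y, from part (i)
predicted = 27.8   # y-hat, from part (h)

# Residual = actual minus predicted; positive means the point lies above the line.
residual = observed - predicted
print(round(residual, 1))  # 6.2
```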

(j)
Remember that *x* is in thousands, so a million boats is
*x* = 1000. But
*x*=1000 is far outside the data range, so the regression can’t be used to make a prediction.

3
The decision point for *n* = 10 is 0.632, and |*r*| = 0.57. Since |*r*| is less than the decision point, you can’t reach a conclusion. From the sample data, it’s impossible to say whether there is any association between TV watching and GPA for TC3 students in general.

Note: Always state the decision point and show the comparison to *r*.
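The comparison can be sketched in a few lines. The dictionary below holds only the two decision points quoted in these solutions (for *n* = 10 and *n* = 12); a real table has an entry for every sample size, and the names `DECISION_POINTS` and `has_association` are assumptions made for illustration.

```python
# Decision points quoted in these solutions; a full table covers every n.
DECISION_POINTS = {10: 0.632, 12: 0.576}

def has_association(r, n):
    """True if |r| exceeds the decision point for sample size n."""
    return abs(r) > DECISION_POINTS[n]

print(has_association(0.57, 10))  # False: no conclusion can be reached
print(has_association(0.85, 12))  # True: there is an association
```

A `False` result does not mean "no association"; it means only that the sample can't settle the question either way.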

4
(a)
Yes

The point (0,6) is hard to see behind the *y* axis, but it’s there.

(b)
The results of `LinReg(ax+b) L3,L4,Y2` are shown at right.

*ŷ* = −3.5175*x* + 6.4561

(c)
The slope is −3.5175. Increasing the dial setting by one unit decreases temperature by about 3.5°.

Again, state whether *y* *increases* or
*decreases* with increasing *x*.

(d) The y intercept is 6.4561. A dial setting of 0 corresponds to about 6.5°.

(e)
*r* = −0.99

(f)
*R*² = 0.98. About 98% of variation in temperature is associated with variation in dial setting.

This seems almost too good to be true, as though the data were just made up. ☺ But it’s hard to think of many lurking variables. Maybe it happened that some measurements were taken just after the compressor shut off, and others were taken just before the compressor was ready to switch on again in response to a temperature rise.

(g)
*ŷ* = 2.9°
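A prediction like this is just the regression equation from part (b) evaluated at a dial setting. The sketch below assumes a setting of *x* = 1, which is not restated in these solutions; it is used here only because it reproduces the 2.9° figure. The function name `predicted_temp` is likewise an assumption.

```python
def predicted_temp(dial):
    """Evaluate the fitted line from part (b): y-hat = -3.5175x + 6.4561."""
    return -3.5175 * dial + 6.4561

# Assumed dial setting of 1 (not stated in these solutions).
print(round(predicted_temp(1), 1))  # 2.9
# A dial setting of 0 returns the y intercept from part (d).
print(round(predicted_temp(0), 1))  # 6.5
```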

5
For *n* = 12, the decision point is 0.576.
|*r*| = 0.85 is greater than that, so there is an
association.
Increased study time is associated with increased exam score for statistics students in general.

6
No. There’s a lurking variable here: age.
Older pupils tend to have larger feet and also tend to have increased
reading ability.

7

*r*, the *linear* correlation coefficient, would be
roughly zero. Taking the plot as a whole, as *x* increases, *y* is about
equally likely to increase or decrease. A straight line would be a
terrible model for the data.

Clearly there is a strong correlation, but it is not a linear
correlation. Probably a good model for this data set would be a
quadratic regression,
*ŷ* = *ax*²+*bx*+*c*. Though we
study only linear regressions, your calculator can perform quadratic
and many other types.
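To see how a strong but non-linear pattern can give *r* near zero, here is a small sketch. The points are made up to lie exactly on the parabola *y* = *x*² and are not the data from the plot; the function name `pearson_r` is an assumption.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up points on y = x^2, symmetric about x = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]
print(pearson_r(xs, ys))  # 0.0
```

Here *y* is completely determined by *x*, yet the falling left half and the rising right half cancel, so the linear correlation coefficient is zero.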

8
The coefficient of determination, *R*², answers this
question. For linear correlations, *R*² is indeed the square of the
correlation coefficient *r*.
*r* = 0.30 ⇒ *R*² = 0.09. Therefore
9% of the variation in IQ is associated with variation in income.
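The squaring step can be checked in a couple of lines. The same sketch also confirms that the rounded *r* values quoted in problems 2 and 4 are consistent with the *R*² values given there. The helper name `r_squared_percent` is an assumption made for illustration.

```python
def r_squared_percent(r):
    """Square r and format the result as a whole-number percentage."""
    return f"{r ** 2:.0%}"

print(r_squared_percent(0.30))   # 9%   (this problem)
print(r_squared_percent(0.91))   # 83%  (problem 2)
print(r_squared_percent(-0.99))  # 98%  (problem 4)
```

The sign of *r* disappears when it is squared, so *R*² alone can't tell you whether the association is positive or negative.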

**Remark:**
Don’t say “caused by” variation in
family income. Correlation is not causation. You can think of some
reasons why it might be plausible that wealthier families are
more likely to produce smarter children, or at least children who do
better on standardized tests, but you can’t be sure without a
controlled experiment.

**Remark:**
Though it’s an interesting fact, the correlation in
twins’ IQ scores is not needed for this problem. In real life,
an important part of solving problems and making decisions is
focusing on just the relevant information and not getting
distracted.

- **11 Oct 2014**: Shrink the screen shots by 25%, and remove excessive white space.
- **26 Jan 2014**: Add an exercise on interpreting *R*².
- (intervening changes suppressed)
- **27 Jan 2013**: New document.

Updates and new info: https://BrownMath.com/swt/