Common mistake: Don’t use any form of the word “correlation” in your answer. Your friend wouldn’t understand it, but it’s wrong anyway. Correlation is the interpretation of r, not R². Yes, r is related to R², but R² as such is not about correlation.
Common mistake: R² tells you how much of the variation in y is associated with variation in x, not the other way around. It’s not accurate to say 64% of variation in age is associated with variation in salary.
Common mistake: Don’t say “explained by” to non-technical people. The regression shows an association, but it does not show that growing older causes salary increases.
The results of LinReg(ax+b) L1,L2,Y1 are shown at right.
The correlation coefficient is r = 0.91
ŷ = 0.1127x − 35.1786
Note: ŷ, not y. Note: −35.1786, not +−35.1786.
The slope is 0.1127. An increase of 1000 power-boat registrations is associated with an increase of about 0.11 manatee deaths, on average.
It’s every 1000 boats, not every boat, because the original table is in thousands. Always be specific: “increase”, not just “change”.
Remark: Although this is mathematically accurate, people may not respond well to 0.11 as a number of deaths, which obviously is a discrete variable. You might multiply by 100 and say that 100,000 extra registrations are associated with 11 more manatee deaths on average; or multiply by 10 and round a bit to say that 10,000 extra registrations are associated with about one more manatee death on average.
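The rescaling in the remark above is just the slope multiplied by the size of the increase. Here is a minimal sketch of that arithmetic; the function name `extra_deaths` is made up for illustration, and the only number taken from the solution is the slope 0.1127.

```python
# Slope from the regression above: manatee deaths per 1000 registrations
# (x in the original table is in thousands of boats).
SLOPE_PER_THOUSAND = 0.1127

def extra_deaths(extra_registrations):
    """Average change in predicted deaths for a given increase in registrations.

    Hypothetical helper, for illustration only.
    """
    return SLOPE_PER_THOUSAND * (extra_registrations / 1000)

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} extra registrations: about {extra_deaths(n):.2f} more deaths on average")
```

The 10,000 row comes out a little above 1, which is why the remark says “about one” after rounding.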
(f) The y intercept is −35.1786. Mathematically, if there were no power boats there would be about minus 35 manatees killed by power boats. But this is not applicable because x=0 (no boats) is far outside the range of x in the data set.
R² = 0.83. About 83% of variation in manatee deaths from power boats is associated with the variation in registrations of power boats.
It’s R², not r². And don’t use any form of the word “correlate” in your answer.
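As a quick consistency check, in a simple linear regression with one x variable the coefficient of determination equals the square of r. The sketch below uses only the r = 0.91 reported earlier; the variable names are illustrative.

```python
r = 0.91                      # correlation coefficient reported for the manatee data
r_squared = r ** 2            # coefficient of determination for a one-variable linear fit
unassociated = 1 - r_squared  # share left to lurking variables and natural variability

print(f"R^2 = {r_squared:.2f}")
print(f"share not associated with x = {unassociated:.2f}")
```

The calculator reports both values directly, so this only confirms that the two numbers in the answer agree with each other.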
100% of manatee power-boat deaths come from power boats, so why isn’t the association 100%? The other 17% is lurking variables plus natural variability. For instance, maybe the weather was different in some years, so owners were more or less likely to use their boats. Maybe a campaign of awareness in some years caused some owners to lower their speeds in known manatee areas.
(h) ŷ = 27.8
(i) y−ŷ = 34−27.8 = 6.2
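The residual is the observed value minus the predicted value, in that order. A minimal sketch using the two numbers from parts (h) and (i):

```python
observed = 34      # actual y used in part (i)
predicted = 27.8   # y-hat from part (h)

residual = observed - predicted  # observed minus predicted, never the reverse
print(f"residual = {residual:.1f}")

# A positive residual means the observed point lies above the regression line.
above_line = residual > 0
```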
(j) Remember that x is in thousands, so a million boats is x = 1000. But x=1000 is far outside the data range, so the regression can’t be used to make a prediction.
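One way to picture the rule in part (j) is a prediction helper that refuses to extrapolate. This is only a sketch. The bounds `X_MIN` and `X_MAX` are hypothetical placeholders, because the data table is not reproduced here; substitute the smallest and largest x in the actual data set.

```python
# HYPOTHETICAL bounds, for illustration only: replace them with the smallest
# and largest x (thousands of registrations) in the real data table.
X_MIN = 400
X_MAX = 800

def predict_deaths(x_thousands, x_min=X_MIN, x_max=X_MAX):
    """Return y-hat from the fitted line, refusing x outside the observed range."""
    if not (x_min <= x_thousands <= x_max):
        raise ValueError(
            f"x = {x_thousands} is outside the data range [{x_min}, {x_max}]; "
            "the regression can't be used to make a prediction"
        )
    return 0.1127 * x_thousands - 35.1786

try:
    predict_deaths(1000)  # a million boats, since x is in thousands
except ValueError as err:
    print(err)
```

The equation will happily produce a number for x = 1000, but nothing in the data supports the line that far out.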
The results of LinReg(ax+b) L3,L4,Y2 are shown at right.
ŷ = −3.5175x+6.4561
The slope is −3.5175. Increasing the dial setting by one unit decreases temperature by about 3.5°.
Again, state whether y increases or decreases with increasing x.
(d) The y intercept is 6.4561. A dial setting of 0 corresponds to about 6.5°.
(e) r = −0.99
(f) R² = 0.98. About 98% of variation in temperature is associated with variation in dial setting.
This seems almost too good to be true, as though the data were just made up. ☺ But it’s hard to think of many lurking variables. Maybe it happened that some measurements were taken just after the compressor shut off, and others were taken just before the compressor was ready to switch on again in response to a temperature rise.
(g) ŷ = 2.9°
r, the linear correlation coefficient, would be roughly zero. Taking the plot as a whole, as x increases, y is about equally likely to increase or decrease. A straight line would be a terrible model for the data.
Clearly there is a strong correlation, but it is not a linear correlation. Probably a good model for this data set would be a quadratic regression, ŷ = ax²+bx+c. Though we study only linear regressions, your calculator can perform quadratic and many other types.
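To see how a strong relationship can hide from the linear correlation coefficient, here is a small sketch with made-up data (not the data from the exercise): y is exactly x squared, on x values placed symmetrically around zero. The helper `pearson_r` is written out by hand so that nothing beyond plain Python is needed.

```python
def pearson_r(xs, ys):
    """Linear correlation coefficient r, computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Made-up illustration: a perfect parabola, symmetric about x = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"linear r = {r:.2f}")
```

Every point lies exactly on the curve y = x², yet the linear r comes out at zero because the rising and falling halves cancel. A real data set will rarely be this symmetric, so its r may be small rather than exactly zero.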
Remark: Don’t say “caused by” variation in family income. Correlation is not causation. You can think of some reasons why it might be plausible that wealthier families are more likely to produce smarter children, or at least children who do better on standardized tests, but you can’t be sure without a controlled experiment.
Remark: Though it’s an interesting fact, the correlation in twins’ IQ scores is not needed for this problem. In real life, an important part of solving problems and making decisions is focusing on just the relevant information and not getting distracted.