10-1 Correlation
Spurious correlations will become more common with the increased use of big data, and they are more likely to occur with time-series data that have similar trends.
DEFINITION A spurious correlation is a correlation that doesn’t have an actual association, as in Example 5.
Interpreting r: Explained Variation
If we conclude that there is a linear correlation between x and y, we can find a linear equation that expresses y in terms of x, and that equation can be used to predict values of y for given values of x. In Section 10-2 we will describe a procedure for finding such equations and show how to predict values of y when given values of x. But a predicted value of y will not necessarily be the exact result that occurs because in addition to x, there are other factors affecting y, such as random variation and other characteristics not included in the study. In Section 10-3 we will present a rationale and more details about this principle:
The value of r² is the proportion of the variation in y that is explained by the linear relationship between x and y.
EXAMPLE 6 Explained Variation
Using the 9 pairs of data from Table 10-1 included with the Chapter Problem, we get a linear correlation coefficient of r = 0.947. What proportion of the variation in the numbers of tickets sold can be explained by the variation in jackpot amounts?
SOLUTION With r = 0.947, we get r² = 0.897.
INTERPRETATION We conclude that 0.897 (or about 90%) of the variation in the numbers of tickets sold can be explained by the linear relationship between jackpot amounts and numbers of tickets sold. This implies that about 10% of the variation in the numbers of tickets sold cannot be explained by the linear correlation between the two variables.
Interpreting r with Causation: Don’t Go There!
In Example 4 we concluded that there is sufficient evidence to support the claim of a linear correlation between lottery jackpot amounts and numbers of tickets sold.
We should not make any conclusion that includes a statement about a cause-effect relationship between the two variables. We should not conclude that an increase in the jackpot amount will cause ticket sales to increase. See the first of the following common errors, and know this: Correlation does not imply causality!
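As a quick check of the arithmetic in Example 6, the short Python sketch below squares r = 0.947 to get the explained proportion of variation and subtracts it from 1 to get the unexplained proportion. The function name `explained_variation` is a hypothetical helper introduced here for illustration; it is not part of the text. The result describes how much variation the linear relationship accounts for and says nothing about cause and effect.

```python
def explained_variation(r: float) -> tuple[float, float]:
    """Return (explained, unexplained) proportions of the variation in y
    for a linear correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    r_squared = r * r                  # proportion explained by the linear relationship
    return r_squared, 1.0 - r_squared  # remainder is not explained by it


# Value of r from Example 6 (jackpot amounts vs. numbers of tickets sold)
explained, unexplained = explained_variation(0.947)
print(f"r^2 = {explained:.3f}")                  # r^2 = 0.897
print(f"unexplained = {unexplained:.3f}")        # unexplained = 0.103
```

Rounded to whole percentages, these values match the "about 90%" and "about 10%" stated in the interpretation of Example 6.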