518 CHAPTER 10 Correlation and Regression

Common Errors Involving Correlation
Here are three of the most common errors made in interpreting results involving correlation:

1. Assuming that correlation implies causality. One classic example involves paired data consisting of the stork population in Oldenburg, Germany, and the number of human births. For the years 1930 to 1936, the data suggested a linear correlation. Bulletin: Storks do not actually cause births, and births do not cause storks. Both variables were affected by another variable lurking in the background. (A lurking variable is one that affects the variables being studied but is not included in the study.) Here, an increasing human population resulted in more births and increased construction of thatched roofs that attracted storks!
2. Using data based on averages. Averages suppress individual variation and may inflate the correlation coefficient. One study produced a 0.4 linear correlation coefficient for paired data relating income and education among individuals, but the linear correlation coefficient became 0.7 when regional averages were used.
3. Ignoring the possibility of a nonlinear relationship. If there is no linear correlation, there might still be some other relationship that is not linear, as in Figure 10-2(d).
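Errors 2 and 3 can be checked numerically. The sketch below uses small made-up data sets. They are hypothetical and are not the income and education study. In the first data set, individuals in four regions scatter around their regional averages, but those averages fall exactly on a line. In the second, y = x² depends perfectly on x, yet r = 0. The helper name `pearson_r` is our own, and it computes r directly from its definition.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired sample data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Error 2 (hypothetical data): four regions whose averages lie on y = x,
# with individuals scattered symmetrically around each regional average.
offsets = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
xs, ys, region_means = [], [], []
for region in range(4):
    pts = [(region + dx, region + dy) for dx, dy in offsets]
    xs += [p[0] for p in pts]
    ys += [p[1] for p in pts]
    region_means.append((sum(p[0] for p in pts) / 4, sum(p[1] for p in pts) / 4))

r_individuals = pearson_r(xs, ys)
r_averages = pearson_r([m[0] for m in region_means], [m[1] for m in region_means])

# Error 3: y = x^2 is a perfect nonlinear relationship, yet r = 0.
xq = [-2, -1, 0, 1, 2]
yq = [x ** 2 for x in xq]
r_nonlinear = pearson_r(xq, yq)

print(f"individuals: r = {r_individuals:.3f}")        # individuals: r = 0.556
print(f"regional averages: r = {r_averages:.3f}")     # regional averages: r = 1.000
print(f"y = x^2: r = {r_nonlinear:.3f}")              # y = x^2: r = 0.000
```

Averaging removes the within-region scatter, which raises r from 5/9 to 1. The value r = 0 for the parabola says only that there is no *linear* correlation.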
PART 2 Formal Hypothesis Test

Hypotheses
When conducting a formal hypothesis test to determine whether there is a significant linear correlation between two variables, use the following null and alternative hypotheses, in which ρ represents the linear correlation coefficient of the population:

Null Hypothesis H0: ρ = 0 (No correlation)
Alternative Hypothesis H1: ρ ≠ 0 (Correlation)

Test Statistic
The same methods of Part 1 can be used with the test statistic r. Alternatively, the t test statistic can be found using the following:

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \qquad \text{(with } n - 2 \text{ degrees of freedom)}$$

If the above t test statistic is used, P-values and critical values can be found using technology or Table A-3 as described in earlier chapters. See the following example.

EXAMPLE 7 Hypothesis Test Using the P-Value from the t Test
Use the paired data from Table 10-1 on page 507 to conduct a formal hypothesis test of the claim that there is a linear correlation between lottery jackpot amounts and numbers of tickets sold. Use a 0.05 significance level with the P-value method of testing hypotheses.

SOLUTION
REQUIREMENT CHECK In Example 4 we noted that the requirements appear to be satisfied.

To claim that there is a linear correlation is to claim that the population linear correlation coefficient ρ is different from 0. We therefore have the following hypotheses:
H0: ρ = 0 (There is no linear correlation.)
H1: ρ ≠ 0 (There is a linear correlation.)
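The t test statistic and its P-value can also be computed without Table A-3. Below is a minimal standard-library sketch, under these assumptions:

- It computes t from r and n using the formula above.
- It gets the two-tailed P-value from Student's t distribution through the identity P(|T| ≥ |t|) = I_{df/(df+t²)}(df/2, 1/2). Here I is the regularized incomplete beta function, evaluated with the Numerical Recipes continued fraction. A statistics library such as SciPy (`scipy.stats.t.sf`) would normally supply this tail area instead.
- The function names are hypothetical.
- The values r = 0.6 and n = 10 are made-up numbers, not the Table 10-1 data.

```python
from math import exp, lgamma, log, sqrt

def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def correlation_t_test(r, n):
    """t test of H0: rho = 0 vs H1: rho != 0; returns (t, df, two-tailed P-value)."""
    if n < 3:
        raise ValueError("need at least 3 pairs of data")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    t = r / sqrt((1.0 - r * r) / df)
    # Two-tailed P-value: P(|T| >= |t|) = I_{df/(df+t^2)}(df/2, 1/2)
    p_value = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p_value

# Hypothetical sample: r = 0.6 from n = 10 pairs (not the Table 10-1 data)
t, df, p = correlation_t_test(0.6, 10)
print(f"t = {t:.3f}, df = {df}, P-value = {p:.4f}")
```

With these hypothetical numbers, t ≈ 2.121 falls between the df = 8 critical values 1.860 and 2.306. The two-tailed P-value is therefore between 0.05 and 0.10, so at the 0.05 significance level we would fail to reject H0.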