USES AND ABUSES
Statistics in the Real World

514 CHAPTER 9 Correlation and Regression

EXERCISES

1. Confusing Correlation and Causation: Find an example of an article that confuses correlation and causation. Discuss other variables that could contribute to the relationship between the variables.

2. Considering Only Linear Correlation: Find an example of two real-life variables that have a nonlinear correlation.

Uses

Correlation and Regression
Correlation and regression analysis can be used to determine whether there is a significant relationship between two variables. When there is, you can use one of the variables to predict the value of the other variable. For instance, educators have used correlation and regression analysis to determine that there is a significant correlation between a student’s SAT score and that student’s grade point average during the freshman year at college. Consequently, many colleges and universities use the SAT scores of high school applicants as a predictor of the applicants’ initial success at college.

Abuses

Confusing Correlation and Causation
The most common abuse of correlation in studies is to confuse the concept of correlation with that of causation (see page 480). Good SAT scores do not cause good college grades. Rather, there are other variables, such as good study habits and motivation, that contribute to both. When a strong correlation is found between two variables, look for other variables that are correlated with both.

Considering Only Linear Correlation
The correlation studied in this chapter is linear correlation. When the correlation coefficient is close to 1 or close to -1, the data points can be modeled by a straight line. It is possible for a correlation coefficient to be close to 0 even though there is a strong correlation of a different type. Consider the data listed in the accompanying table. The value of the correlation coefficient is 0.
However, the data fit the equation $x^2 + y^2 = 1$ perfectly, as shown in the accompanying figure.

x     1     0    -1     0
y     0     1     0    -1

[Figure: graph of $x^2 + y^2 = 1$ on x- and y-axes marked from -2 to 2]

Ethics

When data are collected, all of the data should be used when calculating statistics. In this chapter, you learned that before finding the equation of a regression line, it is helpful to construct a scatter plot of the data to check for outliers, gaps, and clusters in the data. Researchers must not choose to use only those data points that fit their hypotheses or those that show a significant correlation. Although eliminating outliers may help a data set coincide with predicted patterns or fit a regression line, it is unethical to amend data in such a way. An outlier or any other point that influences a regression model can be removed only when the removal is properly justified. In most cases, the best and sometimes safest approach is to present statistical measurements both with and without the outlier included. Doing so leaves the decision of whether to recognize the outlier to the reader.
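The four-point table from the discussion of linear correlation above can be checked numerically. The following is a minimal sketch in Python, not a method given in the text: the helper name `pearson_r` is a hypothetical choice, and it computes the linear correlation coefficient from deviations about the means, which is assumed to be equivalent to the formula used in this chapter.

```python
import math

# Data from the table in the text: four points that satisfy x^2 + y^2 = 1.
x = [1, 0, -1, 0]
y = [0, 1, 0, -1]


def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired data (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(x, y)

# Check the nonlinear relationship: does every point lie on the unit circle?
on_circle = all(a ** 2 + b ** 2 == 1 for a, b in zip(x, y))

print(f"r = {r:.1f}")
print(f"every point satisfies x^2 + y^2 = 1: {on_circle}")
```

Under these assumptions, the linear correlation coefficient is 0 even though every point satisfies the circle equation exactly. This is why a scatter plot should be examined before concluding that two variables are unrelated.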