510 CHAPTER 10 Correlation and Regression

DEFINITION
The linear correlation coefficient r measures the strength of the linear correlation between the paired quantitative x values and y values in a sample.

The linear correlation coefficient r is computed by using Formula 10-1 or Formula 10-2, included in the following Key Elements box. [The linear correlation coefficient is sometimes referred to as the Pearson product moment correlation coefficient in honor of Karl Pearson (1857–1936), who originally developed it.]

Because the linear correlation coefficient r is calculated using sample data, it is a sample statistic used to measure the strength of the linear correlation between x and y. If we had every pair of x and y values from an entire population, the result of Formula 10-1 or Formula 10-2 would be a population parameter, represented by ρ (Greek letter rho).

KEY ELEMENTS
Calculating and Interpreting the Linear Correlation Coefficient r

Objective
Determine whether there is a linear correlation between two variables.

Notation for the Linear Correlation Coefficient
n       number of pairs of sample data.
Σ       denotes addition of the items indicated.
Σx      sum of all x values.
Σx²     indicates that each x value should be squared and then those squares added.
(Σx)²   indicates that the x values should be added and the total then squared. Avoid confusing Σx² and (Σx)².
Σxy     indicates that each x value should first be multiplied by its corresponding y value. After obtaining all such products, find their sum.
r       linear correlation coefficient for sample data.
ρ       linear correlation coefficient for a population of paired data.

Requirements
Given any collection of sample paired quantitative data, the linear correlation coefficient r can always be computed, but the following requirements should be satisfied when using the sample paired data to make a conclusion about linear correlation in the corresponding population of paired data.

1. The sample of paired (x, y) data is a simple random sample of quantitative data. (It is important that the sample data have not been collected using some inappropriate method, such as using a voluntary response sample.)
2. Visual examination of the scatterplot must confirm that the points approximate a straight-line pattern.*
3. Because results can be strongly affected by the presence of outliers, any outliers must be removed if they are known to be errors. The effects of any other outliers should be considered by calculating r with and without the outliers included.*

*Note: Requirements 2 and 3 above are simplified attempts at checking this formal requirement: The pairs of (x, y) data must have a bivariate normal distribution. Normal distributions are discussed in Chapter 6, but this assumption basically requires that for any fixed value of x, the corresponding values of y have a distribution that is approximately normal, and for any fixed value of y, the values of x have a distribution that is approximately normal. This requirement is usually difficult to check, so for now, we will use Requirements 2 and 3 as listed above.

Alternatives
If the first requirement is violated and the data have been collected using an inappropriate method, it is likely that nothing can be done to conduct a reasonable analysis of correlation. If other requirements are violated, possible alternatives include using rank correlation (Section 13-6) or a randomization test, which is discussed later in Part 3 of this section.

Go Figure
15 Billion: The number of years it would take the atomic clock in Boulder, Colorado, to be off by one second.
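The notation in the Key Elements box can be made concrete with a short Python sketch. It computes r from the sums n, Σx, Σy, Σx², Σy², and Σxy using the standard computational form of the Pearson coefficient, which is assumed here to agree with Formula 10-1 (that formula is not reproduced in this part of the text). The function name linear_correlation and the small data set are hypothetical, chosen only for illustration. The last lines follow Requirement 3 by calculating r with and without a suspected outlier.

```python
import math


def linear_correlation(xs, ys):
    """Return the sample linear correlation coefficient r for paired data.

    Uses the standard computational form (assumed to match Formula 10-1):
        r = (n*Sxy - Sx*Sy) / (sqrt(n*Sx2 - Sx**2) * sqrt(n*Sy2 - Sy**2))
    """
    if len(xs) != len(ys):
        raise ValueError("x and y must contain the same number of values")
    n = len(xs)                                   # number of pairs
    if n < 2:
        raise ValueError("at least two pairs are needed")

    sum_x = sum(xs)                               # Σx
    sum_y = sum(ys)                               # Σy
    sum_x2 = sum(x * x for x in xs)               # Σx²: square first, then add
    sum_y2 = sum(y * y for y in ys)               # Σy²
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # Σxy: multiply pairs, then add

    # Note that (Σx)² is sum_x ** 2, which is not the same as Σx² (sum_x2).
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return numerator / denominator


# Hypothetical paired data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_without = linear_correlation(xs, ys)

# Same data with one suspected outlier appended (Requirement 3).
r_with = linear_correlation(xs + [10], ys + [0])

print(f"r without the outlier: {r_without:.3f}")
print(f"r with the outlier:    {r_with:.3f}")
```

In this made-up data set a single extreme pair is enough to change the sign of r, which is why the text asks for r to be calculated both ways before drawing a conclusion.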
