Survey of Mathematics

12.6 Linear Correlation and Regression

In this section, we discuss two important statistical topics: correlation and regression. Correlation is used to determine whether there is a relationship between two quantities and, if so, how strong the relationship is. Regression is used to determine the equation that relates the two quantities. Although there are other types of correlation and regression, in this section we discuss only linear correlation and linear regression. We begin by discussing linear correlation.

Linear Correlation

The linear correlation coefficient, r, is a unitless measure that describes the strength of the linear relationship between two variables. A positive value of r, or a positive correlation, means that as one variable increases, the other variable also increases. A negative value of r, or a negative correlation, means that as one variable increases, the other variable decreases. The correlation coefficient, r, will always be a value between −1 and 1 inclusive. A value of 1 indicates the strongest possible positive correlation, a value of −1 indicates the strongest possible negative correlation, and a value of 0 indicates no linear correlation (Fig. 12.39).

[Figure 12.39: Number line from −1 to 1. −1: strongest negative correlation; 0: no correlation; 1: strongest positive correlation.]

A visual aid used with correlation is the scatter diagram, a plot of data points. To help understand how to construct a scatter diagram, consider the following data from Egan Electronics. During a 6-day period, Egan Electronics kept daily records of the number of assembly line workers absent and the number of defective parts produced. The information is provided in the following chart.

Day                          1   2   3   4   5   6
Number of workers absent     3   5   0   1   2   6
Number of defective parts   15  22   7  12  20  30

For each of the 6 days, two pieces of data are provided: number of workers absent and number of defective parts. Data, such as these, that involve two variables are called bivariate data.
Often when we have a set of bivariate data, we can control one of the quantities. We generally denote the quantity that can be controlled, the independent variable, as x. The other variable, the dependent variable, is denoted as y. In this problem, we will assume that the number of defective parts produced is affected by the number of workers absent. Therefore, we will call the number of workers absent x and the number of defective parts produced y. When we plot bivariate data, the independent variable is marked on the horizontal axis and the dependent variable is marked on the vertical axis. Therefore, for this problem, number of workers absent is marked on the horizontal axis and number of defective parts is marked on the vertical axis. If we plot the six pieces of bivariate data in the Cartesian coordinate system, we get a scatter diagram, as shown in Fig. 12.40.
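To make the construction concrete, the following Python sketch pairs each day's number of workers absent (x) with its number of defective parts (y) to form the points of the scatter diagram, and then estimates r for the Egan Electronics data. The function names scatter_points and linear_correlation are illustrative only. The formula used for r is the standard Pearson formula, which is an assumption here because it has not been stated in the text above.

```python
import math

# Egan Electronics data, copied from the chart in the text.
workers_absent = [3, 5, 0, 1, 2, 6]        # x, the independent variable
defective_parts = [15, 22, 7, 12, 20, 30]  # y, the dependent variable


def scatter_points(xs, ys):
    """Pair each x with its y; these pairs are the points of the scatter diagram."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return list(zip(xs, ys))


def linear_correlation(xs, ys):
    """Pearson's r (assumed formula, not quoted from this section).

    Raises ZeroDivisionError if either variable is constant.
    """
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


points = scatter_points(workers_absent, defective_parts)
r = linear_correlation(workers_absent, defective_parts)

print(points)       # the six (x, y) points to plot
print(round(r, 3))  # strength of the linear relationship
```

Under these assumptions, r comes out at roughly 0.92, which lies close to 1 and is consistent with the upward trend of the plotted points: days with more workers absent tend to have more defective parts.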
