536 CHAPTER 10 Correlation and Regression

Residuals and the Least-Squares Property

We stated that the regression equation represents the straight line that “best” fits the data. The criterion for deciding which line is better than all others is based on the vertical distances between the original data points and the regression line. Such distances are called residuals.

DEFINITION
For a pair of sample x and y values, the residual is the difference between the observed sample value of y and the y value that is predicted by using the regression equation. That is,

Residual = observed y - predicted y = y - ŷ

So far, this definition hasn’t yet won any prizes for simplicity, but you can easily understand residuals by referring to Figure 10-6, which corresponds to the paired sample data shown in the margin. In Figure 10-6, the paired data are plotted as red points and the residuals are represented by the dashed lines.

FIGURE 10-6 Residuals and Squares of Residuals (plot of the four data points and the line ŷ = 1 + x; the dashed lines mark residuals of -5, 11, -13, and 7)

Paired sample data (margin table):

x    8   12   20   24
y    4   24    8   32

Consider the sample point with coordinates of (8, 4) plotted in Figure 10-6. We get the following:

■ Observed value: For x = 8, the corresponding observed value is y = 4.
■ Predicted value: If we substitute x = 8 into the regression equation ŷ = 1 + x, we get the predicted value ŷ = 9.
■ Residual: The difference between the observed value and the predicted value is the residual, so the residual is y - ŷ = 4 - 9 = -5.

The regression equation represents the line that “best” fits the points according to the following least-squares property.
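Before turning to that property, the residual arithmetic can be checked for all four data pairs with a short computation. The following is a minimal sketch in Python, assuming the margin data and the regression equation ŷ = 1 + x given with Figure 10-6; the names predict and residuals are hypothetical helpers chosen for illustration and do not come from the text.

```python
# Paired sample data from the margin table accompanying Figure 10-6.
x_values = [8, 12, 20, 24]
y_values = [4, 24, 8, 32]


def predict(x):
    """Predicted y from the regression equation y-hat = 1 + x (hypothetical helper)."""
    return 1 + x


def residuals(xs, ys):
    """Residual for each pair: observed y minus predicted y (hypothetical helper)."""
    return [y - predict(x) for x, y in zip(xs, ys)]


res = residuals(x_values, y_values)

# Print one row per data pair: x, observed y, predicted y, residual.
for x, y, r in zip(x_values, y_values, res):
    print(f"x = {x:2d}  observed y = {y:2d}  predicted y = {predict(x):2d}  residual = {r:3d}")
```

The four residuals produced this way should agree with the values labeled on the dashed lines in Figure 10-6, with the first one matching the worked example for the point (8, 4).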
