10-4 Multiple Regression

The preceding Statdisk display shows the adjusted coefficient of determination as “Adjusted R^2” = 0.877 (rounded). If we use Formula 10-8 with R² = 0.8783478, n = 153, and k = 2, we get adjusted R² = 0.877 (rounded). When comparing this multiple regression equation to others, it is better to use the adjusted R² of 0.877. Considered by itself, the adjusted R² of 0.877 is fairly high (close to 1), which suggests that the regression equation is a good fit with the sample data.

P-Value
The P-value is a measure of the overall significance of the multiple regression equation. The displayed P-value of 0 (rounded) is small, indicating that the multiple regression equation has good overall significance and is usable for predictions: we can predict weights of males based on their heights and waist circumferences. Like the adjusted R², this P-value is a good measure of how well the equation fits the sample data. The P-value results from a test of the null hypothesis that β₁ = β₂ = 0. Rejection of β₁ = β₂ = 0 implies that at least one of β₁ and β₂ is not 0, indicating that this regression equation is effective in predicting weights of males. A complete analysis of results might include other important elements, such as the significance of the individual coefficients, but we are keeping things simple (!) by limiting our discussion to the three key components: the multiple regression equation, the adjusted R², and the P-value.

Finding the Best Multiple Regression Equation
When trying to find the best multiple regression equation, we should not necessarily include all of the available predictor variables. Finding the best multiple regression equation requires abundant use of judgment and common sense, and there is no exact and automatic procedure that can be used to find it.
Determination of the best multiple regression equation is often quite difficult and is beyond the scope of this section, but the following guidelines are helpful.

Guidelines for Finding the Best Multiple Regression Equation
1. Use common sense and practical considerations to include or exclude variables. For example, when trying to find a good multiple regression equation for predicting the height of a daughter, we should exclude the height of the physician who delivered the daughter, because that height is obviously irrelevant.
2. Consider the P-value. Select an equation having overall significance, as determined by a low P-value found in the technology results display.
3. Consider equations with high values of adjusted R², and try to include only a few variables. Instead of including almost every available variable, try to include relatively few predictor (x) variables. Use these guidelines:
■ Select an equation having a value of adjusted R² with this property: If an additional predictor variable is included, the value of adjusted R² does not increase very much.
■ For a particular number of predictor (x) variables, select the equation with the largest value of adjusted R².
■ In excluding predictor (x) variables that don’t have much of an effect on the response (y) variable, it might be helpful to find the linear correlation coefficient r for each pair of variables being considered. If two predictor variables have a very high linear correlation coefficient (a condition called multicollinearity), there is no need to include them both, and we should exclude the variable that yields the lower value of adjusted R².
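The adjusted R² arithmetic and the first two bullets of guideline 3 can be sketched in Python. This is a minimal illustration, assuming Formula 10-8 has the standard form adjusted R² = 1 − [(n − 1)/(n − (k + 1))](1 − R²), where n is the sample size and k is the number of predictor (x) variables. The first call reproduces the 0.877 quoted for the Statdisk display. The candidate equations that follow use made-up R² values for hypothetical predictors x1, x2, x3 (not the textbook's data), the `min_gain` cutoff of 0.01 is our own stand-in for "does not increase very much", and the function names are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n sample values and k predictors (assumed form of Formula 10-8)."""
    if n - (k + 1) <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (n - 1) / (n - (k + 1)) * (1 - r2)


def best_by_size(candidates, n):
    """For each number of predictors, keep the equation with the largest adjusted R^2.

    `candidates` maps a tuple of predictor names to that equation's R^2.
    """
    best = {}
    for names, r2 in candidates.items():
        k = len(names)
        adj = adjusted_r2(r2, n, k)
        if k not in best or adj > best[k][1]:
            best[k] = (names, adj)
    return best


def choose_equation(candidates, n, min_gain=0.01):
    """Move to more predictors only while adjusted R^2 rises by at least min_gain."""
    best = best_by_size(candidates, n)
    sizes = sorted(best)
    chosen = sizes[0]
    for k in sizes[1:]:
        if best[k][1] - best[chosen][1] < min_gain:
            break  # the extra predictor does not help enough
        chosen = k
    return best[chosen]


# Check against the text: R^2 = 0.8783478, n = 153, k = 2.
print(round(adjusted_r2(0.8783478, n=153, k=2), 3))  # 0.877

# Made-up R^2 values for hypothetical predictors, n = 50 (illustration only).
candidates = {
    ("x1",): 0.60,
    ("x2",): 0.75,
    ("x3",): 0.40,
    ("x1", "x2"): 0.85,
    ("x2", "x3"): 0.78,
    ("x1", "x2", "x3"): 0.852,
}
names, adj = choose_equation(candidates, n=50)
print(names, round(adj, 3))  # ('x1', 'x2') 0.844
```

In this made-up example the three-predictor equation has a slightly higher R² (0.852 versus 0.85) but a lower adjusted R² (about 0.842 versus 0.844), so the sketch stops at two predictors. This is the penalty that makes adjusted R² the better basis for comparing equations with different numbers of predictors.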
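Guideline 2 relies on the P-value from the test of the null hypothesis β₁ = β₂ = 0. The sketch below assumes the usual overall F statistic, F = (R²/k) / [(1 − R²)/(n − k − 1)], with k and n − k − 1 degrees of freedom. For the two-predictor case only, the upper tail of an F distribution with 2 numerator degrees of freedom has the closed form [d/(d + 2F)]^(d/2), where d = n − 3, so no statistics library is needed; for a general k, an F-distribution routine such as SciPy's `scipy.stats.f.sf` would be used instead. The function names are hypothetical.

```python
def overall_f_statistic(r2, n, k):
    """F statistic for the null hypothesis that all k slope coefficients are 0."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def p_value_two_predictors(r2, n):
    """Upper-tail P-value of the overall F test, valid only when k = 2.

    With 2 numerator degrees of freedom, P(F > f) = (d / (d + 2f)) ** (d / 2),
    where d = n - 3 is the denominator degrees of freedom.
    """
    d = n - 3
    f = overall_f_statistic(r2, n, k=2)
    return (d / (d + 2 * f)) ** (d / 2)


# Values from the text: R^2 = 0.8783478 for n = 153 males and k = 2 predictors.
f = overall_f_statistic(0.8783478, n=153, k=2)
p = p_value_two_predictors(0.8783478, n=153)
print(f"F = {f:.1f}, P-value = {p:.1e}")
```

Algebraically, d/(d + 2F) simplifies to 1 − R² when k = 2, so the P-value here equals (1 − 0.8783478)^75, which is on the order of 10^-69. That is why the display rounds it to 0.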
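The last bullet of guideline 3 can be sketched as a pairwise screen: compute the linear correlation coefficient r for every pair of candidate predictor variables and flag the pairs whose |r| is very high. The 0.9 cutoff, the function names, and the small data set below are illustrative assumptions; the text does not give a numeric threshold, and choosing which variable of a flagged pair to drop still follows the adjusted R² comparison in that bullet.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired sample values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def highly_correlated_pairs(predictors, threshold=0.9):
    """Return (name1, name2, r) for predictor pairs with |r| >= threshold.

    `predictors` maps a variable name to its list of sample values.
    The default threshold of 0.9 is an illustrative assumption.
    """
    flagged = []
    for a, b in combinations(sorted(predictors), 2):
        r = pearson_r(predictors[a], predictors[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# Made-up values for illustration only (not the textbook's data set).
data = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10],  # exactly 2 * x1, so r = 1
    "x3": [5, 3, 4, 1, 2],   # r = -0.8 with x1 and with x2
}
for a, b, r in highly_correlated_pairs(data):
    print(a, b, round(r, 3))  # prints: x1 x2 1.0
```

Here x1 and x2 carry the same information, so including both adds nothing; x3 is correlated with them but stays below the assumed cutoff.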