616 CHAPTER 12 Analysis of Variance

Test Statistic for One-Way ANOVA:

F = (variance between samples) / (variance within samples)

The numerator of the F test statistic measures variation between sample means. The estimate of variance in the denominator depends only on the sample variances and is not affected by differences among the sample means. Consequently, sample means that are close in value to each other result in a small F test statistic and a large P-value, so we conclude that there are no significant differences among the sample means. Sample means that are very far apart in value result in a large F test statistic and a small P-value, so we reject the claim of equal means.

Why Not Just Test Two Samples at a Time?

If we want to test for equality among three or more population means, why do we need a new procedure when we can test for equality of two means using the methods presented in Section 9-2? For example, if we want to use the sample data from Table 12-1 to test the claim that the four populations have the same mean, why not simply pair them off and test two at a time by testing each of the following:

H0: μ1 = μ2    H0: μ1 = μ3    H0: μ1 = μ4
H0: μ2 = μ3    H0: μ2 = μ4    H0: μ3 = μ4

For the data in Table 12-1, the approach of testing equality of two means at a time requires six different hypothesis tests. If we use a 0.05 significance level for each of those six hypothesis tests, the actual overall confidence level could be as low as 0.95^6 (or 0.735). In general, as we increase the number of individual tests of significance, we increase the risk of finding a difference by chance alone (instead of a real difference in the means). The risk of a type I error (finding a difference in one of the pairs when no such difference actually exists) is far too high.
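The arithmetic behind the 0.735 figure can be checked with a short Python sketch. It treats the six pairwise tests as independent, which is a simplifying assumption of this sketch, and the function name `overall_confidence` is hypothetical, chosen only for illustration.

```python
from math import comb

def overall_confidence(alpha, k_groups):
    """Return (number of pairwise tests, overall confidence, risk of at least
    one type I error), assuming the pairwise tests are independent."""
    n_tests = comb(k_groups, 2)            # number of ways to pair k groups
    confidence = (1 - alpha) ** n_tests    # every test avoids a type I error
    return n_tests, confidence, 1 - confidence

# Four populations, each pairwise test run at the 0.05 significance level
n_tests, confidence, risk = overall_confidence(0.05, 4)
print(n_tests)               # 6
print(f"{confidence:.3f}")   # 0.735
print(f"{risk:.3f}")         # 0.265
```

Under the same assumption, raising `k_groups` shows how quickly the overall confidence level falls as more pairwise tests are added.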
The method of analysis of variance helps us avoid that particular pitfall (rejecting a true null hypothesis) by using one test for equality of several means, instead of several tests that each compare two means at a time.

FIGURE 12-2 Relationship Between the F Test Statistic and the P-Value
Sample means are all close: small F test statistic, large P-value; fail to reject equality of population means.
At least one sample mean is very different: large F test statistic, small P-value; reject equality of population means.

Why 0.05?

In 1925, R. A. Fisher published a book that introduced the method of analysis of variance, and he needed a table of critical values based on numerator degrees of freedom and denominator degrees of freedom, as in Table A-5 in Appendix A. Because the table uses two different degrees of freedom, it becomes very long if many different critical values are used, so Fisher included a table using 0.05 only. In a later edition he also included the significance level of 0.01.

Stephen Stigler, a notable historian of statistics, wrote in Chance magazine that the choice of a significance level of 0.05 is a convenient round number that is somewhat arbitrary. Although it is arbitrary, the choice of 0.05 accomplishes the following important goals. (1) The value of a 0.05 significance level results in sample sizes that are reasonable and not too large. (2) The choice of 0.05 is large enough to give us a reasonable chance of identifying important effects (by correctly rejecting a null hypothesis of no effect when there really is an effect). (3) The choice of 0.05 is not so small that it forces us to miss important effects (by making the mistake of failing to reject a null hypothesis of no effect when there really is an effect).

[Image: excerpt from Table A-5, F Distribution]

CAUTION When testing for equality of three or more population means, use analysis of variance. (Using multiple hypothesis tests with two samples at a time could adversely affect the significance level.)
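The F test statistic defined at the start of this section (variance between samples divided by variance within samples) can be sketched in Python. The sketch uses the standard one-way ANOVA mean squares, which this page states only in words, so the exact formulas are an assumption here. The function name `f_statistic` and both data sets are hypothetical; they are not the Table 12-1 data.

```python
from statistics import mean, variance

def f_statistic(samples):
    """One-way ANOVA F = (variance between samples) / (variance within samples)."""
    k = len(samples)                                  # number of samples
    n_total = sum(len(s) for s in samples)            # total number of values
    grand_mean = sum(sum(s) for s in samples) / n_total
    # Between-samples variance: spread of the sample means around the grand mean
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ms_between = ss_between / (k - 1)
    # Within-samples variance: pooled sample variances, unaffected by the means
    ss_within = sum((len(s) - 1) * variance(s) for s in samples)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical data: same spread within each sample, different sample means
close_means = [[4, 5, 6], [5, 6, 7], [4, 6, 8]]       # means 5, 6, 6
far_means = [[4, 5, 6], [5, 6, 7], [12, 14, 16]]      # means 5, 6, 14

f_close = f_statistic(close_means)
f_far = f_statistic(far_means)
print(round(f_close, 2))   # 0.5
print(round(f_far, 2))     # 36.5
```

Both data sets have the same sample variances (1, 1, and 4), so the denominator is the same in each case; only the numerator changes when one sample mean moves far from the others. Converting F to a P-value requires the F distribution with k - 1 and N - k degrees of freedom, which this sketch leaves to a table or statistical software.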